Spamassassin Essentials: 2012

Saturday, July 7, 2012

Registering a Razor Identity

I just learned from this post
that I need to register a Razor
identity if I wish to report spam
to Razor:

Registering a Razor Identity

Here's an article on reporting spam
generally:

Reporting Spam

Here's the generic command for reporting
spam:

spamassassin -r < message.txt

Note that the above generic command
automatically removes all spamassassin
markup. In other words, the -d option
is automatically on.
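
As a rough sketch, the generic command can be wrapped in a loop to report a whole folder of saved spam. The function name report_spam_dir and the SA_BIN variable are made up for illustration; only spamassassin -r comes from the command above.

```shell
# Hypothetical helper: report every regular file in a directory as spam
# and print how many reports succeeded.
# SA_BIN is an assumed override so the loop can be tried with a stand-in
# command; by default the real spamassassin is run.
report_spam_dir() {
    dir=$1
    count=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue      # skip subdirectories and empty globs
        # Send any output from the reporter to stderr so that only the
        # final count appears on stdout.
        if "${SA_BIN:-spamassassin}" -r < "$msg" >&2; then
            count=$((count + 1))
        else
            echo "could not report: $msg" >&2
        fi
    done
    echo "$count"
}
```

Something like report_spam_dir ~/spam-to-report would then feed each saved message to spamassassin -r, one at a time.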

Ed Abbott

Wednesday, June 6, 2012

Spamassassin and Bind

I've been using Spamassassin in
local mode only. Here's how I
invoke Spamassassin:

spamassassin -L

It's the -L (dash-capital-L) that
keeps spamassassin local. If you leave
off the -L, spamassassin also runs its
network tests, which include checking IP
addresses in incoming emails against blacklists.

Here's a wonderful article that describes
this much better than I do:

Caching Nameserver

I'm going to try to set up bind9
just like the webpage above suggests. I
feel that spamassassin will be much more
effective if it checks for spammy IP addresses.

Here's how I will start the process of
installing bind9:

aptitude update
aptitude search bind9

The first command above updates the
availability information for Debian
packages. The second command tells
me just what in relation to bind9 is
available for installation.

When I do the search for bind9,
I'm given the following information:

p   bind9                       - Internet Domain Name Server          
p   bind9-doc                   - Documentation for BIND               
i   bind9-host                  - Version of 'host' bundled with BIND 9
p   bind9utils                  - Utilities for BIND                   
p   gforge-dns-bind9            - collaborative development tool - DNS 
i   libbind9-60                 - BIND9 Shared Library used by BIND
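
In the listing above, the first column is aptitude's state flag: p means the package is not installed and i means it is. As a small sketch, the installed entries can be picked out with awk. The function name installed_packages is hypothetical, and the code assumes aptitude's default layout, in which the package name starts at the fifth character.

```shell
# Print the names of packages that an "aptitude search" listing,
# read from standard input, marks as installed.
installed_packages() {
    # Character 1 is the current-state flag; "i" means installed.
    # The package name is the first word from character 5 onward.
    awk 'substr($0, 1, 1) == "i" { split(substr($0, 5), f, " "); print f[1] }'
}
```

It would be used as aptitude search bind9 | installed_packages.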

Looks like what I need is on the top line, which
is bind9 itself. I'll go ahead and install the
bind9 package:

aptitude install bind9

I probably should have typed
this command first:

aptitude show bind9

The reason I did not type it is
I was quite confident that
bind9 was not installed. When I
type it now, it shows that indeed,
I have successfully installed bind9:

aptitude show bind9
Package: bind9                         
State: installed

The next step for me is to get
into kmail and turn off
the local switch for spamassassin.
Here are the menus I use to do
this:

Settings > Configure Filters

Next, I highlight the following:

SpamAssassin Check

Next, I look for the words
Pipe Through. I then
change the invocation from
spamassassin -L to just
spamassassin without the
-L. Next I click apply
and OK.

Update: June 13, 2012

Changing from local rules only to
rules that make use of IP addresses
and blacklists got me into trouble.
I posted to the Spamassassin mailing
list describing this trouble:

False Positive on Domain Name

The folks on the Spamassassin mailing
list were incredibly generous. I got over
50 replies to my post.

I learned that my ISP's DNS servers were
not going to work. My ISP's DNS servers
were giving me a false positive on what
should have been ham emails.

In effect, my DNS servers were causing
Spamassassin to view all unknown domain
names as having been blacklisted. That
was the overall effect.

Therefore if a domain name was mentioned
in an incoming email, and that domain name
had never been either blacklisted or
white-listed, it was assumed to be blacklisted.

To get around the unknown domain name problem,
I took two steps:

1 -- Set up a bind9 server (for DNS)

2 -- Change my /etc/resolv.conf file to
point to localhost.

That's it in a nutshell. More detail is
given in the above link.
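
As a quick sanity check on step 2, a sketch like the one below reports whether the first nameserver entry in a resolv.conf-style file is the loopback address. The function name uses_local_resolver and its optional file argument are assumptions made for illustration.

```shell
# Succeed if the first "nameserver" line in a resolv.conf-style file
# points at 127.0.0.1. Defaults to /etc/resolv.conf when no file is given.
uses_local_resolver() {
    file=${1:-/etc/resolv.conf}
    # Take the address from the first nameserver line only.
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    [ "$first" = "127.0.0.1" ]
}
```

Running uses_local_resolver && echo "using the local bind9 server" gives a yes-or-no answer without opening the file by hand.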

Update: June 22, 2012

The above link is a mailing list discussion
that taught me how to set up my own Bind
server. The last thing I learned was how
to set my /etc/resolv.conf to localhost.

Ultimately, the way to configure /etc/resolv.conf
is to configure /etc/dhcp/dhclient.conf. The
latter controls the former.

Here's the line I added to dhclient.conf:

supersede domain-name-servers 127.0.0.1;

In this one line, I'm superseding the domain
name servers provided by dhcp. To me,
the logical place to add the above line is
after dhcp information has been retrieved.

Therefore, I've chosen to place the above line
in this context:

request subnet-mask, broadcast-address, time-offset, routers,
 domain-name, domain-name-servers, domain-search, host-name,
 netbios-name-servers, netbios-scope, interface-mtu,
 rfc3442-classless-static-routes, ntp-servers;
supersede domain-name-servers 127.0.0.1;

Whether or not these two operations --- a request followed
by a supersede --- are truly sequential, I do not know.

However, I've decided to treat these two operations as if they
were programming language operations on a serial computer.
I'm treating it as if the supersede only has lasting effect
if it comes last.

In all probability, the order in which the two operations appear
does not matter. However, I've not experimented with reversing
the order of the operations. I'm happy to leave things just as
they are.

Update: June 28, 2012

Here's a summary of what I eventually
did to get bind working as my caching
nameserver:

1 -- aptitude install bind9

2 -- Add a supersede command to
the file /etc/dhcp/dhclient.conf

The supersede command is there to set the
nameserver to localhost. Here are the
specifics of the supersede command that I
added.

First I changed this line:

request subnet-mask, broadcast-address, time-offset, routers,
 domain-name, domain-name-servers, domain-search, host-name,
 netbios-name-servers, netbios-scope, interface-mtu,
 rfc3442-classless-static-routes, ntp-servers;

to this line:

request subnet-mask, broadcast-address, time-offset, routers,
 domain-name, domain-search, host-name,
 netbios-name-servers, netbios-scope, interface-mtu,
 rfc3442-classless-static-routes, ntp-servers;

Note that the above dhcp request has had the domain-name-servers
part of the request removed. Next, I added my supersede command.
Here's what the supersede command looks like in context:

request subnet-mask, broadcast-address, time-offset, routers,
 domain-name, domain-search, host-name,
 netbios-name-servers, netbios-scope, interface-mtu,
 rfc3442-classless-static-routes, ntp-servers;
supersede domain-name-servers 127.0.0.1;

In other words, the domain name servers are no longer a product
of the dhcp request. Domain name servers are now set by the
supersede command.
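
The supersede edit can also be sketched as a small shell function that appends the line only if it is not already present, so running it twice does no harm. The name add_supersede is hypothetical, and the sketch does not touch the request statement; removing domain-name-servers from the request is still a manual edit.

```shell
# Append the supersede line to a dhclient.conf-style file unless an
# identical line is already there.
add_supersede() {
    target=$1
    line='supersede domain-name-servers 127.0.0.1;'
    # -x matches whole lines only, -F treats the text literally.
    if grep -qxF "$line" "$target"; then
        return 0                       # already configured, nothing to do
    fi
    printf '%s\n' "$line" >> "$target"
}
```

It is probably wise to try it on a copy first, for example add_supersede /tmp/dhclient.conf.copy, before pointing it at /etc/dhcp/dhclient.conf as root.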

Ed Abbott

Wednesday, January 11, 2012

How to Write Spamassassin Rules With No Score

I had the mistaken notion that
Spamassassin rules have to have
a score. Because of this, I was
writing rules with 1/1000th of a
point.

I gave very small scores to rules
that I would later collect together
to form a meta rule.

The mistaken notion came about because
I read that rules that have a score
of zero are not evaluated. Since Perl
uses zero to mean false and since
spamassassin is based on Perl, I figured
I was stuck writing rules with minuscule
scores if I wanted Perl to distinguish
between true and false.

A rule with a score of zero adds zero
to the total regardless of whether the rule is
evaluated or not. Therefore, rules that have scores
of zero are not evaluated at all by spamassassin
because evaluating them would change nothing.

What I needed, but did not realize I needed,
was a way to evaluate rules without giving
the rule a score.

Here's the article that taught me that
you can make a rule in spamassassin
that is evaluated but that has no score:

Writing your own Add-On Rules for SpamAssassin

No score rules are great! They are great
for several reasons:

1 -- No score rules do not show up in your
spamassassin scoring reports that are
inserted into each email evaluated by
Spamassassin. This reduces visual clutter.

2 -- No score rules only have consequences
if they add up to something greater in
a meta rule.

3 -- No score rules allow you to score cumulative
words and phrases, regardless of what order
the words and phrases appear in the email.

Let me give an example. Let's say, in
your own mind, the term on sale
is spammy, but not so spammy as to trigger
a spam rule that scores points.

Let's also say that the word discount,
in your own mind, is also spammy, but not so
spammy as to warrant a spam trigger via point
assignment.

You now have 2 terms, on sale and
discount, which by themselves are
not spammy enough to do anything about.

After all, a good friend could email you and
say that they got something on sale or
they got it at a discount and it's all
perfectly innocent. Both terms, on sale
and discount are legitimate terms in
normal human discourse.

Now let's say that discount and on sale
in isolation are not worth assigning a score to
but, taken together, they have a much greater
meaning than they do when seen in isolation.

Let's say that, in your mind, any email that mentions
both discount and on sale in the same
email should be scored one point. Here's how you
do this:

First, you need a scoreless way to score the
term on sale. Here's the no score
way to do it:

body           __ON_SALE  m|on.{0,12}sale|i
describe       __ON_SALE  The term 'on sale' is found in the body of the email message

Note the double underscores in the name of the rule.
That's the mechanism that gives you a scoreless rule.
That's the thing I was missing. I did not understand
that spamassassin has a mechanism for assigning no
score.

Note also that I've decided to use the match operator
in a very liberal way. The zero in the {0,12} quantifier
indicates that I don't care whether or not the words
are run together. onsale and on sale
will both trigger this rule equally well. That's
what the zero is all about.

Also, I don't care too much about what appears between
the two words on sale as long as it is 12 characters
or fewer. That's very liberal and will catch the
words on sale in many different forms.

For example, it will catch this weekend only sale,
because the word only has the word on embedded
inside it. I'm choosing to be very liberal to demonstrate
that spamassassin is a very flexible tool. You may choose
to be more cautious than I'm being in my example.

Also note that I'm using the case insensitive suffix, the
letter i. With the letter i suffix, the
terms On Sale, ON SALE, and on sale
are all caught equally well.
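
To see what the pattern catches before putting it into a rule, it can be tried from the shell. This is only a sketch: the function name matches_on_sale is made up, and grep -E stands in for Perl's regex engine, which is a fair approximation for a pattern this simple.

```shell
# Succeed if the given text matches the same pattern as the __ON_SALE rule.
# -E enables the {0,12} quantifier, -i ignores case, -q suppresses output.
matches_on_sale() {
    printf '%s\n' "$1" | grep -Eiq 'on.{0,12}sale'
}
```

With this in place, matches_on_sale 'this weekend only sale' succeeds because of the on inside only, while a sentence such as 'on Tuesday we drove to a garage sale' fails because more than 12 characters separate on from sale.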

Now let's do the same thing to the term discount.
Here's my rule for discount:

body           __DISCOUNT  m|discount|i
describe       __DISCOUNT  The word 'discount' is found in the body of the email message

OK. Now we're ready to put it all together. Now
we're ready to say that anyone who mentions a
discount and something being on sale
in the same email is at least a little bit likely
to be a spammer.

Here's how it all comes together:

meta           ON_SALE_DISCOUNT      (__ON_SALE && __DISCOUNT)
describe       ON_SALE_DISCOUNT      The terms 'on sale' and 'discount' are both found in the body of the email

Here's the thing I love most about this approach. The
rule ON_SALE_DISCOUNT has the following
characteristics:

1 -- It does not care whether on sale
or discount appears first. Any order
for these 2 terms is acceptable and earns
the spammer a point.

2 -- Distance does not matter. These 2 terms
could be 5 paragraphs apart. Even with huge
swaths of text separating the 2 terms, the
2 terms together earn our spammer a point.
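
The order and distance independence can be imitated outside spamassassin with two separate searches over the same file. This is a rough stand-in, not how spamassassin evaluates meta rules internally. The function name on_sale_discount is hypothetical, and because grep works line by line, on and sale must share a line here.

```shell
# Rough stand-in for the ON_SALE_DISCOUNT meta rule: succeed only when
# both patterns occur somewhere in the file, in any order and at any
# distance from each other.
on_sale_discount() {
    file=$1
    grep -Eiq 'on.{0,12}sale' "$file" && grep -iq 'discount' "$file"
}
```

Each grep plays the part of one double-underscore rule, and the && plays the part of the meta rule that combines them.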

One more thing worth mentioning. I could have
assigned the rule ON_SALE_DISCOUNT a
score other than 1 point. However, I'm very
very happy with 1 point.

I pretty much never mess with the default 1
point that spamassassin gives you. Instead, I
write more rules if one rule is not enough
or I eliminate a rule if that rule is not worth
1 point all by itself.

The lesson for me? There's always a better way
to do things. For me, placing a double underscore
in front of rules that are only there for cumulative
effect is a much better way of doing things.

Ed Abbott
