Thursday, October 6, 2011

How to Filter Out Foreign Language Email

I just learned something new. Lately
I've been receiving Russian spam of some
kind. Since I do not know Russian, I do
not know what the spam actually says.

This web page describes how foreign
language spam can be filtered out:

Mail::SpamAssassin::Conf - SpamAssassin configuration file

I placed the following line in my
user_prefs file:

ok_locales en
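To make the idea concrete, here is a rough Python sketch of a locale check in the spirit of ok_locales. It is not SpamAssassin's code (SpamAssassin is written in Perl). The CHARSET_LOCALES table and the charset_is_faraway function are hypothetical, and the table lists only a few charsets that I'm assuming point to Russian, Chinese, Japanese and Korean.

```python
# Hypothetical, simplified mapping from MIME charset names to the
# locale they usually indicate. SpamAssassin's real tables differ.
CHARSET_LOCALES = {
    "koi8-r": "ru",
    "windows-1251": "ru",
    "gb2312": "zh",
    "big5": "zh",
    "shift_jis": "ja",
    "euc-kr": "ko",
}


def charset_is_faraway(charset, ok_locales):
    """Return True if the charset points to a locale not in ok_locales."""
    locale = CHARSET_LOCALES.get(charset.strip().lower())
    if locale is None:
        # Charsets not in the table (us-ascii, utf-8, ...) are never flagged.
        return False
    return locale not in ok_locales


ok_locales = {"en"}  # mirrors the user_prefs line "ok_locales en"
print(charset_is_faraway("KOI8-R", ok_locales))  # True
print(charset_is_faraway("utf-8", ok_locales))   # False
```

In this sketch utf-8 is simply absent from the table, so a UTF-8 message is never flagged by the charset check alone.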

Once I finished altering user_prefs,
I tested the result using the technique
described in this post:

How to Test One Single Email With SpamAssassin

According to the above test, the
Russian spam triggered the following
rules:

CHARSET_FARAWAY_HEADER
MIME_CHARSET_FARAWAY
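If you want to pull the rule names out of a tested message programmatically, a small Python sketch like this one could work. It assumes the message SpamAssassin outputs carries an X-Spam-Status header with a tests= list. The sample header below is invented (its score is made up), and triggered_rules is a hypothetical helper name.

```python
import email
import re


def triggered_rules(raw_message):
    """Return the rule names listed in a message's X-Spam-Status header."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Unfold continuation lines, then drop spaces after commas in case
    # the tests= list was wrapped after a comma.
    status = re.sub(r",\s+", ",", " ".join(status.split()))
    match = re.search(r"tests=(\S+)", status)
    if not match or match.group(1) == "none":
        return []
    return match.group(1).split(",")


# Invented sample message; the header values are illustrative only.
sample = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: Yes, score=5.5 required=5.0\n"
    "\ttests=CHARSET_FARAWAY_HEADER,MIME_CHARSET_FARAWAY\n"
    "\tautolearn=no version=3.3.1\n"
    "\n"
    "body\n"
)

print(triggered_rules(sample))
```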

Such a simple thing! With ok_locales
set, this message triggered two rules.
Both rule names contain the word
faraway.

I like the word faraway. I
get both Chinese and Russian spam.
This spam is very faraway from my
desires.

The lesson for me in all of this
is that if you want to solve a
problem, look for a simple solution
first.

Filtering out languages that I don't
understand is a simple solution.

Update: November 9, 2011

For some reason, this does not always
work. It works for some foreign
language spam but not all foreign
language spam.

Why are they still getting through?
I'm not sure.

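I suspect it has to do with UTF-8.
Since UTF-8 can encode characters from
nearly any language, a message labeled
UTF-8 does not reveal its language the
way a Russian charset such as KOI8-R
does. That may make UTF-8 spam harder
to identify than other spam.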
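Here is a small Python sketch of that theory, using only the standard library. The same Russian word can be sent labeled KOI8-R or UTF-8, and decoding either gives identical text, so only the label differs. Looking at the characters themselves still reveals Cyrillic; the script_counts helper is hypothetical and uses the first word of each character's Unicode name as a rough guess at its script. This only illustrates the theory; it is not what the CHARSET_FARAWAY rules do.

```python
import unicodedata


def script_counts(text):
    """Count letters by the first word of their Unicode name (a rough script guess)."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return counts


russian = "Привет"  # "hello" in Russian

# The same word can arrive under different charset labels.
koi8_bytes = russian.encode("koi8_r")
utf8_bytes = russian.encode("utf-8")

# Decoding either one gives identical text; only the label differs.
assert koi8_bytes.decode("koi8_r") == utf8_bytes.decode("utf-8")

print(script_counts(utf8_bytes.decode("utf-8")))  # {'CYRILLIC': 6}
```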

That's my theory as to why ok_locales
does not always seem to work.

Right now, it's just a theory. In the
coming weeks, I'll be watching to see
whether UTF-8 foreign language spam
consistently gets through the filter.

Ed Abbott