Saturday, August 15, 2015

Unlearning Ham to Learn Ham

The title of this post sounds self-contradictory, doesn't it? Why would SpamAssassin need to unlearn something in order to learn it? I'm not sure, but that seems to be the case.

Today I was writing some rules for ham emails that I commonly receive. Of course, I was giving these rules negative points, since SpamAssassin should ideally score ham negative and spam positive.

Doggedly, I wrote one rule after another, desperately trying to trigger autolearn. I never did succeed, even though I wrote seven or more rules worth minus one point each. Finally, I got disgusted and gave up on autolearn.

I'm still a bit mystified as to why this did not work. I had at least 2 ham rules for the header and 5 ham rules for the body. Why would an email that scored so overwhelmingly negative not trigger ham autolearning? I never did figure it out.
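
For reference, rules of the kind described above look roughly like the sketch below. The rule names and patterns are made up for illustration and are not the rules from this post; only the keywords (header, body, describe, score) and the two threshold settings are real SpamAssassin configuration options. The thresholds are shown at their documented default values.

```
# local.cf -- hypothetical rule names and patterns, for illustration only
header   LOCAL_HAM_LIST    List-Id =~ /example-list\.example\.org/i
describe LOCAL_HAM_LIST    Mail from a list I subscribe to
score    LOCAL_HAM_LIST    -1.0

body     LOCAL_HAM_PHRASE  /weekly status report/i
describe LOCAL_HAM_PHRASE  Phrase common in mail I want to keep
score    LOCAL_HAM_PHRASE  -1.0

# Autolearn thresholds (documented defaults)
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam    12.0
```

The SpamAssassin documentation notes that the score used for the autolearn decision can differ from the score shown in the report, because some rules are left out of it. A strongly negative total may therefore not be enough, on its own, to make autolearn fire.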

In any case, I finally gave up and tried something else:

sa-learn --ham email.mbox

This did not work either. So perplexing!

As my last act of desperation, I tried this sequence:

sa-learn --forget email.mbox
sa-learn --ham email.mbox

Voilà! It worked! I finally got Bayes to learn my email as ham!
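
The sequence above can be wrapped in a small shell function that also prints the Bayes message counters before and after, so it is easier to see whether anything was actually learned. This is only a sketch: the function name relearn_as_ham and the SA_LEARN variable are hypothetical helpers introduced here, while --forget, --ham, and --dump magic are standard sa-learn options.

```shell
#!/bin/sh
# Sketch: forget a message, then relearn it as ham.
# SA_LEARN is a hypothetical override for the sa-learn binary to run;
# it defaults to plain "sa-learn".
SA_LEARN="${SA_LEARN:-sa-learn}"

relearn_as_ham() {
    msg="$1"
    # The nspam / nham lines of "--dump magic" count the messages Bayes has learned.
    "$SA_LEARN" --dump magic | grep -E 'nspam|nham'
    # Drop whatever Bayes currently remembers about this message ...
    "$SA_LEARN" --forget "$msg"
    # ... then learn it again, this time as ham.
    "$SA_LEARN" --ham "$msg"
    # Show the counters again so the change (if any) is visible.
    "$SA_LEARN" --dump magic | grep -E 'nspam|nham'
}

# Usage: relearn_as_ham email.mbox
```

If the file really is an mbox holding several messages, sa-learn's --mbox option tells it to treat the file that way; without it, the file is read as a single message.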

I'm still not sure why the email needed to be entirely forgotten in order to be learned as ham. I'm not a SpamAssassin professional. I'm simply a guy who does his own email filtering on an amateur basis. Perhaps someday I'll learn why.

Ed Abbott