ID #1031

How do I train YAM's spam filter properly?

YAM's spam filter is an adaptive filter, a so-called Bayesian spam filter. It was ported from the well-known mail client Thunderbird and behaves exactly the same way.

Background:
This kind of spam filtering is an iterative learning process in which the content of each mail is evaluated to calculate a spam probability. This means that right from the beginning the filter is absolutely "stupid": it doesn't know which mails are actually spam and which are not. Hence the filter needs to be trained manually before it can distinguish spam from regular mail.
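To illustrate the idea, here is a minimal Python sketch of how a token-based Bayesian filter can turn training data into a spam probability. This is NOT YAM's or Thunderbird's actual code; the class name ToyBayesFilter, the tokenizer and the smoothing constants are assumptions made up for this example. In this sketch an untrained filter rates every mail at a neutral 0.5, because it has no evidence either way; how a real filter turns such a score into a verdict depends on its implementation.

```python
import math
import re
from collections import Counter


class ToyBayesFilter:
    """Illustrative token-based Bayesian filter (hypothetical, not YAM's code)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam mails containing it
        self.ham_tokens = Counter()   # token -> number of regular mails containing it
        self.spam_mails = 0
        self.ham_mails = 0

    @staticmethod
    def tokenize(text):
        # Very simple tokenizer: lowercase words and numbers, each counted once per mail.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        # This is the manual step: the user says whether the mail is spam or not.
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_mails += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_mails += 1

    def spam_probability(self, text):
        # Sum the log-odds of every token, assuming equal priors for spam and ham.
        log_odds = 0.0
        for token in self.tokenize(text):
            # Add-one smoothing keeps unseen tokens neutral instead of dividing by zero.
            p_spam = (self.spam_tokens[token] + 1) / (self.spam_mails + 2)
            p_ham = (self.ham_tokens[token] + 1) / (self.ham_mails + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to avoid overflow in exp() for very long mails.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    spam_filter = ToyBayesFilter()
    print(spam_filter.spam_probability("buy cheap pills"))  # untrained: no evidence yet

    spam_filter.train("cheap pills buy now", is_spam=True)
    spam_filter.train("meeting notes for monday", is_spam=False)
    print(spam_filter.spam_probability("buy cheap pills"))
    print(spam_filter.spam_probability("monday meeting"))
```

After the two training calls, the first test mail scores above 0.5 and the second below 0.5. Every further mail you classify shifts the token counts and therefore the scores, which is why the filter only gets better with continued training.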

Usage:
After enabling the spam filter, all newly received mails will most likely be classified as spam and automatically moved to the spam folder. This is absolutely normal and intended! Again, keep in mind that the spam engine is "stupid" and relies only on your input.

Now you have to tell YAM which mails are really spam and which are not spam (so-called Ham). Just mark all regular (no spam) mails which were moved to the spam folder as "Not Spam" and move them to another folder (e.g. the incoming folder). This teaches the spam engine what "no spam" mails look like. Such mails are called false positives because they were automatically flagged as spam although they aren't spam. Continue to do so whenever more "no spam" mail arrives and is flagged as spam by accident.

Of course, the same procedure applies to spam mail that was not recognized as spam. If new mail arrives and it wasn't recognized as spam, flag it as spam manually. Continue to do so as new mail arrives. After some time YAM will gradually become better at recognizing spam mails by itself, and you will only very seldom have to correct false positives and unrecognized spam mails yourself. However, there will always remain a small probability of false positives and unrecognized spam mail, no matter what.
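The two kinds of mistakes described above can be summarized in a small Python sketch. The function names are hypothetical and only illustrate the terminology; they are not part of YAM.

```python
def classify_outcome(flagged_as_spam: bool, really_spam: bool) -> str:
    """Name the outcome of one filter decision, given what the mail really is."""
    if flagged_as_spam and really_spam:
        return "true positive"   # spam was caught, nothing to do
    if flagged_as_spam and not really_spam:
        return "false positive"  # regular mail in the spam folder: mark it "Not Spam"
    if not flagged_as_spam and really_spam:
        return "false negative"  # spam was missed: flag it as spam manually
    return "true negative"       # regular mail was left alone, nothing to do


def needs_manual_correction(flagged_as_spam: bool, really_spam: bool) -> bool:
    """The user only has to act when the filter's verdict was wrong."""
    return flagged_as_spam != really_spam


if __name__ == "__main__":
    for flagged in (True, False):
        for spam in (True, False):
            print(flagged, spam, classify_outcome(flagged, spam),
                  needs_manual_correction(flagged, spam))
```

Only the two "false" outcomes require action from you. Each correction is also a training step for the filter.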

WARNING:
You should NOT "force" the learning process by manually classifying all your previously collected spam mails after enabling the spam filter. YAM learns best by itself with a little help from the user, as described above. Only newly received emails should be manually flagged as spam or no spam. Usually about 100 mails (good as well as bad) are enough to let YAM recognize spam mails correctly with a probability of 90% or more. However, if for some reason the spam filter stops working for you no matter how you train it, go to the YAM configuration, reset the spam training data and start flagging spam and no spam manually again as new mail arrives.
