Software Design UK


Spam Filters


Spam filters are programs that screen spam out of incoming email. There are several filters and several ways to run them. This article explains the concepts behind two Linux filters and gives guidance on using them with the KMail email client.

Contents
How filters work
SpamAssassin
Quick Spam Filter (qsf)

How filters work
Email software that collects email from the internet is called an email client.

Email sent to you, unrequested, by people with whom you have no relationship is referred to as "spam". For those who want to rid themselves of spam, there is more than one way to do so, and different methods have varying levels of success.

Simple spam filters eliminate emails that contain certain known "bad" words (such as "free offer" or "viagra"), but they miss misspelt words (such as "f.r.e.e offers" or "v i a g r a") and words not on the list. Systems with a very long list of bad words also run the risk of eliminating valid emails that happen to contain one of those words.
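To make the weakness concrete, here is a minimal Python sketch of the simple approach, assuming a bad-word list holding just the two examples above (`naive_is_spam` is a hypothetical name; no real filter is claimed to work exactly this way). It shows both failure modes: disguised words slip through, and a genuine email that happens to mention a listed phrase is caught.

```python
# A deliberately naive keyword filter (illustrative sketch only).
BAD_WORDS = ["free offer", "viagra"]  # example entries taken from the text


def naive_is_spam(text: str) -> bool:
    """Flag the email if any listed phrase appears, ignoring case."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BAD_WORDS)


if __name__ == "__main__":
    print(naive_is_spam("Get a FREE OFFER now"))      # caught
    print(naive_is_spam("Cheap v i a g r a here"))    # missed: spaced out
    print(naive_is_spam("f.r.e.e offers inside"))     # missed: punctuated
    print(naive_is_spam("Minutes: we declined the supplier's free offer"))  # genuine email wrongly caught
```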

More intelligent systems combine a variety of ways to identify spam, often referred to as "rules", with a programmed ability to learn from experience. SpamAssassin and Quick Spam Filter (qsf), both featured in this article, are examples of software with this capability.
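The idea of "learning from experience" can be illustrated with a toy word-count learner. This is a hypothetical Python sketch, not the algorithm SpamAssassin or qsf actually uses: it records which words appear in emails you have labelled as spam or genuine, then scores a new email by how strongly its known words lean towards spam.

```python
import re
from collections import Counter


class ToyLearner:
    """Toy word-count learner: illustrative only, not SpamAssassin's or qsf's method."""

    def __init__(self) -> None:
        self.spam_words: Counter = Counter()  # spam emails containing each word
        self.ham_words: Counter = Counter()   # genuine emails containing each word

    @staticmethod
    def words(text: str) -> set:
        return set(re.findall(r"[a-z]+", text.lower()))

    def learn(self, text: str, is_spam: bool) -> None:
        """Record one labelled example (the 'experience')."""
        target = self.spam_words if is_spam else self.ham_words
        target.update(self.words(text))

    def spam_score(self, text: str) -> float:
        """Average spam-leaning of known words; 0.5 means no evidence either way."""
        scores = []
        for word in self.words(text):
            s, h = self.spam_words[word], self.ham_words[word]
            if s or h:
                scores.append((s + 1) / (s + h + 2))  # add-one smoothing
        return sum(scores) / len(scores) if scores else 0.5


if __name__ == "__main__":
    learner = ToyLearner()
    learner.learn("cheap pills free offer", is_spam=True)
    learner.learn("meeting agenda for monday", is_spam=False)
    print(learner.spam_score("free pills"))      # above 0.5: leans spam
    print(learner.spam_score("monday meeting"))  # below 0.5: leans genuine
```

The more examples it is given, the more words it has evidence for, which is why both tools below ask for a substantial training set.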

When your email software (the email client) collects an email from the internet (the email server), the email is "passed" to the filter software, which checks it against its known and learnt spam rules. This software can be used in several ways. The one we recommend marks an email as spam by adding some text to its header. In this way, the message itself remains intact, which is useful when the filter incorrectly identifies a valid email as spam. Once an email has been marked as spam (a mark that is normally invisible to you), the email client can move it to its own folder, leaving you to read your "genuine" emails in peace.
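The "tag, then sort" arrangement can be sketched with Python's standard `email` module. The helper names and the simple YES/NO header value are assumptions for illustration; the point is that the filter only adds a header, so the body survives intact and a mistaken verdict can be reversed.

```python
from email import message_from_string
from email.message import Message


def tag_message(raw: str, is_spam: bool) -> Message:
    """Add a verdict header, as a filter would, without touching the body."""
    msg = message_from_string(raw)
    msg["X-Spam-Status"] = "YES" if is_spam else "NO"
    return msg


def choose_folder(msg: Message) -> str:
    """Route on the header alone, as the email client's own filter would."""
    status = (msg.get("X-Spam-Status") or "").strip().upper()
    # startswith() also tolerates extra detail after the verdict, such as a score
    return "FilteredSpam" if status.startswith("YES") else "inbox"


if __name__ == "__main__":
    raw = "From: friend@example.com\nSubject: Lunch?\n\nSee you at one.\n"
    tagged = tag_message(raw, is_spam=True)  # a mistaken verdict
    print(choose_folder(tagged))             # FilteredSpam
    print(tagged.get_payload())              # body unchanged: See you at one.
```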

SpamAssassin


SpamAssassin can be found at spamassassin.rediris.es.

Firstly, launch KMail and set up the folders into which spam will be sorted. In KMail, create three folders, called something like "FilteredSpam", "MissedSpam" and "NonSpam". To do this, right-click the folder that should contain the new folders and select the "Create Child Folder" option.

Copy (or move) as many spam emails as you can to the "MissedSpam" folder; these will be the basis from which SpamAssassin learns which emails you consider to be spam.

Next, set up the system that calls SpamAssassin for each new email collected from the server. This uses the KMail filter system, and two filters are required.

In KMail, click "Settings", "Configure Filters". To create a new filter, click the "new" button (the small icon just above the "help" button) and rename the filter to something like "SpamAssassin". In this filter, set the following options.

In the Filter Criteria, click "Match all of the following". Set the first filter rule to "<size>" "is less than" "250000". (This tells KMail to apply the filter only to messages smaller than about 250KB. The larger the email, the longer SpamAssassin takes to vet it; this limit is only a suggestion that you may want to adjust once the system is working.) Set the first "Filter Action" to "pipe through" and, in the text box, type either "spamc" or "spamassassin". (This tells KMail to pipe each message that meets the rule above through SpamAssassin, using either the spamc or the spamassassin command; see the notes below for which suits you. The program adds the line "X-Spam-Status: YES" to the email header when it identifies the email as spam, or "X-Spam-Status: NO" otherwise.) Finally, uncheck the option "if this filter matches, stop processing". (If it is left checked, messages will not flow through to the next filter, which may make the whole process largely redundant.) Click the "apply" button when ready.
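What the first filter does to each message can be sketched in Python as follows. This assumes, as described above, that the piped program's output becomes the new message and that the size test is in bytes. `first_filter` is a hypothetical helper, and `FAKE_FILTER` is a hypothetical stand-in that echoes the message unchanged, so the sketch runs without SpamAssassin installed.

```python
import subprocess
import sys

SIZE_LIMIT = 250_000  # bytes: the suggested "<size> is less than" value

# Hypothetical stand-in for "spamc"/"spamassassin": copies stdin to stdout unchanged.
FAKE_FILTER = [sys.executable, "-c",
               "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def first_filter(raw_message: bytes, command=("spamc",)) -> bytes:
    """Pipe small messages through the filter program; pass large ones through untouched."""
    if len(raw_message) >= SIZE_LIMIT:
        return raw_message  # rule not matched: the filter is skipped
    result = subprocess.run(list(command), input=raw_message,
                            capture_output=True, check=True)
    return result.stdout  # the program's output becomes the new message


if __name__ == "__main__":
    msg = b"Subject: test\n\nhello\n"
    print(first_filter(msg, FAKE_FILTER) == msg)  # True with the echoing stand-in
```

With the real filter installed, calling `first_filter(msg)` would run `spamc`, which needs the daemon described below to be running.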

Add a second filter. This time, rename it to something like "SpamFilter". Set the Criteria to "match any of the following", "any header", "contains", "X-Spam-Status: YES". Set the Actions to "move to folder", "FilteredSpam". Uncheck the "if this filter matches, stop processing" option. Again, click the "apply" button when ready.
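The reason for unchecking "if this filter matches, stop processing" can be seen in a small Python model of the two filters. Everything here is a hypothetical simplification: messages are plain dictionaries, filters run top to bottom, and `fake_spamassassin` judges by subject line instead of running SpamAssassin.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]  # simplified message: "size", "subject", "headers", "folder"


@dataclass
class KMailFilter:
    name: str
    matches: Callable[[Message], bool]
    action: Callable[[Message], None]
    stop_processing: bool = False  # the "if this filter matches, stop processing" box


def run_filters(msg: Message, filters: List[KMailFilter]) -> Message:
    """Apply filters in order, top to bottom."""
    for f in filters:
        if f.matches(msg):
            f.action(msg)
            if f.stop_processing:
                break  # later filters never see this message
    return msg


def fake_spamassassin(msg: Message) -> None:
    """Hypothetical stand-in for piping through spamc: judges by subject only."""
    verdict = "YES" if "offer" in msg["subject"].lower() else "NO"
    msg["headers"]["X-Spam-Status"] = verdict


def build_filters(stop_after_first: bool) -> List[KMailFilter]:
    return [
        KMailFilter("SpamAssassin",
                    matches=lambda m: m["size"] < 250_000,
                    action=fake_spamassassin,
                    stop_processing=stop_after_first),
        KMailFilter("SpamFilter",
                    matches=lambda m: any("X-Spam-Status: YES" in f"{k}: {v}"
                                          for k, v in m["headers"].items()),
                    action=lambda m: m.update(folder="FilteredSpam")),
    ]


def new_spam() -> Message:
    return {"size": 4_000, "subject": "Free offer!", "headers": {}, "folder": "inbox"}


if __name__ == "__main__":
    print(run_filters(new_spam(), build_filters(False))["folder"])  # FilteredSpam
    print(run_filters(new_spam(), build_filters(True))["folder"])   # inbox: SpamFilter never ran
```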

KMail is now ready to use SpamAssassin. Here are some notes about SpamAssassin itself.
  • The installation process is well documented at the SpamAssassin web site (spamassassin.rediris.es). When SpamAssassin is installed, run its training process with the command "sa-learn --mbox --spam /home/UserName/Mail/.spam.directory/MissedSpam". This assumes your folders use the "mbox" format, in which each folder is a single file. If your folder is of type "maildir", use instead the command "sa-learn --spam --dir /home/UserName/Mail/.spam.directory/MissedSpam/*". (Note how KMail organises its folders: if you have a folder in KMail called "spam" that contains child folders, they are physically stored below the "Mail" directory in a directory named ".spam.directory", where "spam" is replaced by the name of the parent folder. You can tell which format a folder uses by right-clicking it in KMail and selecting the "properties" option.) You will probably want to run this command frequently to continue SpamAssassin's training well into the future, so you may want to keep it handy.

  • You also want to teach SpamAssassin which emails you do want to receive. Make sure none of the folders used for this training contain spam. When you are ready, run the command "sa-learn --mbox --ham /home/UserName/Mail/.spam.directory/NonSpam" on each of the folders (or the "--dir" form above if not using mbox format). Of course, you can use existing folders instead of the "NonSpam" folder in this example. SpamAssassin recommends that several hundred emails are included in the training folders to provide a solid base for filtering.

  • SpamAssassin comes with a base set of rules. People who have used the software without any of the training above report that it still works very well, so training is by no means mandatory.

  • Filter software is run against every incoming email, so filtering can take some time. For those of us for whom this is an issue, SpamAssassin can run in either of two modes. The straightforward mode launches the program every time an email is checked. In this mode, just use the command "spamassassin" in the filter above.

  • The other mode is "daemon" mode. In this mode, SpamAssassin runs all the time as a daemon, waiting for the next request. The installation creates a directory called "spamd" below the installation directory, containing a file called "spamc". This is a pre-compiled client program that connects to the running daemon. Move (or copy) this executable somewhere your email program can find it. It only works while the daemon is running. If this mode suits you, type "spamc" in the "pipe through" action above.

    To run the daemon, copy the file "suse-rc-script.sh" from the spamd directory (or the Red Hat version if appropriate) to somewhere like the /etc/init.d directory, and follow the instructions in the script. If no script is provided for your system, it should not be too difficult to adapt one of the existing scripts. To start the daemon, run "/etc/init.d/spamd start" (adapted to the directory and file name you copied the script to).

  • To test the setup, just collect your email in the normal way. If you are still unsure whether things are working as expected, you can run the filters manually on a known spam message: highlight it in KMail and click "Message", "Apply filters". This runs the filters on the selected message(s). Then look at the headers (select the message and click "View", "All headers") for an "X-Spam-Status: YES" (or "NO") header. If it is there, you know the software is running properly.

  • When running the software, one tip is to move any spam that was "passed" as non-spam to the "MissedSpam" folder, run the "sa-learn --spam ..." command on this folder, and delete the messages when finished. Similarly, a quick scan of the "FilteredSpam" folder will show whether legitimate emails are being filtered out, in which case use the "sa-learn --ham ..." command on them to prevent similar errors in future.
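The folder layout and the mbox training command from the notes above can be sketched in Python. `kmail_folder_path` and `sa_learn_mbox_command` are hypothetical helpers that only build a path and an argument list (nothing is executed). The layout they assume is the KMail convention described above: the children of a folder named X sit in a hidden directory ".X.directory". Only the mbox form of the command is covered.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def kmail_folder_path(mail_root: str, folders: Sequence[str]) -> PurePosixPath:
    """Map a KMail folder chain such as ["spam", "MissedSpam"] to its on-disk path."""
    *parents, leaf = folders
    path = PurePosixPath(mail_root)
    for parent in parents:
        path = path / f".{parent}.directory"  # where KMail keeps child folders
    return path / leaf


def sa_learn_mbox_command(kind: str, mbox_path: PurePosixPath) -> List[str]:
    """Build the sa-learn argument list for training on one mbox folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--mbox", f"--{kind}", str(mbox_path)]


if __name__ == "__main__":
    missed = kmail_folder_path("/home/UserName/Mail", ["spam", "MissedSpam"])
    print(missed)  # /home/UserName/Mail/.spam.directory/MissedSpam
    print(" ".join(sa_learn_mbox_command("spam", missed)))
```

The resulting list could be handed to `subprocess.run` to perform the training from a script.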

Quick Spam Filter (qsf)


qsf can be found at www.ivarch.com.

Firstly, launch KMail and set up the folders into which spam will be sorted. In KMail, create three folders, called something like "FilteredSpam", "MissedSpam" and "NonSpam". To do this, right-click the folder that should contain the new folders and select the "Create Child Folder" option. For qsf, the folders must use the "mbox" mailbox format, which is one of the options offered when creating a new folder. If you missed this step, or your folders use the "maildir" format, create new folders with the "mbox" format, move the emails from the old folders to the new ones, and delete the old folders.
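If you prefer to check a folder's format on disk rather than through KMail's properties dialog, the difference is visible in the file system: an mbox folder is a single file, while a maildir folder is a directory containing "cur", "new" and "tmp" subdirectories. The Python helper below (`folder_format` is a hypothetical name) is a rough heuristic sketch based on that difference.

```python
import os
import sys


def folder_format(path: str) -> str:
    """Guess a local mail folder's format from what is on disk (heuristic sketch)."""
    if os.path.isfile(path):
        return "mbox"  # mbox: one file holding every message in the folder
    if os.path.isdir(path) and all(
        os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp")
    ):
        return "maildir"  # maildir: one file per message under cur/, new/, tmp/
    return "unknown"


if __name__ == "__main__":
    for folder in sys.argv[1:]:
        print(folder, folder_format(folder))
```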

Copy (or move) as many spam emails as you can to the "MissedSpam" folder; these will be the basis from which qsf learns which emails you consider to be spam.

Next, set up the system that calls qsf for each new email collected from the server. This uses the KMail filter system, and two filters are required.

In KMail, click "Settings", "Configure Filters". To create a new filter, click the "new" button (the small icon just above the "help" button) and rename the filter to something like "QuickSpamFilter". In this filter, set the following options.

In the Filter Criteria, click "Match all of the following". Set the first filter rule to "<size>" "is less than" "250000". (This tells KMail to apply the filter only to messages smaller than about 250KB. The larger the email, the longer qsf takes to vet it; this limit is only a suggestion that you may want to adjust once the system is working, up to qsf's upper limit of 512KB.) Set the first "Filter Action" to "pipe through" and, in the text box, type "qsf". The program adds the line "X-Spam: YES" to the email header when it identifies the email as spam, or "X-Spam: NO" otherwise. (Note: some versions of qsf write "X-Spam-Flag: YES" instead; if yours does, use that text in the filters below.) Finally, uncheck the option "if this filter matches, stop processing". (If it is left checked, messages will not flow through to the next filter, which may make the whole process largely redundant.) Click the "apply" button when ready.

Add a second filter. This time, rename it to something like "SpamFilter". Set the Criteria to "match any of the following", "any header", "contains", "X-Spam: YES". Set the Actions to "move to folder", "FilteredSpam". Uncheck the "if this filter matches, stop processing" option. Again, click the "apply" button when ready.
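Because the header name differs between qsf versions, a filter meant to work with either can check both names. The Python sketch below mirrors a "match any of the following" filter with one rule per header; `qsf_says_spam` is a hypothetical helper, and the two header names are simply the ones mentioned above. In KMail the equivalent would be a second rule, "any header", "contains", "X-Spam-Flag: YES", in the same filter, since "X-Spam-Flag: YES" does not contain the text "X-Spam: YES".

```python
from email import message_from_string

# Header names mentioned in the text; which one qsf writes depends on its version.
QSF_HEADERS = ("X-Spam", "X-Spam-Flag")


def qsf_says_spam(raw: str) -> bool:
    """True if either qsf header carries the value YES ("match any" semantics)."""
    msg = message_from_string(raw)
    return any((msg.get(name) or "").strip().upper() == "YES" for name in QSF_HEADERS)


if __name__ == "__main__":
    print(qsf_says_spam("X-Spam: YES\nSubject: hi\n\nbody\n"))       # True
    print(qsf_says_spam("X-Spam-Flag: YES\nSubject: hi\n\nbody\n"))  # True
    print(qsf_says_spam("X-Spam: NO\nSubject: hi\n\nbody\n"))        # False
```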

KMail can now be used to filter out spam. Two further filters may also be helpful. One tells qsf about spam it missed, adding it to its spam database. The other tells qsf about a genuine email that it had incorrectly added to its "spam" database.

Add a third filter, renaming it to something like "Mark as Non-Spam". This filter corrects qsf when it has incorrectly identified an email as spam. Set the Criteria to "match any of the following", "any header", "contains", "X-Spam: YES". Set the Actions to "pipe through", with a value of "qsf -M -a". Then add a second action by clicking the "More" button at the end of the Action section. This adds a new line, which should contain something like "move to folder", "inbox" (if you want to move the incorrectly identified email back to your inbox). Under the Advanced options, uncheck "apply to incoming messages" and "apply to sent messages", and check the "on manual filtering" option. See the notes after the next paragraph for instructions on applying this filter.

Add a fourth filter, renaming it to something like "Mark as Spam". This filter teaches qsf about spam that it missed. Set the Criteria to "match any of the following", "any header", "contains", "X-Spam: NO". Set the Actions to "pipe through", with a value of "qsf -m -a". Then add a second action by clicking the "More" button at the end of the Action section. This adds a new line, which should contain something like "move to folder", "trash" (if you want to move the spam email to the trash). Under the Advanced options, uncheck "apply to incoming messages" and "apply to sent messages", and check the "on manual filtering" option.

Filters are applied by highlighting the message(s) you want filtered, then clicking the menu bar item "Message" and the "Apply filters" option (or pressing Control-J). Suppose, for example, that some genuine emails had been marked as spam. You might create a folder called "IncorrectlyMarked", run through the "FilteredSpam" folder moving the incorrectly marked emails into it, and at the end highlight every message in this folder and apply the filters. Similarly, you might move messages in your inbox that were not identified as spam to the "MissedSpam" folder, highlight them all and apply the filters again.
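Each of the two manual filters matches only one value of the "X-Spam" header, so a selected message triggers at most one of them. The Python sketch below models that choice; `manual_filter_plan` is a hypothetical name, and it only returns the qsf arguments and destination folder configured above rather than running qsf or moving anything.

```python
from typing import List, Optional, Tuple


def manual_filter_plan(x_spam_value: str) -> Tuple[Optional[List[str]], Optional[str]]:
    """Return (qsf arguments, destination folder) for a manually filtered message.

    Models the "Mark as Non-Spam" and "Mark as Spam" filters; nothing is executed.
    """
    verdict = x_spam_value.strip().upper()
    if verdict == "YES":  # qsf flagged it, but you have judged it genuine
        return ["qsf", "-M", "-a"], "inbox"
    if verdict == "NO":   # qsf passed it, but you have judged it spam
        return ["qsf", "-m", "-a"], "trash"
    return None, None     # no verdict header: neither filter matches


if __name__ == "__main__":
    for value in ("YES", "NO", ""):
        print(repr(value), manual_filter_plan(value))
```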

KMail is now ready to use qsf. Here are some notes about qsf itself.
  • The installation process is well documented at the qsf web site (www.ivarch.com). When qsf is installed, run its training process with the command "qsf -T /home/UserName/Mail/.spam.directory/MissedSpam /home/UserName/Mail/.spam.directory/NonSpam". This assumes your folders use the "mbox" format; if a folder is of type "maildir", convert it as described above. (Note how KMail organises its folders: if you have a folder in KMail called "spam" that contains child folders, they are physically stored below the "Mail" directory in a directory named ".spam.directory", where "spam" is replaced by the name of the parent folder. You can tell which format a folder uses by right-clicking it in KMail and selecting the "properties" option.) You will probably want to run this command frequently to continue qsf's training well into the future, so you may want to keep it handy.

  • This single command teaches qsf both the emails you do not want to receive and the ones you do. To be effective, the qsf authors recommend copying a minimum of 75 spam emails and 300 non-spam emails to the relevant folders.

  • To test the setup, just collect your email in the normal way. If you are still unsure whether things are working as expected, you can run the filters manually on a known spam message: highlight it in KMail and click "Message", "Apply filters". This runs the filters on the selected message(s). Then look at the headers (select the message and click "View", "All headers") for an "X-Spam: YES" (or "NO") header. If it is there, you know the software is running properly.

  • When running the software, one tip is to move any spam that was "passed" as non-spam to the "MissedSpam" folder, run the "qsf -T ..." command on this folder, and delete the messages when finished. Similarly, a quick scan of the "FilteredSpam" folder will show whether legitimate emails are being filtered out, in which case the qsf documentation provides guidance on how to "unteach" invalid rules.
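Before running "qsf -T", it can help to check that the training folders meet the recommended minimums. The Python sketch below uses the standard `mailbox` module to count the messages in the two mbox folders and builds the training command without running it; `qsf_training_command` and `count_messages` are hypothetical helpers, and the thresholds are the figures quoted above.

```python
import mailbox
from typing import List, Tuple

MIN_SPAM, MIN_NONSPAM = 75, 300  # minimums quoted in the text above


def count_messages(mbox_path: str) -> int:
    """Count the messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def qsf_training_command(spam_mbox: str, nonspam_mbox: str) -> Tuple[List[str], List[str]]:
    """Return the qsf -T argument list plus warnings about undersized folders."""
    warnings = []
    spam_count = count_messages(spam_mbox)
    nonspam_count = count_messages(nonspam_mbox)
    if spam_count < MIN_SPAM:
        warnings.append(f"only {spam_count} spam emails; at least {MIN_SPAM} recommended")
    if nonspam_count < MIN_NONSPAM:
        warnings.append(f"only {nonspam_count} non-spam emails; at least {MIN_NONSPAM} recommended")
    return ["qsf", "-T", spam_mbox, nonspam_mbox], warnings


if __name__ == "__main__":
    base = "/home/UserName/Mail/.spam.directory"
    command, problems = qsf_training_command(f"{base}/MissedSpam", f"{base}/NonSpam")
    for problem in problems:
        print("warning:", problem)
    print(" ".join(command))
```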