
sa-analyze - SpamAssassin qmail filtering/logging

Version: 0.1
Date: 2002-03-29
Author: John Fitzgibbon
fitz@jfitz.com
http://www.jfitz.com


Copyright
Copyright 2002 by John Fitzgibbon, (fitz@jfitz.com). All rights reserved.
This package is distributed under the Perl Artistic License.
Please see the "License" file in the distribution package for details.

Introduction
sa-analyze is a package that can be used to help implement SpamAssassin filtering for organisations that use qmail as their primary SMTP server. These scripts work at the mail server level, not for each mail client, so care needs to be exercised to ensure that you are not violating your users' privacy rights.

Note, (2003-05-23): I've put together some spam-filtering perl scripts that do "SpamAssassin-like" filtering, (but without online blacklist checking). I find that they are faster and more accurate than SpamAssassin, and they are easier to update. You can download them here. The filtering scripts are compatible with sa-analyze, so you can use sa-analyze to generate HTML reports from your maillog, regardless of whether you choose to use my scripts or SpamAssassin to check for spam.

The sa-analyze package consists of 4 scripts:
  • qmail-localfilter.sh and qmail-localfilter.pl handle mail filtering and logging rejected mail.
  • sa-analyze-descriptions and sa-analyze-maillog produce HTML reports by analyzing the SpamAssassin signatures and the maillog entries.
These scripts were tested on a FreeBSD 4.5 system. Other systems have not been tested.

You can download the package here: sa-analyze-0.1.tgz

Requirements
Note: sa-analyze-maillog requires a GNU, (Linux), or FreeBSD style implementation of the "date" command in order to be able to analyze logs from previous days.
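The reason for this requirement is that the two "date" implementations spell "N days ago" differently: FreeBSD uses "-v", GNU uses "-d". The following is a minimal sketch, not code from sa-analyze-maillog; "saa_date_offset" is a hypothetical helper name. It tries the FreeBSD form first, (GNU "date" rejects "-v" and writes nothing to standard output), then falls back to the GNU form, and prints the date as YYYYMMDD.

```sh
#!/bin/sh
# Hypothetical helper (not part of sa-analyze): print the date N days
# ago as YYYYMMDD with whichever "date" flavour is installed.
saa_date_offset() {
    days=${1:-0}
    # FreeBSD date: adjust the current date with -v. GNU date rejects
    # -v, writes only to stderr and returns non-zero.
    if date -v-"${days}"d +%Y%m%d 2>/dev/null; then
        return 0
    fi
    # GNU date (Linux): parse a relative date with -d.
    date -d "$days days ago" +%Y%m%d
}
```

For example, "saa_date_offset 1" prints yesterday's date in the same YYYYMMDD form used in the saa-maillog report file names.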

Installation
  • Install qmail with the QMAILQUEUE patch, (http://www.qmail.org/)
  • Install qmail-qfilter, (http://untroubled.org/qmail-qfilter/)
  • Install SpamAssassin, (http://spamassassin.taint.org/)
  • Copy qmail-localfilter.sh and qmail-localfilter.pl to the qmail "bin" directory, (normally /var/qmail/bin).
  • Copy sa-analyze-descriptions and sa-analyze-maillog to a directory in your $PATH, (/usr/local/bin might be a good choice).
  • Modify qmail's "rc" file, (normally /var/qmail/rc), so that the environment variable QMAILQUEUE specifies the full path to qmail-localfilter.sh.
  • Restart qmail.
For FreeBSD users:
Notes on the Install
For efficiency, qmail-localfilter.sh uses spamd/spamc instead of calling spamassassin directly. You will probably want to add a startup script to start spamd at boot time. The following 2 line script should do the trick:

#!/bin/sh
/usr/local/bin/spamd -F 0 -d

Before switching the qmail rc file to use qmail-localfilter.sh, you should make sure that spamd and spamc work. A command like:

spamc < test.eml

...should print out the file test.eml with an X-Spam-Status header that details how it fared in SpamAssassin's tests. If test.eml is spam, a full spam status report will be included.
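The X-Spam-Status header can be easy to miss in the full message that spamc prints. The sketch below, assuming standard header folding (continuation lines start with a space or tab), prints only that header and its continuation lines; "show_spam_status" is a hypothetical name, not part of sa-analyze or SpamAssassin.

```sh
#!/bin/sh
# Hypothetical helper: print the X-Spam-Status header, including folded
# continuation lines, from a message on stdin. The body is ignored.
show_spam_status() {
    awk '
        /^$/ { exit }                          # blank line ends the headers
        /^[ \t]/ { if (inhdr) print; next }    # continuation of previous header
        { inhdr = (tolower($0) ~ /^x-spam-status:/); if (inhdr) print }
    '
}
```

Usage: "spamc < test.eml | show_spam_status". If it prints nothing, the message most likely did not pass through spamd, so check that spamd is running.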

The qmail rc file should be modified to begin something like this:

#!/bin/sh

QMAILQUEUE=/var/qmail/bin/qmail-localfilter.sh
export QMAILQUEUE

etc...

There are other variables that qmail-localfilter.sh recognizes. You may need, (or like), to set some of these in the qmail rc script. The variables are:

QMAIL_BIN - Defines the path to qmail executables, (specifically qmail-qfilter). Default is /var/qmail/bin
QMAIL_LOCALFILTER_BIN - Defines the path to the qmail-localfilter scripts. Defaults to the same value as QMAIL_BIN
QMAIL_LOCALFILTER_FLAGS - Defines command line flags passed to qmail-localfilter.pl. There is no default
QMAIL_LOCALDOMAIN - Defines the local domain served by qmail. There is no default
SPAMC_FLAGS - Defines the command line flags passed to spamc. Default is -f -s 2048000
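The defaults above can be expressed with POSIX parameter expansion. This is a sketch of how the values resolve, not the contents of qmail-localfilter.sh. Because qmail-localfilter.sh runs as a separate process, any of these variables set in the qmail rc file must also be exported, just as QMAILQUEUE is.

```sh
#!/bin/sh
# Sketch only: apply the documented defaults when a variable is unset
# or empty. qmail-localfilter.sh may implement this differently.
QMAIL_BIN=${QMAIL_BIN:-/var/qmail/bin}
QMAIL_LOCALFILTER_BIN=${QMAIL_LOCALFILTER_BIN:-$QMAIL_BIN}
SPAMC_FLAGS=${SPAMC_FLAGS:--f -s 2048000}
# QMAIL_LOCALFILTER_FLAGS and QMAIL_LOCALDOMAIN have no defaults.
echo "qmail-qfilter: $QMAIL_BIN/qmail-qfilter"
echo "filter script: $QMAIL_LOCALFILTER_BIN/qmail-localfilter.pl"
echo "spamc flags:   $SPAMC_FLAGS"
```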

Notes on QMAIL_LOCALDOMAIN and QMAIL_LOCALFILTER_FLAGS:
Messages with a "From:" line containing addresses of the format "name@QMAIL_LOCALDOMAIN" will have the X-Spam-Status line removed, (if the mail is not spam). This prevents external sources from determining potential spam triggers by analyzing received mail. If QMAIL_LOCALDOMAIN is omitted, the X-Spam-Status line is ALWAYS removed from non-spam. This is the safest option, but it does mean your local users can't use the X-Spam-Status header to do further filtering.

QMAIL_LOCALDOMAIN is overridden by QMAIL_LOCALFILTER_FLAGS. If you are using QMAIL_LOCALFILTER_FLAGS and want to include a local domain, include "-l [local domain]" in the flag settings. The only other useful flag currently available is "-s facility". This specifies the syslog facility that qmail-localfilter.pl will write to. The default is "mail".
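The override rule can be sketched with nested parameter expansion. This is an assumption about how the two variables interact, based on the description above; "localfilter_flags" is a hypothetical name, not a function in qmail-localfilter.sh.

```sh
#!/bin/sh
# Hypothetical sketch: build the flags passed to qmail-localfilter.pl.
# A non-empty QMAIL_LOCALFILTER_FLAGS wins; otherwise QMAIL_LOCALDOMAIN
# becomes "-l domain"; otherwise no flags are passed.
localfilter_flags() {
    echo "${QMAIL_LOCALFILTER_FLAGS:-${QMAIL_LOCALDOMAIN:+-l $QMAIL_LOCALDOMAIN}}"
}
```

For example, setting QMAIL_LOCALFILTER_FLAGS="-l example.com -s local0" in the rc file, (and exporting it), keeps the local-domain behaviour while logging to the local0 syslog facility; example.com stands in for your own domain.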

If you are having trouble with the configuration, qmail-localfilter.sh contains a commented-out line that directly calls qmail-queue. You can try uncommenting this line, (and commenting out the qmail-qfilter call), to ensure that unfiltered mail is working, (i.e. it will verify that qmail with the QMAILQUEUE patch is working okay).

Producing HTML logs
sa-analyze-descriptions produces an HTML file containing full descriptions of the short SpamAssassin codes.
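SpamAssassin's rule files, (the .cf files in SAA_CF_DIR), attach a description to a rule with a "describe" line and a score with a "score" line. The following sketch is a guess at the kind of extraction sa-analyze-descriptions performs, not its actual code: "saa_rules" is a hypothetical name, it keeps only the first value of a "score" line, (some SpamAssassin versions list up to four), and it prints tab-separated text rather than HTML.

```sh
#!/bin/sh
# Hypothetical sketch (not the sa-analyze-descriptions source): list
# each rule's name, first score value and description, tab-separated.
# Files are read in glob order, so a later "score" line for the same
# rule overrides an earlier one.
saa_rules() {
    dir=${1:-${SAA_CF_DIR:-/usr/local/share/spamassassin}}
    cat "$dir"/*.cf 2>/dev/null | awk '
        $1 == "describe" {
            name = $2; seen[name] = 1
            $1 = ""; $2 = ""; sub(/^ +/, "")   # keep only the text
            desc[name] = $0
            next
        }
        $1 == "score" { score[$2] = $3; seen[$2] = 1 }
        END {
            for (r in seen)
                printf "%s\t%s\t%s\n", r, ((r in score) ? score[r] : "-"), desc[r]
        }
    ' | sort
}
```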

sa-analyze-maillog looks at qmail's logs and produces HTML files containing lists of mail rejected by qmail-localfilter.pl. The SpamAssassin codes in the reject file are hyperlinked to the descriptions in the file produced by sa-analyze-descriptions, which allows you to easily cross-check scores and descriptions.

You can run either program with no command line parameters to produce HTML files in the current directory.

You can include a file or directory name as a command line parameter to write the HTML files to the specified location.

Typically, you might want to include a directory name that is in your webserver's path as the parameter.
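The three cases described above, (no argument, a directory, or a file name), can be summarised as follows. This is a hypothetical sketch for illustration; "saa_output_path" is an invented name, and the scripts' own handling may differ in details such as trailing slashes. The default names are the ones documented for each script: saa-descriptions.html and saa-maillog.YYYYMMDD.html.

```sh
#!/bin/sh
# Hypothetical sketch of the output-file rule:
#   no argument   -> default name in the current directory
#   a directory   -> default name inside that directory
#   anything else -> used as the output file name
saa_output_path() {
    arg=$1 default=$2
    if [ -z "$arg" ]; then
        echo "$default"
    elif [ -d "$arg" ]; then
        echo "${arg%/}/$default"
    else
        echo "$arg"
    fi
}
```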

sa-analyze-maillog accepts an optional second parameter that specifies the number of days before today that you wish to analyze. (For example, "1" means yesterday, "2" means the day before yesterday, etc.) This parameter only works if you rotate and gzip mail logs daily, and only for as many days as you keep logs, (typically any value up to 10 should work).
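If you install sa-analyze part way through a log rotation cycle, you might want reports for each day still on disk. A minimal sketch, assuming the install path from the Installation section; "saa_backfill" and MAILLOG_CMD are hypothetical names, and MAILLOG_CMD exists only so the command can be swapped out, (for instance with "echo" for a dry run).

```sh
#!/bin/sh
# Hypothetical helper: regenerate reports for the previous N days.
# Offsets start at 1 (yesterday); 0 is never passed, because today's
# log is the default when no offset is given.
MAILLOG_CMD=${MAILLOG_CMD:-/usr/local/bin/sa-analyze-maillog}
saa_backfill() {
    outdir=$1 days=$2
    i=1
    while [ "$i" -le "$days" ]; do
        "$MAILLOG_CMD" "$outdir" "$i" || echo "offset $i failed" >&2
        i=$((i + 1))
    done
}
```

For example, "saa_backfill /path/to/webserver/directory 7" would regenerate the reports for the previous seven days; keep the count within the number of days of logs you retain.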

You could update the html files automatically with crontab entries like these:

0 0 * * * /usr/local/bin/sa-analyze-descriptions [/path/to/webserver/directory]
1 0 * * * /usr/local/bin/sa-analyze-maillog [/path/to/webserver/directory] 1
0,15,30,45 * * * * /usr/local/bin/sa-analyze-maillog [/path/to/webserver/directory]
  • The first entry updates the descriptions file at midnight.
  • The second entry updates the spam file from the previous day's maillog at 1 minute past midnight.
  • The third entry updates the latest spam file every 15 minutes.
In addition to the command line parameters, the sa-analyze scripts can be customized by setting environment variables. Full descriptions follow:

sa-analyze-descriptions

Usage: sa-analyze-descriptions [file]

"file" specifies the output html file name.

If the output file is a directory, the output is written to the file
"saa-descriptions.html" in the specified directory.

The output file and other options can also be set as environment variables:

SAA_FILE - Output file. Default is saa-descriptions.html
SAA_CF_DIR - SpamAssassin .cf file directory. Default is /usr/local/share/spamassassin
SAA_HI_SCORE - Minimum score that triggers highlighting. Default is 3.000
SAA_POS_COL - HTML color code for positive highlighting. Default is #FF0000
SAA_POS_DESC - Name of the positive highlighting color. Default is red
SAA_NEG_COL - HTML color code for negative highlighting. Default is #00FF00
SAA_NEG_DESC - Name of the negative highlighting color. Default is green
SAA_TITLE - Report title. Default is SpamAssassin Signatures: Scores and Descriptions
SAA_DATE_FMT - Date formatting string for the "date" command. Default is %Y/%m/%d
SAA_TIME_FMT - Time formatting string for the "date" command. Default is %H:%M:%S
SAA_START - Comment to appear before the table data. There is no default
SAA_END - Comment to appear after the table data. There is no default
SAA_FONT_TAG - HTML <font> tag. Default is font face=arial size=-1
SAA_BODY_TAG - HTML <body> tag. Default is body bgcolor=#000000 text=#FFFFFF link=#00FF00 vlink=#00FF00 alink=#00FF00
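The highlighting variables might interact roughly as follows. This sketch rests on an assumption the list above does not spell out: that SAA_HI_SCORE applies symmetrically, with scores at or above it shown in SAA_POS_COL and scores at or below its negative shown in SAA_NEG_COL. "saa_highlight" is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical sketch: choose a highlight color for a rule score.
# ASSUMPTION: scores >= SAA_HI_SCORE get the positive color and scores
# <= -SAA_HI_SCORE get the negative color; other scores print nothing.
saa_highlight() {
    awk -v s="$1" \
        -v hi="${SAA_HI_SCORE:-3.000}" \
        -v pos="${SAA_POS_COL:-#FF0000}" \
        -v neg="${SAA_NEG_COL:-#00FF00}" \
        'BEGIN {
            if (s + 0 >= hi + 0) print pos
            else if (s + 0 <= -(hi + 0)) print neg
        }'
}
```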

sa-analyze-maillog

Usage: sa-analyze-maillog [file] [date offset]

"file" specifies the output html file name.

If the output file is a directory, the output is written to the file
"saa-maillog.YYYYMMDD.html", (e.g. "saa-maillog.20020328.html"), in the specified directory.

"date offset" specifies an offset in days, (e.g. 1 for yesterday).
Note: Do not specify 0 for today. Today's log is the default.

The output file and other options can also be set as environment variables:

SAA_FILE - Output file. Default is saa-maillog.YYYYMMDD.html
SAA_DATE_OFF - Date offset, (in days). There is no default
SAA_DESC_FILE - File containing SpamAssassin signature descriptions. Default is saa-descriptions.html
SAA_LOG_DIR - Qmail log directory. Default is /var/log
SAA_LOG_FILE - Qmail log file name, (no path). Default is maillog
SAA_TITLE - Report title. Default is Rejected Spam for YYYY/MM/DD, (the report date, e.g. Rejected Spam for 2002/03/28)
SAA_DATE_FMT - Date formatting string for the "date" command. Default is %Y/%m/%d
SAA_TIME_FMT - Time formatting string for the "date" command. Default is %H:%M:%S
SAA_START - Comment to appear before the data section. There is no default
SAA_END - Comment to appear after the data section. There is no default
SAA_HI_COL - HTML highlight color. Default is #00FF00
SAA_FONT_TAG - HTML <font> tag. Default is font face=arial size=-1
SAA_BODY_TAG - HTML <body> tag. Default is body bgcolor=#000000 text=#FFFFFF link=#00FF00 vlink=#00FF00 alink=#00FF00