
SpamAssassin: is Bayesian learning working here?


I am trying to train a recently installed copy of SpamAssassin, and I have the impression that Bayesian learning isn't working.

First of all: yes, spamd is running with the --allow-tell option.

Now, I have a piece of spam. I first run it through SpamAssassin and get this score:

[paulo@myserver ~]$ spamc -R < spam6.txt 
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Nombre - herbertrl1 E-mail: - [email protected]
   Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
   porn star carl paula blum porn double d hamster porn video oiled porn clitoris
   massage free young nubile porn [...] 

Content analysis details:   (2.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
              [Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sexjanet.com]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record

So I feed it to spamc using the -L option:

[paulo@myserver ~]$ spamc -L spam < spam6.txt
Message successfully un/learned

And then I try to analyze it with spamc again... and I get the exact same score:

[paulo@myserver ~]$ spamc -R < spam6.txt 
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Nombre - herbertrl1 E-mail: - [email protected]
   Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
   porn star carl paula blum porn double d hamster porn video oiled porn clitoris
   massage free young nubile porn [...] 

Content analysis details:   (2.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
              [Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sexjanet.com]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record

Am I missing something?


Solution

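  • SpamAssassin: How much learning is needed for Bayes?

    The default SpamAssassin configuration requires at least 200 spam and 200 ham messages to be learned before Bayes scoring is applied. Until both thresholds are reached, no BAYES_* rule appears in the report, so learning a single message does not change the score. Run sa-learn --dump magic to check how many messages have been learned so far.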

    man Mail::SpamAssassin::Conf (SpamAssassin version 3.1)

    bayes_min_ham_num (Default: 200)
    bayes_min_spam_num (Default: 200)
    To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.

    $ sa-learn --dump magic
    […]
    0.000          0       2508          0  non-token data: nspam
    0.000          0        508          0  non-token data: nham
    […]
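
If you want to automate that check, here is a minimal sketch that reads the output of sa-learn --dump magic and compares the nspam and nham counts against the thresholds. The function name bayes_ready is made up for this example, and it assumes the column layout shown in the dump above (count in the third field, label in the last). The thresholds default to 200, matching the documented defaults; pass other values if your configuration overrides bayes_min_spam_num or bayes_min_ham_num.

```shell
#!/bin/sh
# bayes_ready: hypothetical helper, not part of SpamAssassin.
# Reads `sa-learn --dump magic` output on stdin.
# $1 = minimum spam count (default 200), $2 = minimum ham count (default 200).
# Returns 0 if both counts reach their threshold, 1 otherwise.
bayes_ready() {
    min_spam="${1:-200}"
    min_ham="${2:-200}"
    awk -v min_spam="$min_spam" -v min_ham="$min_ham" '
        # The label is the last field, the message count is the third field.
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            if (spam + 0 >= min_spam + 0 && ham + 0 >= min_ham + 0) {
                print "Bayes thresholds reached"
                exit 0
            }
            print "Bayes not active yet"
            exit 1
        }
    '
}

# Usage (run as the user whose Bayes database you want to inspect):
#   sa-learn --dump magic | bayes_ready
#   sa-learn --dump magic | bayes_ready 100 100   # custom thresholds
```

With the counts from the dump above (2508 spam, 508 ham) this prints nspam=2508 nham=508 followed by "Bayes thresholds reached". On a freshly installed system with only one learned message it would report "Bayes not active yet", which would explain the unchanged score in the question.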