Tuesday, September 9th, 2008

Spam Blocking Statistics

Categories: [ Blog ]

Out of the last 6370 spams, 5798 (91.0%) were blocked based on the IP address of the sender (IPBLACKLIST).
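
A blacklist check of this kind comes down to testing the sender's address against a set of known-bad addresses or networks. Below is a minimal Python sketch using the standard `ipaddress` module. The `BLACKLIST` contents and the `is_blacklisted` name are hypothetical, because the post does not say where its list comes from.

```python
import ipaddress

# Hypothetical entries (documentation ranges); the real IPBLACKLIST source is not stated.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blacklisted(sender_ip: str) -> bool:
    """Return True if the sender's address falls inside any blacklisted network."""
    addr = ipaddress.ip_address(sender_ip)
    # An IPv6 address is simply "not in" an IPv4 network, so mixed versions are safe.
    return any(addr in net for net in BLACKLIST)


print(is_blacklisted("192.0.2.45"))   # True
print(is_blacklisted("203.0.113.9"))  # False
```

In practice such lists are often fetched from DNS-based blocklists (DNSBLs) rather than hard-coded; this sketch does not try to reproduce whichever source IPBLACKLIST actually uses.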

Of the remaining 572 (the percentages below are relative to these 572):
- 394 (68.9%) were caught by a simple trap (COIN, a field that should be left empty);
- 83 (14.5%) contained the same URL more than twice (SAMEURL);
- 49 (8.57%) had too many URLs per word (TOOMANYURL);
- 15 (2.62%) were blocked by keyword (KEYWORD);
- 7 (1.22%) had identical values for title, blog name and excerpt (SAMETITLE);
- 5 (0.874%) had more than 4 URLs pointing to the same server (SAMESERVER);
- 3 (0.524%) were flagged as containing random data (RANDOM; none of them actually did, but they were spam nonetheless);
- 2 (0.350%) contained only hex data (HEXDATA).
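
Most of these content filters are simple enough to sketch. The Python below is a hypothetical reconstruction, not the code this blog runs: the field names (`coin`, `title`, `blog_name`, `excerpt`, `text`), the keyword list, the URL regex and the order in which the rules are checked are all assumptions. The thresholds for SAMEURL (more than twice) and SAMESERVER (more than 4 URLs) come from the post. Treating title, blog name and excerpt as trackback-style fields is also an assumption. RANDOM and TOOMANYURL are omitted here because the post does not describe how they decide.

```python
import re
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
HEX_RE = re.compile(r"[0-9a-fA-F\s]+")

# Hypothetical keyword list; the real KEYWORD list is not published.
KEYWORDS = ("viagra", "casino")


def classify(fields: dict) -> Optional[str]:
    """Return the name of the first content rule that matches, or None.

    `fields` is assumed to map field names ("coin", "title", "blog_name",
    "excerpt", "text") to the submitted strings.
    """
    # COIN: a trap field that humans leave empty but automated posters often fill in.
    if fields.get("coin"):
        return "COIN"

    text = fields.get("text", "")
    urls = URL_RE.findall(text)

    # SAMEURL: the same URL appears more than twice.
    if any(n > 2 for n in Counter(urls).values()):
        return "SAMEURL"

    # SAMESERVER: more than 4 URLs point to the same host.
    if any(n > 4 for n in Counter(urlparse(u).hostname for u in urls).values()):
        return "SAMESERVER"

    # KEYWORD: case-insensitive substring match against the list above.
    lowered = text.lower()
    if any(word in lowered for word in KEYWORDS):
        return "KEYWORD"

    # SAMETITLE: title, blog name and excerpt are all identical and non-empty.
    title = fields.get("title")
    if title and title == fields.get("blog_name") == fields.get("excerpt"):
        return "SAMETITLE"

    # HEXDATA: the body is nothing but hex digits and whitespace.
    if text.strip() and HEX_RE.fullmatch(text):
        return "HEXDATA"

    return None
```

One caveat of this sketch: its HEXDATA check would also flag a purely numeric comment such as "1234", so a real filter would probably want a minimum length as well.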

Overall, 14 spams (the 572 minus the 558 caught by these filters) had to be moderated by hand, which makes a false-negative rate of 0.22% (14 out of 6370).
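
The figures can be cross-checked with a few lines of Python. `summarize` is a hypothetical helper; the counts are the ones quoted above. The per-filter percentages are relative to the 572 spams that got past the IP blacklist, while the false-negative rate is relative to all 6370.

```python
def summarize(total, blacklisted, filter_counts):
    """Recompute the post's percentages from the raw counts."""
    remaining = total - blacklisted          # spams not caught by IPBLACKLIST
    caught = sum(filter_counts.values())     # caught by the content filters
    hand_moderated = remaining - caught      # false negatives
    # Three significant figures, keeping trailing zeros ("0.350"), as in the post.
    shares = {name: f"{100 * n / remaining:#.3g}%" for name, n in filter_counts.items()}
    return {
        "blacklist_share": f"{blacklisted / total:.1%}",
        "remaining": remaining,
        "shares": shares,
        "hand_moderated": hand_moderated,
        "false_negative_rate": f"{hand_moderated / total:.2%}",
    }


counts = {"COIN": 394, "SAMEURL": 83, "TOOMANYURL": 49, "KEYWORD": 15,
          "SAMETITLE": 7, "SAMESERVER": 5, "RANDOM": 3, "HEXDATA": 2}
stats = summarize(6370, 5798, counts)
print(stats["hand_moderated"], stats["false_negative_rate"])  # 14 0.22%
```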

All of the false positives I've had were caused by the TOOMANYURL filter, but it also catches a lot of spam: in most of the spams it caught, the URLs were not real but made of random letters.
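
TOOMANYURL compares the number of URLs in a message with the number of words. The sketch below is hypothetical: the post gives no threshold, so `MAX_URLS_PER_WORD = 0.25` is an arbitrary assumption, as is counting words only after the URLs themselves have been stripped out.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical threshold; the real TOOMANYURL ratio is not given in the post.
MAX_URLS_PER_WORD = 0.25


def too_many_urls(text: str, max_ratio: float = MAX_URLS_PER_WORD) -> bool:
    """Flag text whose URL count is high relative to its ordinary words."""
    urls = URL_RE.findall(text)
    if not urls:
        return False
    # Count the words that remain once the URLs are removed (at least 1 to avoid dividing by zero).
    words = URL_RE.sub(" ", text).split()
    return len(urls) / max(len(words), 1) > max_ratio


print(too_many_urls("see http://a.example http://b.example http://c.example"))  # True
```

This also shows where the false positives come from: under this sketch, a short legitimate comment made of one word and a link scores 1.0 and is flagged just like link spam.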

[ Posted on September 9th, 2008 at 13:02 ]