16 Feb 2007, 12:20 PM   #1
robmueller
Intergalactic Postmaster
 
Join Date: Oct 2001
Location: Melbourne, Australia
Posts: 6,102

Representative of:
Fastmail.FM
Recent spam checking summary

A while back we installed FuzzyOcr, a SpamAssassin plugin that uses OCR programs to try to detect the image spam that was surging at the time. That worked fine for a while, but spammers switched to obfuscated images that OCR systems couldn't easily read. However, it didn't matter that much, because:

a) Most of the new spam machines started being listed on RBLs, so their messages would pick up spam score regardless of the image analysis
b) The regular sa-update added some SA rules that give a score to the general form of a message with an attached GIF

It seems these two combined would, in most cases, push the spam over the 5-point threshold that's the default for "Normal" level protection.

Now the problem is that the RBL checks don't work nearly as well for people forwarding from another service to FM. In quite a few cases SpamAssassin only looks at the network "edge", meaning the machine that handed the email to our system, because you can't trust headers beyond that. With a forwarding service, that means the forwarding service itself gets checked against the RBLs, which isn't very helpful. A while back I made a change to SA that tries to help with that by defining some "trusted" forwarding servers. If we find those in the headers, we scan back through them to the IP of the machine that delivered the email into that system. The current list of trusted systems is:

trusted_host nic.name
trusted_host infidels.org
trusted_host zoneedit.com
trusted_host pobox.com srs
trusted_host google.com srs
trusted_host livemail.co.uk
trusted_host hotmail.com
trusted_host yahoo.com
trusted_host outblaze.com
trusted_host mailsnare.com
trusted_host runbox.com
trusted_host gmx.net
trusted_host mxes.net srs
trusted_host iki.fi

Note that being a "trusted" system doesn't mean we don't spam check it. It just means that we parse back through the Received headers to find which server delivered the email to that service, rather than using that service's IP. This improves RBL checks enormously, because there's no reason for any of the above services to be on an RBL.
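To make the "parse back" step concrete, here is a rough Python sketch of the idea. It is not the actual SA change: it assumes the connecting IPs have already been pulled out of the Received headers (newest hop first), and both the function name effective_sender_ip and the is_trusted callback are made up for illustration.

```python
def effective_sender_ip(hop_ips, is_trusted):
    """Pick the IP that should be checked against RBLs.

    hop_ips    -- connecting IPs taken from the Received headers,
                  newest hop first (the machine that handed the mail to us).
    is_trusted -- callable returning True if an IP belongs to a trusted
                  forwarding service (hypothetical; supplied by the caller).
    """
    for ip in hop_ips:
        # Skip over trusted forwarders and keep walking back in time.
        if not is_trusted(ip):
            return ip
    # Assumption: if every hop is trusted (or there are no hops),
    # there is nothing useful to look up.
    return None


# Example with documentation-range addresses.
trusted_ips = {"192.0.2.10"}  # pretend this is a forwarder's outbound server
hops = ["192.0.2.10", "198.51.100.7", "10.0.0.1"]
print(effective_sender_ip(hops, trusted_ips.__contains__))
```

Without the trusted entry, the first hop itself would be returned, which is the plain "edge only" behaviour described above.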

To avoid forgery issues, we take the IP address the Received header says the email came from, do a reverse DNS lookup on it followed by a forward lookup on the resulting hostname to check that the two match, and then check that the hostname falls within one of the domain names above.
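For anyone curious what that check looks like in practice, here is a minimal Python sketch of a forward-confirmed reverse DNS test. It is only an illustration under some assumptions, not the code we run: the function names are invented, TRUSTED_DOMAINS is just a few entries from the list above, and the default resolvers only handle IPv4.

```python
import socket

# A few entries from the trusted list above, for illustration only.
TRUSTED_DOMAINS = ["pobox.com", "google.com", "yahoo.com"]


def in_trusted_domain(hostname, domains=TRUSTED_DOMAINS):
    """True if hostname is one of the domains or a subdomain of one."""
    host = hostname.rstrip(".").lower()
    # Require a dot boundary so "pobox.com.evil.example" does not match.
    return any(host == d or host.endswith("." + d) for d in domains)


def system_reverse(ip):
    """Reverse (PTR) lookup; returns a hostname or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_forward(hostname):
    """Forward (A) lookup; returns a list of IPv4 addresses."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_trusted_relay(ip, reverse=system_reverse, forward=system_forward,
                     domains=TRUSTED_DOMAINS):
    """Reverse lookup, forward lookup, match, then domain check."""
    hostname = reverse(ip)
    if not hostname:
        return False  # no PTR record, so nothing to confirm
    if ip not in forward(hostname):
        return False  # the PTR name does not resolve back to this IP
    return in_trusted_domain(hostname, domains)
```

The resolvers are passed in as parameters so the logic can be exercised with canned data instead of live DNS. The forward lookup is what matters for forgery: anyone who controls reverse DNS for their own IP range can make it claim to be a pobox.com host, but they can't make pobox.com's own DNS point back at their IP.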

This has helped with those forwarders, but of course it doesn't cover every forwarding system people have set up. So I had a quick look at whether we could improve the image spam scanning. After an hour of fiddling, I found a set of transforms that does amazingly well on almost all the current obfuscated image spams out there. Check out this dir:

http://forum.robm.fastmail.fm/spam/

You can see some originals and some "fixed" versions. Feeding the fixed versions into OCR software usually gives some meaningful result.

I rolled this out to our spam scanning machines for all incoming email yesterday (note I haven't rolled it out to the machines that handle Pop Links yet), and I notice that overnight I got no image spams at all in my Junk Mail folder. Hopefully a trend that continues.

Anyway, this is all just FYI for people...

Rob