Nov 17, 2003, 09:22 AM
Link: SpamSieve 2.0.2, an effective spam filter for OS X email clients. (http://www.geekpatrol.ca/archives/2003/11/17/spamsieve202.php)
Posted on MacBytes.com (http://www.macbytes.com)
Approved by Mudbug
Nov 17, 2003, 10:07 AM
This Bayesian business is good, but it needs a major overhaul.
For example, I've received 192 spams in the last 4 days, and Mail caught about, oh, 60% of them. I've been using Mail (not in training mode, but Junking any missed spams) for about 2 or 3 months now.
What appears to be happening is that the Bayesian algorithm (at least as used by Apple) works fine if the spelling is consistent, but it doesn't account for spammers altering the spelling to fool it.
Observe the following actual spellings:
via gr a
So what's happening here? Clearly these are all "new" words to the filter and receive no negative or positive weight, when clearly they should.
I believe that if a spam passes the built-in Bayes filter, a second step (or a modification to Bayes itself) could filter out non-letter characters, de-hax0r the text (turning that '1' back into an 'i'), and check again. That should give better matching.
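Something like the following Python would be a rough sketch of that de-hax0r step. The substitution table and the function names are my own guesses for illustration, not anything Mail or SpamSieve actually does, and a real table would need tuning against actual spam.

```python
import re

# Hypothetical look-alike table (an assumption, not taken from any real filter).
LEET_MAP = str.maketrans({
    "1": "i", "!": "i", "0": "o", "3": "e",
    "4": "a", "@": "a", "5": "s", "$": "s", "7": "t",
})

def dehax0r(word):
    """Map look-alike characters to letters, then drop anything that is not a letter."""
    mapped = word.lower().translate(LEET_MAP)
    return re.sub(r"[^a-z]", "", mapped)

def normalized_tokens(text):
    """Split on whitespace and de-obfuscate each token, skipping empty results."""
    cleaned = (dehax0r(w) for w in text.split())
    return [t for t in cleaned if t]
```

The normalized tokens could then be handed back to whatever Bayesian scorer is in use. Note that this works word by word, so it does nothing about spaces inserted inside a word.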
A short-term solution could be an AppleScript, run on every non-Familiar piece of e-mail, that simply removes all non-letters (including spaces) and searches for a list of words which are known to be from spam. No one I know would tell me about Viagra or Penis.
I've been considering writing such an AppleScript, but I haven't done AppleScript since the OS 7 days.
I already have an AppleScript running against my unfamiliars (SpamHolio v0.7) which checks against RBLs, but all it really seems to do is jam up Mail while it's running. It doesn't appear to have decreased my spam too much.
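The strip-and-search idea is easy to sketch outside of AppleScript. Here is a minimal Python version, assuming a hypothetical two-word list (SPAM_WORDS) and made-up function names; hooking it into Mail is not shown.

```python
import re

# Hypothetical word list; in practice this would be much longer.
SPAM_WORDS = ("viagra", "penis")

def squash(text):
    """Lowercase the text and remove every non-letter, including spaces."""
    return re.sub(r"[^a-z]", "", text.lower())

def spam_words_found(text, words=SPAM_WORDS):
    """Return the listed words that appear anywhere in the squashed text."""
    squashed = squash(text)
    return [w for w in words if w in squashed]
```

For example, spam_words_found("Buy via gr a today") finds "viagra". One catch with removing spaces: words can appear across word boundaries, so a harmless phrase like "the open issue" squashes to "theopenissue" and matches "penis". A check like this is probably better used as one signal among several than as the final verdict.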
So anyways, I get way more spam than legit e-mail and I can't switch e-mail addresses. I've had this one for too long.