said: 
>to write several hundred of their own filters and then have to worry 
>about maintaining them as the spammers get smarter.  (On one of my email 
>accounts Junk Spy typically catches 100% of the spam; on my others which 
>I combine Junk Spy catches over 90%.) 
Currently, it appears I need 7 filters for spam.  I could have fewer, but I 
have to work around a couple of ICE filter expression defects.  Those 
spammers are incredibly consistent.  I clear the Review for Delete folder 
about twice a day.  I just cleared it to see what the count was and there 
were approx. 25 spams.  Any spam that gets left in the Inbox gets moved to 
the Unfiltered Spam folder.  There are 17 messages in there since the 1st 
of March.  I'll let you do the math.  You are a much better mathematician 
than I. 
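For anyone who would rather not do the math by hand, here is a rough sketch of it.  It assumes the 25 spams I just counted are typical for one clearing, that two clearings a day means about 50 caught per day, and that roughly 22 days have passed since the 1st of March.  The day count and the function name are assumptions for illustration, not logged figures.

```python
def catch_rate(caught_per_day, missed_total, days):
    """Fraction of all spam that the filters caught over the period."""
    caught_total = caught_per_day * days
    return caught_total / (caught_total + missed_total)


# Assumed inputs: ~25 spams per clearing, 2 clearings a day, ~22 days,
# and the 17 messages sitting in the Unfiltered Spam folder.
rate = catch_rate(caught_per_day=2 * 25, missed_total=17, days=22)
print(f"Estimated catch rate: {rate:.1%}")
```

With those assumed inputs the estimate lands a little above 98%.  Change the day count and the answer moves only slightly, because 17 misses is small next to the caught total.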
>they create by reviewing spam messages.  There's a Bayesian movement to 
>have a massive group of volunteers send in their results, but this 
>becomes an incredible mess when the spammers of the world surreptitiously 
>join that same movement and start supplying their own results to the 
It's also, IMNSHO, the worst way to use a Bayesian filter.  To be 
effective, a Bayesian filter must be trained for the spam you receive and 
the corpus should be as small as possible. 
I use Polarbar for the Mr. KIA list because the spam level is extremely 
high these days.  The corpus is 22 bad messages and 4 good messages.  I'm 
actually a bit short on good messages and might decide to train the filter 
with a few more known good messages. What I'm doing now is a bit of an 
experiment.  I wanted to see how long it would take to achieve high 
accuracy starting from an empty corpus and just marking misidentified 
messages.  The marked messages were by definition not identified 
correctly. 
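To make the experiment concrete, here is a minimal sketch of the "start empty, train only on mistakes" idea.  This is not Polarbar's code; the class, the tokenizer and the smoothing are all assumptions chosen to keep the example short, and it ignores the spam/good prior.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy Bayesian filter that learns only from messages it got wrong."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message (hypothetical tokenizer).
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def classify(self, text):
        # Sum of log-likelihood ratios with add-one smoothing.
        # An empty corpus scores 0, so everything starts out as "good".
        score = 0.0
        for word in self.tokens(text):
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_good = (self.counts["good"][word] + 1) / (self.messages["good"] + 2)
            score += math.log(p_spam / p_good)
        return "spam" if score > 0 else "good"

    def correct(self, text, true_label):
        """Add the message to the corpus only if it was misidentified."""
        if self.classify(text) != true_label:
            self.train(text, true_label)
            return True
        return False


f = TinyBayes()
f.correct("cheap pills buy now", "spam")           # missed, so it is learned
f.correct("meeting agenda for wednesday", "good")  # already right, not stored
print(f.classify("buy cheap pills"), f.messages)
```

The point of the design is that the corpus only grows when the filter makes a mistake, which is one way to keep it as small as possible.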
Currently, the list receives about 10 spams per day and maybe 1 good 
message a week. The list does need to be promoted. :-) I estimate the 
current accuracy is in the 80% range, but that will get better as good 
messages show up. 
Several folks on the Polarbar list tried to train their corpus with spam 
archives with, as expected, not-so-good results. I suspect they all started 
over and are now achieving better results. 
One thing Polarbar needs, and will probably get eventually, is a tool to 
thin the corpus of words that are not contributing to the final result. 
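A thinning tool might look something like the sketch below.  How Polarbar actually stores its corpus is not something I know, so the per-word count dictionaries, the function name and the margin are assumptions.  The idea is simply to drop words whose estimated spam probability sits close to 0.5, since they barely move the result either way.

```python
def thin_corpus(spam_counts, good_counts, n_spam, n_good, margin=0.1):
    """Return copies of both word tables without near-neutral words.

    spam_counts / good_counts map a word to the number of spam / good
    messages containing it; n_spam / n_good are the message totals.
    """
    kept_spam, kept_good = {}, {}
    for word in set(spam_counts) | set(good_counts):
        # Smoothed share of spam and good messages that contain the word.
        s = (spam_counts.get(word, 0) + 1) / (n_spam + 2)
        g = (good_counts.get(word, 0) + 1) / (n_good + 2)
        spamminess = s / (s + g)
        if abs(spamminess - 0.5) >= margin:
            if word in spam_counts:
                kept_spam[word] = spam_counts[word]
            if word in good_counts:
                kept_good[word] = good_counts[word]
    return kept_spam, kept_good


# Made-up word counts for a corpus of 22 bad and 4 good messages.
spam_words = {"viagra": 20, "the": 18}
good_words = {"the": 4, "scoug": 3}
print(thin_corpus(spam_words, good_words, n_spam=22, n_good=4))
```

In this made-up example "the" shows up about equally often in both kinds of mail, so it is the word that gets dropped.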
>And I'm not knocking your custom filters, Steven.  But you should add 
>Junk Spy to your email chain and put its massive spam filter after your 
>own. 
I'll consider this if my filters start missing more than 1 message a day. 
>That's 200 junk mails a week I don't have to see -- or 10,000 a year.  
>The time savings is incredible and it requires _no_ work on my part. 
Oh, I agree.  The only maintenance my filters need at the moment, if you 
want to call it that, is I need to add new addresses to my address book, 
if the filters misidentify a message. 
>Care to share your MR/2 filters? 
Sure.  You are not the first to ask.  There's always a relatively 
up-to-date copy at: 
  http://home.earthlink.net/~steve53/mr2i/MyICEFilters.txt 
IAC, thanks.  You reminded me it was time to refresh the copy I maintain 
there. 
Look for the enabled filters that are tagged as spam filters. 
To give you an idea of how much maintenance this takes, these are the 
filter control files as of today: 
 mr2i             .flt       5,831 .a..  2-28-03 14:49:58 
 OKFromFields     .txt         429 .a..  2-22-03 14:53:02 
 OKToFields       .txt         288 .a..  3-05-03 16:03:50 
As you can see, I don't need to maintain them very often and they are not 
all that large.  If Nick would fix a few defects in the filter 
expressions, I could lose the whitelist files. 
IIRC, all I did the last time I changed mr2i.flt was delete a bunch of 
disabled filters.  The last major update to the spam filters was probably 
about 6 months ago when I completed the migration to my current approach.  
This was when I added the spam logging.  This made it rather easy to 
figure out which filters were doing all the work.  Over time, I was able 
to delete several filters that were not paying their way.  Even so, I 
never had more than 15 spam filters.  They were just not as effective. 
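For anyone wanting to try the same approach, tallying such a log takes only a few lines.  The sketch below assumes a hypothetical log with one tab-separated line per caught message (date, filter name, subject); my actual log format is different, and the filter names here are made up.

```python
from collections import Counter


def filter_hits(log_lines):
    """Count how many messages each filter caught."""
    hits = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[1]:
            hits[parts[1]] += 1  # second field is the filter name
    return hits


def idle_filters(all_filters, hits, min_hits=1):
    """Filters that caught fewer than min_hits messages."""
    return sorted(name for name in all_filters if hits[name] < min_hits)


# Hypothetical log lines and filter names, for illustration only.
log = [
    "2003-03-20\tSpamNotToMe\tMake money fast",
    "2003-03-20\tSpamNotToMe\tLose weight now",
    "2003-03-21\tSpamBadCharset\tHello",
]
hits = filter_hits(log)
print(hits.most_common())
print(idle_filters(["SpamNotToMe", "SpamBadCharset", "SpamOldRule"], hits))
```

Filters that show up in the idle list for a few months are the ones that are not paying their way.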
Steven 
--  
--------------------------------------------------------------------- 
"Steven Levine"   MR2/ICE 2.35 #10183 Warp4/FP15/14.085_W4 
www.scoug.com irc.webbnet.org #scoug (Wed 7pm PST) 
--------------------------------------------------------------------- 
===================================================== 
To unsubscribe from this list, send an email message 
to "steward@scoug.com". In the body of the message, 
put the command "unsubscribe scoug-help". 
For problems, contact the list owner at 
"rollin@scoug.com". 
===================================================== 
Return to [ 23 | 
March | 
2003 ]