Tuesday, February 3, 2009

Text Classification References

I'm currently working on a text classification system. It requires a fair amount of research and background reading, so I'm going to keep a list of the references I'm using:

A Plan for Spam by Paul Graham

Better Bayesian Filtering by Paul Graham

Bayesian Filtering: Beyond Binary Classification [PDF] by Ben Kamens

Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification by Jonathan Zdziarski

CRM114 Discriminator by Bill Yerazunis
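
The first reference describes a simple statistical filter. As a rough illustration of that approach, here is a minimal Python sketch of the per-token probability and combining rules as I understand them from A Plan for Spam. Good-corpus counts are doubled, tokens seen fewer than five times fall back to a default of 0.4, probabilities are clamped to [0.01, 0.99], and only the 15 most "interesting" tokens (those furthest from 0.5) are combined. The function names, the dict-based inputs (token counts from each corpus plus the number of messages in each), and the choice to score each distinct token once per message are my own assumptions for illustration, not Graham's code.

```python
# Hypothetical sketch of the scoring rules described in "A Plan for Spam".
# good/bad: dicts mapping token -> occurrence count in each corpus.
# ngood/nbad: number of messages in each corpus (assumed > 0).

def token_prob(token, good, bad, ngood, nbad):
    """Estimated probability that a message containing `token` is spam."""
    g = 2 * good.get(token, 0)  # doubled to bias against false positives
    b = bad.get(token, 0)
    if g + b < 5:
        return 0.4  # rare or unseen tokens get a slightly innocent default
    p_bad = min(1.0, b / nbad)
    p_good = min(1.0, g / ngood)
    # clamp so no single token is ever treated as certain
    return max(0.01, min(0.99, p_bad / (p_good + p_bad)))

def combine(probs):
    """Combine independent token probabilities into one spam probability."""
    prod = 1.0
    inv = 1.0
    for p in probs:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)

def spam_probability(tokens, good, bad, ngood, nbad, n_interesting=15):
    """Score a message from its most interesting distinct tokens."""
    probs = [token_prob(t, good, bad, ngood, nbad) for t in set(tokens)]
    # "interesting" = furthest from a neutral 0.5
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return combine(probs[:n_interesting])

if __name__ == "__main__":
    good = {"meeting": 10, "lunch": 8}
    bad = {"viagra": 20, "free": 15, "lunch": 1}
    print(spam_probability(["viagra", "free"], good, bad, 10, 10))
    print(spam_probability(["meeting", "lunch"], good, bad, 10, 10))
```

Graham treats a message as spam when this score exceeds 0.9. A real system would also need a tokenizer and persistent storage for the counts, which is exactly where the data store question below comes in.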

A free online Bayesian classification service that I recently found and tried is uClassify. I found it remarkably accurate, so I contacted the owner and exchanged some emails with him. Unfortunately, he uses a proprietary data store that he bundles into the commercial package he sells, which makes his product unscalable and impossible to fail over. Hopefully one day he'll move the data store to something like SQL Server so that more people can use the product.

 

2 comments:

  1. Here are some links I've accumulated:
    delicious.com/.../bayesian
    Also:
    www.codeproject.com/.../BayesianCS.aspx

  2. Very cool - thanks Bill.