I'm currently working on a text classification system. It requires a fair amount of research and background reading, so I'm keeping a list of the references I'm using:
A Plan for Spam by Paul Graham
Better Bayesian Filtering by Paul Graham
Bayesian Filtering: Beyond Binary Classification [PDF] by Ben Kamens
Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification by Jonathan Zdziarski
CRM114 Discriminator by Bill Yerazunis
uClassify is a free online Bayesian classification service that I recently found and tried. I found it remarkably accurate, so I contacted the owner and exchanged some emails with him. Unfortunately, he uses a proprietary data store, bundled with the commercial package he sells, which makes his product unscalable and impossible to fail over. Hopefully one day he'll move the data store to something like SQL Server, which would make the product usable by more people.
Here are some links I've accumulated:
delicious.com/.../bayesian
Also:
www.codeproject.com/.../BayesianCS.aspx
Very cool - thanks Bill.