[Xapian-discuss] Xapian based spam filter using Bayesian algorithm.
Kevin Duraj
kevin.softdev at gmail.com
Wed Aug 1 22:09:44 BST 2007
Hi,
I am building a Xapian-based spam filter using a Bayesian algorithm.
The idea is to build two separate search engines, one over the spam
corpus and one over the ham corpus, that together can efficiently
determine whether a message is spam or ham. Please let me know if a
spam filter implementation using Xapian already exists. Thanks.
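The two-corpus idea can be sketched without Xapian as a minimal in-memory stand-in. Each hypothetical `Corpus` object plays the role of one search engine and keeps the two counts the Bayesian formula needs: how often a term occurs and how many messages the corpus holds. With real Xapian databases these counts would presumably come from `Database.get_collection_freq(term)` and `Database.get_doccount()` on the spam and ham databases. The regular-expression tokenizer is an assumption made for illustration.

```python
import re
from collections import Counter


class Corpus:
    """Hypothetical stand-in for one Xapian database (spam or ham)."""

    def __init__(self):
        self.term_occurrences = Counter()  # total occurrences of each term
        self.message_count = 0             # number of messages indexed

    def add_message(self, text):
        # Naive tokenizer (assumption); a real setup would index via Xapian.
        self.term_occurrences.update(re.findall(r"[a-z0-9$']+", text.lower()))
        self.message_count += 1

    def occurrences(self, term):
        return self.term_occurrences[term]


spam = Corpus()
ham = Corpus()
spam.add_message("Cheap meds, buy now!")
spam.add_message("Buy cheap watches now")
ham.add_message("Meeting notes: buy milk after the meeting")

print(spam.occurrences("buy"), spam.message_count)  # 2 2
print(ham.occurrences("meeting"), ham.message_count)  # 2 1
```

Keeping spam and ham in separate indexes means each count is a single lookup per corpus at classification time.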
The Bayesian formula for the spam probability of a term:
p = Spam probability of the term
s = Number of occurrences of the term in the spam corpus
m = Number of messages in the spam corpus
h = Number of occurrences of the term in the ham corpus
n = Number of messages in the ham corpus
$$p = \frac{s/m}{(s/m) + (2h/n)}$$
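A minimal sketch of the per-term formula, plus one common way to turn per-term probabilities into a verdict for a whole message. The clamping to [0.01, 0.99] and the combining rule P = prod(p) / (prod(p) + prod(1 - p)) are assumptions borrowed from Paul Graham's "A Plan for Spam" (which also doubles ham counts, as this formula does); they are not part of the original post. Both `m` and `n` are assumed to be non-zero.

```python
from math import prod


def term_spam_probability(s, m, h, n):
    """p = (s/m) / ((s/m) + (2h/n)).

    s: occurrences of the term in the spam corpus
    m: number of messages in the spam corpus (must be > 0)
    h: occurrences of the term in the ham corpus
    n: number of messages in the ham corpus (must be > 0)
    """
    spam_rate = s / m
    ham_rate = 2 * h / n  # doubling ham counts biases p toward "ham"
    if spam_rate + ham_rate == 0:
        return None  # term never seen in either corpus: no evidence
    p = spam_rate / (spam_rate + ham_rate)
    # Clamp (assumption) so a single term can never force 0 or 1.
    return min(0.99, max(0.01, p))


def combined_spam_probability(probs):
    """Naive-Bayes combination of per-term probabilities (assumption)."""
    spammy = prod(probs)
    hammy = prod(1 - p for p in probs)
    return spammy / (spammy + hammy)


print(round(term_spam_probability(40, 100, 1, 200), 4))  # 0.9756
print(round(combined_spam_probability([0.9, 0.9]), 4))   # 0.9878
```

A message would then be flagged as spam when the combined probability exceeds some threshold; Graham used 0.9, but the right value for this filter would need tuning on real corpora.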
--
Cheers,
Kevin Duraj
http://pacificair.com