[Xapian-discuss] Xapian based spam filter using Bayesian algorithm.

Kevin Duraj kevin.softdev at gmail.com
Wed Aug 1 22:09:44 BST 2007


Hi,

I am building a Xapian-based spam filter using a Bayesian algorithm.
The idea is to build two separate search engines, one over a spam
corpus and one over a ham corpus, so that the filter can efficiently
determine whether a message is spam or ham. Let me know if there is
already a spam filter implementation using Xapian, thanks.

Bayesian algorithm ...

p = Probability that a message containing the term is spam
s = Number of occurrences in Spam Corpus
m = Number of messages in Spam Corpus
h = Number of occurrences in Ham Corpus
n = Number of messages in Ham Corpus

                 (s / m)
  p = -----------------------------
      ( (s / m) + ( (h * 2) / n ) )
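
A rough sketch of the formula above in Python, just to make the
arithmetic concrete. The function name spam_probability and the 0.5
fallback for a term seen in neither corpus are my own assumptions for
illustration; they are not part of Xapian.

```python
def spam_probability(s, m, h, n):
    """Per-term spam probability, following the formula above.

    s = number of occurrences of the term in the spam corpus
    m = number of messages in the spam corpus
    h = number of occurrences of the term in the ham corpus
    n = number of messages in the ham corpus
    """
    if m <= 0 or n <= 0:
        raise ValueError("both corpora must contain at least one message")

    spam_rate = s / m
    ham_rate = (h * 2) / n  # ham occurrences are counted twice, as in the formula

    total = spam_rate + ham_rate
    if total == 0:
        # Assumption: a term seen in neither corpus is treated as neutral.
        return 0.5
    return spam_rate / total


# A term seen 3 times in 4 spam messages and once in 8 ham messages:
# (3/4) / ((3/4) + (2/8)) = 0.75
print(spam_probability(3, 4, 1, 8))
```

With two Xapian databases, the counts could presumably come from
Database::get_collection_freq() for s and h and from
Database::get_doccount() for m and n, though I have not wired that up
here.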


-- 
Cheers,
   Kevin Duraj
   http://pacificair.com


