[Xapian-devel] some improvements about the latent semantic search

Olly Betts olly at survex.com
Tue Oct 9 02:41:40 BST 2012


On Thu, Oct 04, 2012 at 11:48:13PM +0800, Jianping Wang wrote:
> Recently I invented a new ranking algorithm inspired by the theory of
> spreading activation and the probabilistic model. It can find the latent
> semantic relationships between documents and terms and runs in almost
> linear time. I spent one afternoon implementing it, and my tests show that
> it is much faster than the well-known Latent Semantic Analysis (LSA)
> algorithm, with results almost as good as LSA's. I would like to share my
> idea with all of you and add this algorithm to the Xapian project.

Can you express your algorithm as a sum of a positive weight from each
matching term, optionally plus a per-document component?  That's a
requirement for it to be implementable within the Xapian matcher
framework.  If it doesn't fit into this form, you'll need to do a lot
more work to fit it into Xapian.
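
As a rough illustration of that form (this is not Xapian code, and every
name in it is made up for the example), a document's score is the sum of
a positive weight for each query term that matches it, plus an optional
component that depends only on the document:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-document data: the weight each term contributes to this
// document, plus a component that does not depend on the query terms.
struct DocWeights {
    std::map<std::string, double> term_weight;
    double extra = 0.0;
};

// Score = sum of the weights of the query terms that match the document,
// plus the per-document component.
double score(const DocWeights& doc, const std::vector<std::string>& query) {
    double total = doc.extra;
    for (const std::string& term : query) {
        auto it = doc.term_weight.find(term);
        if (it == doc.term_weight.end()) continue;  // term does not match
        assert(it->second > 0.0);  // each matching term adds a positive weight
        total += it->second;
    }
    return total;
}
```

If the ranking function cannot be decomposed like this, with each term's
contribution computed independently of the other terms, it is outside
the form described above.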

If the algorithm is a product of a contribution per term, then taking
the log may allow you to express it as such a sum.
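
A minimal sketch of that log trick, using hypothetical function names and
assuming every per-term factor is greater than 1 so that each log is
positive:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Product form: multiply one factor per matching term.
double product_score(const std::vector<double>& factors) {
    double p = 1.0;
    for (double f : factors) p *= f;
    return p;
}

// Sum form: add log(factor) per matching term.  log is strictly increasing,
// so ordering documents by this sum gives the same ranking as the product.
// Assumption: every factor is > 1, so each log is positive; factors <= 1
// give zero or negative terms and would need further handling.
double log_sum_score(const std::vector<double>& factors) {
    double s = 0.0;
    for (double f : factors) {
        assert(f > 1.0);
        s += std::log(f);
    }
    return s;
}
```

Only the ordering of documents is preserved, not the score values, which
is normally all a ranking needs.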

To implement a new weighting scheme, you need to subclass Xapian::Weight
and implement several methods:

http://trac.xapian.org/browser/trunk/xapian-core/include/xapian/weight.h
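
For orientation, the per-term and per-document pieces correspond to the
get_sumpart()/get_maxpart() and get_sumextra()/get_maxextra() methods
declared in that header.  The sketch below uses a local stand-in base
class rather than the real Xapian::Weight so that it compiles on its own;
the exact signatures, and the other methods a subclass must provide
(clone(), name() and so on), differ between Xapian versions, so check
weight.h for the version you are targeting.  The formula is an arbitrary
placeholder, not a recommended scheme.

```cpp
// Local stand-in for the relevant part of the interface; NOT the real
// Xapian::Weight, whose signatures differ between versions.
struct WeightLike {
    virtual ~WeightLike() {}
    // Weight contributed by one matching term (wdf = within-document
    // frequency of the term, doclen = document length).
    virtual double get_sumpart(unsigned wdf, unsigned doclen) const = 0;
    // Upper bound on what get_sumpart() can return.
    virtual double get_maxpart() const = 0;
    // Term-independent, per-document component.
    virtual double get_sumextra(unsigned doclen) const = 0;
    // Upper bound on what get_sumextra() can return.
    virtual double get_maxextra() const = 0;
};

// Placeholder scheme: wdf / (wdf + 1) per term, no per-document component.
struct ToyWeight : WeightLike {
    double get_sumpart(unsigned wdf, unsigned) const override {
        return static_cast<double>(wdf) / (wdf + 1.0);
    }
    double get_maxpart() const override { return 1.0; }  // wdf/(wdf+1) < 1
    double get_sumextra(unsigned) const override { return 0.0; }
    double get_maxextra() const override { return 0.0; }
};
```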

Cheers,
    Olly


