[Xapian-devel] Weighting schemes for Xapian

Olly Betts olly at survex.com
Tue Mar 31 06:44:27 BST 2015


Hi Richhiey,

On Sat, Mar 28, 2015 at 08:40:40PM +0530, Richhiey Thomas wrote:
> I would like to work on LDA based document modelling and Hiemstra's
> language modelling and would like to form a concrete plan on how to proceed.
> It would be really helpful if I could have a mentor to assist me with this.
> Looking forwards to your reply.

In addition to what James said in his reply, a key question to resolve
is how this fits with Xapian's model of how a weighting scheme works.

Xapian expects you to express the score a document should get as the
sum of non-negative contributions from each term which is in both the
document and the query, plus a non-negative contribution which doesn't
depend on a particular query term.  So far we've managed to do that
for all the schemes we've considered, though with a few tricks needed
in some cases.
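To make that model concrete, here is a minimal Python sketch of a score built in this shape. It assumes a BM25-style per-term formula (with a "1 +" idf variant so it never goes negative) and an invented length-based extra part. The names `term_part`, `extra_part` and `score` are hypothetical illustrations, not Xapian's API, though they play roughly the roles that `get_sumpart()` and `get_sumextra()` play in a `Xapian::Weight` subclass.

```python
import math

# Toy BM25-style parameters (assumed values, not taken from Xapian).
K1 = 1.2
B = 0.75


def idf(n_docs, term_freq):
    # The "1 +" keeps the log argument above 1 (for term_freq <= n_docs),
    # so idf stays positive even for terms occurring in most documents.
    return math.log(1.0 + (n_docs - term_freq + 0.5) / (term_freq + 0.5))


def term_part(wdf, doclen, avg_doclen, n_docs, term_freq):
    """Non-negative contribution of one query term with wdf occurrences in the document."""
    norm = K1 * ((1 - B) + B * doclen / avg_doclen)
    return idf(n_docs, term_freq) * (K1 + 1) * wdf / (norm + wdf)


def extra_part(doclen, avg_doclen):
    """Hypothetical term-independent part, in (0, 1], favouring shorter documents."""
    return 1.0 / (1.0 + doclen / avg_doclen)


def score(doc_terms, doclen, query, stats):
    """Sum of term_part over terms in both query and document, plus extra_part."""
    total = extra_part(doclen, stats["avg_doclen"])
    for term in query:
        wdf = doc_terms.get(term, 0)
        if wdf > 0:  # terms absent from the document contribute nothing
            total += term_part(wdf, doclen, stats["avg_doclen"],
                               stats["n_docs"], stats["term_freq"][term])
    return total
```

Because each term's part is non-negative and depends only on that term's statistics and the document, the contributions can be accumulated one posting list at a time and never pull a score down.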

To get good performance, you also want to provide a tight bound on
the contribution from a particular term in any document.  This allows
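Xapian's matcher to perform various optimisations, like terminating
early when no remaining document could score highly enough to make the
results, or switching OR to AND when it can prove that any document
scoring highly enough must match all the subqueries.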
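As a rough illustration of why tight bounds matter, here is a minimal Python sketch, assuming a BM25-style saturating term part and assuming a document must score above the weight of the current k-th best result (the "threshold") to get in. The functions `term_upper_bound`, `can_terminate` and `required_subqueries` are hypothetical and are not how Xapian's matcher is actually written; in a real `Xapian::Weight` subclass the bounds are supplied via `get_maxpart()` and `get_maxextra()`.

```python
# Assumed BM25-style saturation parameter, matching a term part of the form
# idf * (K1 + 1) * wdf / (norm + wdf) with norm > 0.
K1 = 1.2


def term_upper_bound(idf_value):
    # wdf / (norm + wdf) < 1 for any wdf when norm > 0, so (assuming
    # idf >= 0) idf * (K1 + 1) bounds the term's contribution in every
    # document.  A real scheme would tighten this using the term's largest
    # wdf and the smallest document length.
    return idf_value * (K1 + 1)


def can_terminate(max_parts, max_extra, threshold):
    """True if no document can score above threshold (the current k-th best)."""
    return sum(max_parts) + max_extra <= threshold


def required_subqueries(max_parts, max_extra, threshold):
    """For each OR subquery, True if every document able to beat threshold must match it."""
    required = []
    for i in range(len(max_parts)):
        # Best possible score for a document that does NOT match subquery i.
        without_i = max_extra + sum(b for j, b in enumerate(max_parts) if j != i)
        required.append(without_i <= threshold)
    return required


# Example: two subqueries; the k-th best document so far scores 2.6.
bounds = [1.0, 2.0]
print(can_terminate(bounds, 0.5, 2.6))        # False: 3.5 is still reachable
print(required_subqueries(bounds, 0.5, 2.6))  # [True, True]: OR can become AND
```

The looser the bounds, the longer the threshold takes to exceed them, so a scheme that can only offer a crude bound gets little benefit from these optimisations.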

If you can't make the weighting fit that model, then you'll need to do
the weighting in a different way.  I'm not really sure I can usefully
suggest how without understanding LDA or Hiemstra's LM better than I
currently do.

Cheers,
    Olly
