[Xapian-devel] adaptive query scoring

Richard Boulton richard at lemurconsulting.com
Tue May 16 17:43:07 BST 2006


On Tue, May 16, 2006 at 09:29:12AM -0700, Alexander Lind wrote:
> > You could use the RSet to achieve something like this by recording
> > which documents users like for which queries and setting an RSet from
> > that when there's a query for the same terms.  It would probably
> > make sense to use a second Xapian database to store the queries matching
> > each document click so you'd run a search on that to find what to set
> > the RSet as on the main database.
> >   
> Which approach do you think would be easier - and more importantly, give
> the least overhead?  It seems to me that adding adaptive-terms (or
> whatever would be a good term for these!) and just rewriting the queries
> and working on one Xapian db only would mean less overhead (and less
> maintenance). What do you think?  Would you be able to be as versatile
> with the RSet approach, i.e. use the adjacent-word approach like you
> suggest below?

The two approaches will give quite different results: I personally suspect
that just adding entries to the RSet based on adaptive-terms won't give a
particularly useful improvement (unless you often have searches with a
large number of terms).  You'd probably have to do an autoexpand to add
terms to the query based on the RSet, rather than just relying on the
ranking from the RSet.
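
To make the autoexpand idea concrete, here is a toy sketch in plain Python.
It is not Xapian's expansion algorithm: it swaps in a simple count of how
many relevant documents contain each term, and the function name
expand_query, the max_new_terms parameter and the sample data are all made
up for illustration.

```python
from collections import Counter


def expand_query(query_terms, relevant_docs, max_new_terms=2):
    """Toy expansion: add the terms found in the most relevant documents.

    Assumption: relevant_docs is a list of term lists, one per document
    that users clicked on for this query (standing in for the RSet).
    """
    query = list(query_terms)
    seen = set(query)
    # Count in how many relevant documents each candidate term appears.
    doc_freq = Counter()
    for terms in relevant_docs:
        doc_freq.update(set(terms) - seen)
    # Highest count first; ties broken alphabetically to stay deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return query + [term for term, _ in ranked[:max_new_terms]]


relevant_docs = [
    ["xapian", "index", "search", "probabilistic"],
    ["xapian", "search", "ranking"],
    ["search", "ranking", "bm25"],
]
expanded = expand_query(["search"], relevant_docs)
```

With a one-term query the RSet alone has little to re-rank, which is why
the extra terms matter; here "ranking" and "xapian" each appear in two of
the three relevant documents and get appended to the query.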

Certainly the easiest approach to implement would be adding click terms to
the main database based on users' clicks; you might need to batch up the
modifications to the main database to be able to apply them quickly enough,
though.
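
The batching could look roughly like the following sketch. It is plain
Python and does not touch Xapian at all: ClickTermBuffer, apply_batch and
max_pending are hypothetical names, and apply_batch stands for whatever
code would actually add the terms to the documents in the main database.
The XCLICK prefix is the one used for click terms in this thread.

```python
from collections import defaultdict


class ClickTermBuffer:
    """Collect click terms per document and hand them over in batches."""

    def __init__(self, apply_batch, max_pending=100):
        # apply_batch: hypothetical callable taking {doc_id: [terms]};
        # in a real system it would update the main database.
        self.apply_batch = apply_batch
        self.max_pending = max_pending
        self.pending = defaultdict(set)

    def record_click(self, doc_id, query_terms):
        # One click term per query term, deduplicated per document.
        for term in query_terms:
            self.pending[doc_id].add("XCLICK" + term.lower())
        # Flush once enough distinct documents are waiting.
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        batch = {doc_id: sorted(terms) for doc_id, terms in self.pending.items()}
        self.pending.clear()
        self.apply_batch(batch)


applied = []
buffer = ClickTermBuffer(applied.append, max_pending=2)
buffer.record_click(7, ["Adaptive", "scoring"])
buffer.record_click(7, ["scoring"])   # same document, nothing new
buffer.record_click(9, ["xapian"])    # second document triggers a flush
```

Each document then gets modified once per batch rather than once per
click, which is the point of batching; a timer-based flush would be a
reasonable addition for quiet periods.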

> Yep, this sounds workable.
> Does the ANDMAYBE operator add much overhead to queries?  Would it be
> faster to just use the OR operator?  If a result matches the XCLICK*
> term, it _must_ also match the original term.

In general, ANDMAYBE is likely to be more efficient than OR, because it
allows faster skipping through the posting lists (and hence less I/O).  In
this particular case, because of the condition you list, it should be
equivalent to OR.  I'd be very surprised if ANDMAYBE were less efficient.
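
A toy model in plain Python of why the two operators match the same
documents here. This only illustrates the matching semantics, with weights
simply summed; it says nothing about how Xapian's matcher walks posting
lists or about performance. The function names and the weights are made
up for illustration.

```python
def and_maybe(left, right):
    """Documents matching left; right only adds weight where it also matches.

    left and right map doc_id -> weight for each subquery.
    """
    return {doc: weight + right.get(doc, 0.0) for doc, weight in left.items()}


def or_query(left, right):
    """Documents matching either side, with the weights summed."""
    docs = set(left) | set(right)
    return {doc: left.get(doc, 0.0) + right.get(doc, 0.0) for doc in docs}


original = {1: 1.0, 2: 0.5, 3: 0.25}   # matches for the original term
click = {2: 2.0}                       # XCLICK matches, a subset of original

same_results = and_maybe(original, click) == or_query(original, click)

# If the subset condition is broken, OR starts returning extra documents.
stray_click = {2: 2.0, 4: 2.0}
extra_docs = set(or_query(original, stray_click)) - set(and_maybe(original, stray_click))
```

As long as every document with the click term also has the original term,
same_results is True; document 4 in the second case shows where the two
operators diverge once that condition no longer holds.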



