[Xapian-devel] Custom weight factors - pushing the relevancy ranking how we want it

Olly Betts olly at survex.com
Fri Dec 17 11:11:53 GMT 2004


On Fri, Dec 17, 2004 at 10:28:42AM +0000, James Aylett wrote:
> There's currently no way of using document values (pieces of
> information stored about the document) in the mix to calculate weights

The match bias code allows this, or will once it's finished.

> I don't know enough about the internals of the matcher to know what a
> performance hit adding this kind of support would be.

Performance should be good - you specify the maximum weight that the
bias can return, and the matcher uses that just like it uses the maximum
weight a term can return.
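
To illustrate why a declared maximum keeps this cheap, here is a toy sketch in Python. It is not Xapian's matcher code: the names top_k, bias and max_bias are made up for the example, and it assumes candidates arrive sorted by term weight, highest first. Once even the largest possible bias could not lift a candidate into the current top k, the loop stops without evaluating the bias for the remaining documents.

```python
import heapq


def top_k(candidates, bias, max_bias, k):
    """Return the k best (doc_id, total_weight) pairs and the number of bias calls.

    candidates: list of (doc_id, term_weight), sorted by term_weight descending.
    bias: hypothetical function doc_id -> extra weight in [0, max_bias].
    """
    heap = []  # min-heap of (total_weight, doc_id); heap[0] is the weakest kept hit
    bias_calls = 0
    for doc_id, term_weight in candidates:
        # Upper bound on this document's total is term_weight + max_bias.
        # If that cannot beat the weakest kept hit, no later document can either.
        if len(heap) == k and term_weight + max_bias <= heap[0][0]:
            break
        bias_calls += 1
        total = term_weight + bias(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (total, doc_id))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, doc_id))
    ranked = sorted(heap, reverse=True)
    return [(doc_id, total) for total, doc_id in ranked], bias_calls
```

The tighter the declared maximum, the earlier the loop can stop, which is the same reason a tight maximum term weight helps.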

> The other thing is that you may have luck with trying to
> automatically segment your top results. Say you grab the first 20, you
> could then see how similar these results are. One way of doing this
> that might work (but Olly or Richard will be able to give you a better
> answer :-) would be to get the ESet for the query with the RSet as
> each document in the MSet in turn, throwing the terms from the ESet
> back into the query and seeing which other documents from the original
> MSet come out of that new query. That should enable you to group
> related results to some extent, although it will depend on how your
> topics work.

That's going to be quite slow though, which is a problem for a realtime
search over a large database.

I'd suggest something simpler than that.  If you have XTdvd and similar
terms for when users have marked a topic as a good result for a query,
just mark the top few documents as relevant, and generate an ESet of
terms with prefix "XT".  Then if I search for "windows" you can offer
a side bar with a list of "refined queries" such as 'windows xp',
'windows nt', 'windows double glazing', etc.
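
A rough sketch of that idea in plain Python, without the Xapian API: suggest_topics, doc_terms and the sample data are hypothetical, and a simple document count stands in for the term weighting a real ESet would use. It treats the top few results as relevant and keeps only the terms carrying the "XT" prefix.

```python
from collections import Counter


def suggest_topics(ranked_docs, doc_terms, top_n=3, prefix="XT", max_suggestions=5):
    """Suggest topic names from the prefixed terms of the top-ranked documents.

    ranked_docs: doc ids in rank order (best first).
    doc_terms: hypothetical mapping doc_id -> list of index terms.
    """
    counts = Counter()
    for doc_id in ranked_docs[:top_n]:  # treat the top few hits as relevant
        for term in set(doc_terms[doc_id]):
            if term.startswith(prefix):  # keep topic terms only
                counts[term] += 1
    # Most frequent first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term[len(prefix):] for term, _ in ranked[:max_suggestions]]


# Made-up example data for a search on "windows".
doc_terms = {
    1: ["windows", "XTxp", "XTnt"],
    2: ["windows", "XTxp"],
    3: ["windows", "XTglazing"],
    4: ["windows", "XTvista"],
}
topics = suggest_topics([1, 2, 3, 4], doc_terms)
refined_queries = ["windows " + topic for topic in topics]
```

This needs only one pass over the terms of a handful of documents, so it avoids running a fresh query per result.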

Cheers,
    Olly



