[Xapian-devel] Custom weight factors - pushing the relevancy ranking how we want it
olly at survex.com
Fri Dec 17 11:11:53 GMT 2004
On Fri, Dec 17, 2004 at 10:28:42AM +0000, James Aylett wrote:
> There's currently no way of using document values (pieces of
> information stored about the document) in the mix to calculate weights
The match bias code allows this, or will once it's finished.
> I don't know enough about the internals of the matcher to know what a
> performance hit adding this kind of support would be.
Performance should be good - you specify the maximum weight that the
bias can return, and the matcher uses that just like it uses the maximum
weight a term can return.
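To illustrate why a declared maximum keeps this cheap, here is a rough sketch in plain Python. It is not Xapian's matcher and not the match bias code; `top_k`, `bias_fn` and `max_bias` are names made up for the example. The idea is only that once the top k slots are full, a document whose term weight plus the largest possible bias still cannot beat the current k-th best can be dropped without computing its bias at all.

```python
import heapq

def top_k(candidates, bias_fn, max_bias, k):
    """Toy matcher (hypothetical, not Xapian's implementation).

    candidates: iterable of (doc_id, term_weight) pairs.
    bias_fn:    returns a bias in [0, max_bias] for a doc_id.
    Returns (results, bias_calls), where results is a list of
    (total_weight, doc_id) sorted best first.
    """
    heap = []        # min-heap holding the best k (total_weight, doc_id) so far
    bias_calls = 0   # how many times the bias actually had to be computed
    for doc_id, term_weight in candidates:
        # Upper bound check: even with the full bias this document
        # cannot displace the current k-th best, so skip it.
        if len(heap) == k and term_weight + max_bias <= heap[0][0]:
            continue
        bias_calls += 1
        total = term_weight + bias_fn(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (total, doc_id))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, doc_id))
    return sorted(heap, reverse=True), bias_calls

# Example data (invented): term weights and a per-document bias.
example_candidates = [(1, 5.0), (2, 4.0), (3, 1.0), (4, 0.5)]
example_bias = {1: 0.5, 2: 1.0, 3: 1.0, 4: 0.0}
results, calls = top_k(example_candidates, example_bias.get, max_bias=1.0, k=2)
```

The tighter the declared maximum, the more documents can be skipped this way, so it pays not to overstate the bound.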
> The other thing is that you may have luck with trying to
> automatically segment your top results. Say you grab the first 20, you
> could then see how similar these results are. One way of doing this
> that might work (but Olly or Richard will be able to give you a better
> answer :-) would be to get the ESet for the query with the RSet as
> each document in the MSet in turn, throwing the terms from the ESet
> back into the query and seeing which other documents from the original
> MSet come out of that new query. That should enable you to group
> related results to some extent, although it will depend on how your
> topics work to some extent.
That's going to be quite slow though, which is a problem for a realtime
search over a large database.
I'd suggest something simpler than that. If you have XTdvd and similar
terms for when users have marked a topic as a good result for a query,
just mark the top few documents as relevant, and generate an ESet of
terms with prefix "XT". Then if I search for "windows" you can offer
a side bar with a list of "refined queries" such as 'windows xp',
'windows nt', 'windows double glazing', etc.
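A rough sketch of that idea in plain Python, without touching the Xapian API. With Xapian itself this would presumably go through an RSet and Enquire::get_eset() with an ExpandDecider that only accepts terms starting with "XT". Here the expand step is replaced by a simple count of how many of the top documents carry each topic term, which is an assumption for illustration and not the weighting Xapian uses. `topic_suggestions` and the sample data are hypothetical.

```python
from collections import Counter

def topic_suggestions(ranked_docs, doc_terms, num_relevant=3,
                      prefix="XT", max_items=5):
    """Suggest topic terms found in the top few results.

    ranked_docs: doc ids in rank order (best first).
    doc_terms:   mapping of doc id -> list of index terms.
    The top num_relevant documents are treated as the relevant set,
    and only terms carrying the given prefix are considered.
    """
    counts = Counter()
    for doc_id in ranked_docs[:num_relevant]:
        # set() so a term repeated within one document counts once
        for term in set(doc_terms[doc_id]):
            if term.startswith(prefix):
                counts[term[len(prefix):]] += 1
    # Most common first; ties broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ordered][:max_items]

# Invented sample index for a search on "windows".
sample_terms = {
    1: ["windows", "XTxp"],
    2: ["windows", "XTxp", "XTnt"],
    3: ["windows", "XTglazing"],
    4: ["windows", "XTvista"],
}
suggestions = topic_suggestions([1, 2, 3, 4], sample_terms)
refined_queries = ["windows " + topic for topic in suggestions]
```

Because only the top few documents are examined once, this avoids running a fresh query per result as in the approach quoted above.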