[Xapian-discuss] Making SORTAFTER useful in omega?
Arjen van der Meijden
acmmailing at tweakers.net
Thu Oct 2 11:24:24 BST 2008
On 30-9-2008 6:19 Olly Betts wrote:
>
> So splitting the match set into bands by percentage is rather arbitrary
> to start with.
Agreed, but then again the notion of 'equally relevant matches' is also
very hard to describe.
> (Interestingly, explicitly reporting a score for each match seems to
> have fallen out of favour, perhaps due to the popularity of Google
> which doesn't provide them.)
>
> Anecdotally, I was asked about sort bands by a couple of people before
> we removed it, and both times they found it a bit of an odd feature
> when I explained how it worked.
>
> Unfortunately, we don't really have a replacement for combining
> relevance and sort key rankings in 1.0.x. The nearest is probably to
> set up the weighting scheme parameters to produce less variation in
> weights (for BM25Weight: k2 = 0 and b = 0; for TradWeight: k = 0).
I tried that, but as far as I can tell only a few of the 100 results were
sorted differently when switching from descending to ascending order
while using TradWeight with k = 0.
That's why I came up with the idea of rounding those scores: it increases
the chance of a document being sorted ahead of a similarly scoring but
older document.
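To make the rounding idea concrete, here is a minimal sketch in plain Python. It does not use the Xapian API; the function name, the dictionary layout of a match and the `decimals` parameter are assumptions made only for illustration.

```python
from datetime import date

def rank_with_rounded_weights(matches, decimals=2):
    """Sort by rounded weight first; within equal rounded weights, newest first.

    `matches` is a hypothetical list of dicts with "weight" and "date" keys.
    """
    return sorted(
        matches,
        key=lambda m: (round(m["weight"], decimals), m["date"]),
        reverse=True,
    )

matches = [
    {"id": "a", "weight": 14.3576, "date": date(2008, 1, 5)},
    {"id": "b", "weight": 14.3623, "date": date(2007, 3, 1)},
    {"id": "c", "weight": 13.9000, "date": date(2008, 9, 30)},
]

# With two decimals, "a" and "b" both round to 14.36, so the date decides.
ranked = rank_with_rounded_weights(matches, decimals=2)
print([m["id"] for m in ranked])
```

With `decimals=4` the two weights stay distinct and "b" remains ahead of "a", which is the behaviour without rounding.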
> In trunk you can use PostingSource to apply an extra weight to each
> document depending on the sort key. This is added to the relevance
> weight, which avoids the odd effects with sort bands across vs within
> the bands. The matcher has a bound for the extra weight, so this is
> also pretty efficient. For some more information, see:
>
> http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst
I'm not really sure I understand how this works. But increasing the
weight of a newer result just because it's newer may not be what a user
expects either. So you should at least use it with relatively small
increments, so the document is only bumped slightly above more or less
similar results.
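As I read the description, the extra weight is simply added to the relevance weight and has a known upper bound. A rough sketch of that arithmetic in plain Python follows. It is not the PostingSource API; the linear bonus, the function names and the `max_bonus` value are all assumptions chosen for illustration.

```python
from datetime import date

def recency_bonus(doc_date, oldest, newest, max_bonus):
    """Linear bonus in [0, max_bonus]: the newest document gets max_bonus."""
    span = (newest - oldest).days
    if span == 0:
        return max_bonus
    return max_bonus * (doc_date - oldest).days / span

def rank_with_bonus(matches, max_bonus=0.05):
    """Sort by relevance weight plus a small, bounded recency bonus."""
    if not matches:
        return []
    dates = [m["date"] for m in matches]
    oldest, newest = min(dates), max(dates)
    return sorted(
        matches,
        key=lambda m: m["weight"]
        + recency_bonus(m["date"], oldest, newest, max_bonus),
        reverse=True,
    )

matches = [
    {"id": "a", "weight": 14.3576, "date": date(2008, 9, 1)},
    {"id": "b", "weight": 14.3623, "date": date(2007, 9, 1)},
    {"id": "c", "weight": 13.9000, "date": date(2008, 9, 30)},
]

# A bonus of at most 0.05 lifts "a" above "b" but leaves "c" last.
boosted = rank_with_bonus(matches, max_bonus=0.05)
print([m["id"] for m in boosted])
```

Keeping `max_bonus` small relative to the spread of weights means only near-ties are reordered; a large value would turn this into a plain date sort.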
>> So what I did is just round off the weight, so the chance of collapses
>> increases. Rather than getting 14.3576 and 14.3623 you can get 14.36 (or
>> 14.4 or 14). So the secondary sorting will decide which of those two
>> documents gets shown first.
>> It isn't perfect: sometimes the maximum weight is somewhere below 1,
>> sometimes it's far over 20 or so, so the chance of collapses still depends
>> on the weight distribution. But for documents that are close in weight,
>> it's much more common to collapse with my patch than without it. You can
>> see in your result set whether there is a high probability of this patch
>> being helpful or not: if the top 100 results are all within 90-100 it's
>> going to give you a few collapses; if the top is much more distributed,
>> it won't do much.
>>
>> I'd hoped someone on the list would have some ideas on how to make it a
>> bit more predictable.
>
> I guess you could choose the scaling based on the maximum possible
> weight (which is known before the matcher starts) but this approach
> seems to suffer from the same sort of arbitrariness that sort bands did.
I think your PostingSource approach does too? Although both approaches
are not as bad as the sort bands.
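For reference, the suggestion to scale by the maximum possible weight could look roughly like this in plain Python. The `bucket` helper and the `buckets` parameter are hypothetical, and `max_possible_weight` is assumed to be supplied by the caller.

```python
def bucket(weight, max_possible_weight, buckets=100):
    """Map a weight to an integer bucket relative to the maximum possible weight.

    Documents in the same bucket count as equally relevant, so a secondary
    sort key can decide their order.
    """
    if max_possible_weight <= 0:
        return 0
    return round(weight / max_possible_weight * buckets)

# The same pair of weights on two different scales (factor 25 apart)
# ends up in the same bucket, unlike rounding to a fixed number of decimals.
high = [bucket(w, 20.0) for w in (14.3576, 14.3623)]
low = [bucket(w, 0.8) for w in (0.574304, 0.574492)]
print(high, low)
```

The number of buckets is still an arbitrary choice, which is the same objection that applied to sort bands.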
Do you have plans to tackle the underlying issue anytime soon?
Best regards,
Arjen