[Xapian-discuss] Making SORTAFTER useful in omega?

Thu Oct 2 11:24:24 BST 2008

On 30-9-2008 6:19 Olly Betts wrote:
> 
> So splitting the match set into bands by percentage is rather arbitrary
> to start with.

Agreed, but then again the notion of 'equally relevant matches' is also 
very hard to describe.

> (Interestingly, explicitly reporting a score for each match seems to
> have fallen out of favour, perhaps due to the popularity of Google
> which doesn't provide them.)
> 
> Anecdotally, I was asked about sort bands by a couple of people before
> we removed it, and both times they found it a bit of an odd feature
> when I explained how it worked.
> 
> Unfortunately, we don't really have a replacement for combining
> relevance and sort key rankings in 1.0.x.  The nearest is probably to
> set up the weighting scheme parameters to produce less variation in
> weights (for BM25Weight: k2 = 0 and b = 0; for TradWeight: k = 0).
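A sketch of why those settings flatten the weights. It assumes the usual probabilistic term weight that TradWeight is based on; Xapian's exact normalisation may differ slightly, so its weighting-scheme documentation is authoritative. Here $f_{t,d}$ is the within-document frequency of term $t$, $L_d$ the normalised length of document $d$, and $\mathrm{idf}_t$ the collection-based weight of $t$:

```latex
w_t(d) = \mathrm{idf}_t \cdot \frac{(k+1)\, f_{t,d}}{k\, L_d + f_{t,d}},
\qquad
k = 0 \;\Longrightarrow\; w_t(d) = \mathrm{idf}_t \quad \text{for } f_{t,d} > 0
```

With k = 0 a document's score is just the sum of $\mathrm{idf}_t$ over the query terms it contains. Documents matching the same set of terms therefore tie exactly, while documents matching different subsets of the query still get distinct scores. For BM25Weight, b = 0 similarly removes document-length normalisation.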

I tried that, but as far as I could tell, with 100 results only a few
documents were sorted differently when switching from descending to
ascending sort order while using TradWeight with k = 0.

That's why I came up with the idea of rounding those scores: to increase
the chance of a document being sorted ahead of an older document with a
similar score.
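A minimal sketch of the rounding idea in plain Python, outside Xapian. The `Match` record, the `rank_rounded` helper and the YYYYMMDD `date` field are hypothetical stand-ins for whatever the actual patch does inside the matcher. They only show how rounding lets a secondary key break near-ties.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical match record (not a Xapian type)."""
    docid: int
    weight: float  # relevance weight from the matcher
    date: int      # hypothetical YYYYMMDD sort key; larger means newer


def rank_rounded(matches: List[Match], decimals: int = 2) -> List[Match]:
    """Order by weight rounded to `decimals`, newest first among ties.

    Rounding often makes nearly equal weights compare equal, so the date
    (the secondary key) decides their order. Weights that differ by more
    than one rounding step always round to different values, so they
    still decide the order on their own.
    """
    return sorted(
        matches,
        key=lambda m: (round(m.weight, decimals), m.date),
        reverse=True,
    )


matches = [
    Match(1, 14.3576, 20080101),
    Match(2, 14.3623, 20070101),
    Match(3, 9.8, 20081001),
]
# Unrounded, document 2 would come first; rounded to 14.36 both tie and
# the newer document 1 wins.
print([m.docid for m in rank_rounded(matches)])  # [1, 2, 3]
```

The weak point is the choice of `decimals`: how many ties it produces depends on how large the weights happen to be.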

> In trunk you can use PostingSource to apply an extra weight to each
> document depending on the sort key.  This is added to the relevance
> weight, which avoids the odd effects with sort bands across vs within
> the bands.  The matcher has a bound for the extra weight, so this is
> also pretty efficient.  For some more information, see:
> 
> http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst

I'm not really sure I understand how this works. But increasing the
weight of a newer result just because it's newer may not be what a user
expects either. So you should at least use it with relatively small
increments, so that a document is only bumped slightly above results of
roughly similar relevance.
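To illustrate the "small increments" point, here is a plain-Python sketch of an additive, bounded recency bonus in the spirit of the PostingSource approach. It does not use the Xapian API; `recency_bonus`, `rank_with_bonus`, `max_bonus` and the day-number dates are hypothetical. Because the bonus never exceeds `max_bonus`, it can only reorder documents whose relevance weights are within `max_bonus` of each other.

```python
from typing import List, Tuple

# Hypothetical match: (docid, relevance weight, date as a day number).
Match = Tuple[int, float, int]


def recency_bonus(day: int, oldest: int, newest: int, max_bonus: float) -> float:
    """Extra weight in [0, max_bonus], growing linearly with newness."""
    if newest <= oldest:
        return 0.0  # all documents equally new: no reason to bump any of them
    frac = (day - oldest) / (newest - oldest)
    frac = min(max(frac, 0.0), 1.0)  # clamp so the bonus never exceeds max_bonus
    return max_bonus * frac


def rank_with_bonus(matches: List[Match], max_bonus: float) -> List[int]:
    """Return docids ordered by relevance weight plus recency bonus."""
    if not matches:
        return []
    days = [day for _, _, day in matches]
    oldest, newest = min(days), max(days)
    scored = [
        (weight + recency_bonus(day, oldest, newest, max_bonus), docid)
        for docid, weight, day in matches
    ]
    scored.sort(reverse=True)  # highest combined score first
    return [docid for _, docid in scored]


matches = [(1, 14.36, 100), (2, 14.30, 200), (3, 9.0, 200)]
print(rank_with_bonus(matches, max_bonus=0.5))  # [2, 1, 3]
```

With `max_bonus=0.01` the order stays `[1, 2, 3]`, and document 3 is never bumped past the others in either case. How large the bonus should be relative to typical weight differences is the same tuning question as the rounding step.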

>> So what I did was simply round off the weight, so the chance of collapses
>> increases. Rather than getting 14.3576 and 14.3623 you get 14.36 (or
>> 14.4 or 14), so the secondary sort decides which of those two
>> documents is shown first.
>> It isn't perfect: sometimes the maximum weight is somewhere below 1,
>> sometimes it's far above 20 or so, so the chance of collapses still depends
>> on the weight distribution. But for documents that are close in weight,
>> collapsing is much more common with my patch than without it. You can
>> see from your result set whether this patch is likely to help or not:
>> if the top 100 results are all within 90-100, it's going to give you a
>> few collapses; if the top is much more spread out, it won't do much.
>>
>> I'd hoped someone on the list would have some ideas on how to make it a 
>> bit more predictable.
> 
> I guess you could choose the scaling based on the maximum possible
> weight (which is known before the matcher starts) but this approach
> seems to suffer from the same sort of arbitrariness that sort bands did.
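Scaling by the maximum possible weight could look roughly like this. It is plain Python with hypothetical names (`relative_bucket`, `rank_relative`, `buckets`); in Xapian the maximum would come from the matcher, which is not modelled here. Weights are bucketed relative to `max_weight`, so the amount of collapsing no longer depends on the absolute size of the weights.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical match record (not a Xapian type)."""
    docid: int
    weight: float
    date: int  # hypothetical sort key; larger means newer


def relative_bucket(weight: float, max_weight: float, buckets: int) -> int:
    """Map 0 <= weight <= max_weight to an integer step 0..buckets."""
    if max_weight <= 0:
        return 0
    return round(weight * buckets / max_weight)


def rank_relative(matches: List[Match], max_weight: float,
                  buckets: int = 100) -> List[int]:
    """Order by relative bucket, newest first within a bucket."""
    ranked = sorted(
        matches,
        key=lambda m: (relative_bucket(m.weight, max_weight, buckets), m.date),
        reverse=True,
    )
    return [m.docid for m in ranked]


# The same near-equal pair collapses whether the weight scale is ~20 or ~1.
big = [Match(1, 14.3576, 20080101), Match(2, 14.3623, 20070101)]
small = [Match(1, 0.71788, 20080101), Match(2, 0.718115, 20070101)]
print(rank_relative(big, max_weight=20.0))   # [1, 2]
print(rank_relative(small, max_weight=1.0))  # [1, 2]
```

This does not remove the arbitrariness: two weights on either side of a bucket boundary (say 71.49 and 71.51 in bucket units) still land in different buckets, just as documents could fall either side of a sort-band boundary.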

I think your PostingSource approach suffers from that too? Although both
approaches are not as bad as the sort bands.

Do you have plans to tackle the underlying issue anytime soon?

Best regards,

Arjen