[Xapian-discuss] Making SORTAFTER useful in omega?

Arjen van der Meijden acmmailing at tweakers.net
Sat Sep 27 13:44:32 BST 2008


Hi Jim,

That was what I wanted to do first. But the percentage is based on the 
maximum weight of all documents in the result set, and at the time this 
sorter runs, the maximum isn't known yet. Getting that to work would 
have made the patch quite a bit more complex.

So what I did is just round off the weight, so the chance that two 
weights collapse to the same value increases. Rather than getting 
14.3576 and 14.3623 you can get 14.36 (or 14.4 or 14). The secondary 
sort key then decides which of those two documents gets shown first.
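
To make the rounding concrete, here is a minimal Python sketch. It is 
not the actual patch (which is C++ inside xapian-core); the names 
rounded_weight_key and rank, the sample documents and the date strings 
are all made up for illustration. It assumes, as described in the 
quoted mail below, that the weight is multiplied by the sort factor and 
cast to int, and that the value's content is the second part of the key.

```python
def rounded_weight_key(weight, sort_factor):
    # Hypothetical helper: multiply and cast to int, which truncates
    # rather than rounds to nearest.
    return int(weight * sort_factor)


def rank(docs, sort_factor):
    # Sort by (rounded weight, value), both descending. Documents whose
    # rounded weights are equal are ordered by the value alone.
    return sorted(
        docs,
        key=lambda d: (rounded_weight_key(d[1], sort_factor), d[2]),
        reverse=True,
    )


# (name, weight, value); the value here is an assumed date string.
docs = [
    ("doc_a", 14.3576, "20080920"),
    ("doc_b", 14.3623, "20080901"),
    ("doc_c", 12.8012, "20080927"),
]

for factor in (10, 100):
    print(factor, [name for name, _, _ in rank(docs, factor)])
```

With a factor of 10 both top weights become 143, so the newer doc_a is 
listed first. With a factor of 100 the cast to int gives 1435 and 1436, 
so the two weights stay apart and doc_b wins on weight; rounding to the 
nearest value instead of truncating would give 14.36 for both.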
It isn't perfect. Sometimes the maximum weight is somewhere below 1, 
sometimes it's far over 20 or so, so the chance of collapses still 
depends on the weight distribution. But for documents that are close in 
weight, a collapse is much more common with my patch than without it. 
You can see in your result set whether this patch is likely to help: if 
the top 100 results are all within 90-100 it's going to give you a few 
collapses; if the top is much more spread out, it won't do much.
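
The dependence on the weight scale can be shown with another small 
Python sketch. Everything in it is an assumption for illustration: the 
weights are invented, bucket_count and percent_buckets are hypothetical 
names, and the percentage is a simplified "weight divided by the 
maximum in the result set", not necessarily how Xapian itself computes 
percentages.

```python
def bucket_count(weights, sort_factor):
    # Number of distinct sort-key prefixes after multiply-and-cast.
    return len({int(w * sort_factor) for w in weights})


def percent_buckets(weights):
    # Simplified percentage: needs the maximum weight up front, which is
    # exactly what the sorter does not have while it is running.
    max_w = max(weights)
    return [int(100 * (w / max_w)) for w in weights]


low = [0.412, 0.415, 0.418, 0.421]   # maximum weight below 1
high = [20.1, 21.4, 22.8, 24.3]      # maximum weight over 20

# Same sort factor, very different amount of collapsing.
print(bucket_count(low, 100), bucket_count(high, 100))

# Dividing by the maximum removes the dependence on the scale.
print(percent_buckets(low) == percent_buckets([w * 64 for w in low]))
```

With a factor of 100 the four low weights fall into two buckets while 
the four high weights stay in four separate ones. The percentage-based 
buckets are unaffected by rescaling the weights, but they can only be 
computed once the maximum is known.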

I'd hoped someone on the list would have some ideas on how to make it a 
bit more predictable.

Best regards,

Arjen

On 27-9-2008 14:22 Jim wrote:
> Arjen van der Meijden wrote:
>> I've patched xapian-core to contain another Sorter, which takes the 
>> calculated weight, then rounds it (actually just multiplies it and 
>> casts it to int) and uses it as the first part of a sort key. The 
>> second part is simply the document's content for the value that you 
>> would already be using with the 
>> sort_by_relevance_then_value call.
>>
>> To give the sorter access to the weight, I added it to the 
>> operator()-call.
>>
>> You can enable my RoundedWeightSorter from within omega using the 
>> 'SORTFACTOR=' parameter on top of the normal SORTAFTER= and SORT=.
>>
>> From my first looks and limited tests a reasonable value seems to be 
>> to round the weight to 2 decimals after the dot (sortfactor of 100). 
>> The time to do the search seems to be similar to a normal value based 
>> sorted search.
>>
>> The patches are based on Xapian/Omega 1.0.8 and obviously I'd like to 
>> hear about all flaws in my approach.
>>
>> Best regards,
>>
>> Arjen
> Arjen, thanks, I like that.  I had a client want something like that 
> last year.  I'll see if he wants me to modify the existing search to 
> include it.  Probably not since he's a bit tight on budgets right now.  
> He wanted a way to factor date into the sort order.  If I understand 
> what you have done, results with relevancy in the 90-100 bucket could 
> be grouped and then sorted on something like date.  Same for 80-89, 
> etc.  Did I understand that correctly?
> 
> Thanks,
> Jim.
> 