[Xapian-discuss] Feature requests

Markus Wörle mrks at mrks.de
Wed Feb 21 12:23:11 GMT 2007


On 20.02.2007 at 20:01, Olly Betts wrote:

> On Tue, Feb 20, 2007 at 07:07:45PM +0100, Markus Wörle wrote:
>> * I use sorting by_value_then_relevance in some cases. In this
>> condition it happens that if both value and relevance of documents
>> are equal, the sorting between those documents becomes unstable, that
>> is, the order of those documents may differ from query to query.
>
> If so, that is a bug.  The final ordering is by docid (or reverse docid)
> if all else is equal.

That's interesting. I just tried to reproduce the behaviour I described,
but I haven't managed to in a small example so far. I did find faulty
behaviour during my tests, but I am not sure whether the same bug causes
both problems. Here is a small Perl script which demonstrates it:

http://5nord.org/~mrks/xapian_sortorder1.pl

This program populates a fresh database with 10 documents: 5 containing
the term "test" once (termcount:1) with values 0-4, and 5 containing the
term "test" twice (termcount:2), also with values 0-4. It then runs
several searches on this index, all with the same query ("test"). The
first search requests 10 results, starting from 0. The following searches
should produce the same results, but request them in pages: first 1
result starting from 0, then 1 result starting from 1, and then 8 results
starting from 2. Both methods should give the same output.
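
To make the expected behaviour concrete, here is a minimal pure-Python
model of the test. It does not call Xapian at all. The weights 86 and 99
are just the percentages from my output used as stand-ins for relevance,
and the sort direction (value descending, then relevance descending, then
docid ascending as the final tie-breaker) is an assumption inferred from
the output and from Olly's remark above.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    docid: int
    value: int   # sort value stored with the document
    weight: int  # stand-in for the relevance score (assumed numbers)


def build_docs():
    # IDs 1-5: "test" occurs once; IDs 6-10: "test" occurs twice.
    # Both groups carry the values 0-4.
    docs = [Doc(docid=i + 1, value=i, weight=86) for i in range(5)]
    docs += [Doc(docid=i + 6, value=i, weight=99) for i in range(5)]
    return docs


def sort_key(doc):
    # Value descending, then relevance descending, then docid ascending.
    return (-doc.value, -doc.weight, doc.docid)


def search(docs, first, maxitems):
    # Rank everything, then cut out the requested window.
    ranked = sorted(docs, key=sort_key)
    return ranked[first:first + maxitems]


docs = build_docs()
unpaged = search(docs, 0, 10)
paged = search(docs, 0, 1) + search(docs, 1, 1) + search(docs, 2, 8)

print([d.docid for d in unpaged])  # [10, 5, 9, 4, 8, 3, 7, 2, 6, 1]
print(paged == unpaged)            # True
```

Because the sort key is a total order (docid breaks every remaining tie),
the concatenated pages must equal the unpaged list in this model.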

If you comment out the set_sort_by_value_then_relevance() in line 34,  
this is actually the case, but with sorting by value first, the  
results differ. This is the output of my program:

Without sorting by value (line 34 commented out):

(unpaged)
ID 6 99%
ID 7 99%
ID 8 99%
ID 9 99%
ID 10 99%
ID 1 86%
ID 2 86%
ID 3 86%
ID 4 86%
ID 5 86%
(paged)
ID 6 99%
ID 7 99%
ID 8 99%
ID 9 99%
ID 10 99%
ID 1 86%
ID 2 86%
ID 3 86%
ID 4 86%
ID 5 86%

And with sorting by value (line 34 untouched):

(unpaged)
ID 10 99%
ID 5 86%
ID 9 99%
ID 4 86%
ID 8 99%
ID 3 86%
ID 7 99%
ID 2 86%
ID 6 99%
ID 1 86%
(paged)
ID 5 100% <-- this is where it becomes strange
ID 5 86%   <-- the same doc_id twice, with different relevances
ID 9 99%
ID 4 86%
ID 8 99%
ID 3 86%
ID 7 99%
ID 2 86%
ID 6 99%
ID 1 86%
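
The difference between the two listings with sorting by value can also be
checked mechanically. The following small Python sketch only compares the
document IDs copied from the output above; the helper name compare_pages
is made up for this illustration.

```python
from collections import Counter

# Document IDs copied from the two "sorting by value" listings above.
first_listing = [10, 5, 9, 4, 8, 3, 7, 2, 6, 1]
second_listing = [5, 5, 9, 4, 8, 3, 7, 2, 6, 1]


def compare_pages(expected, actual):
    # IDs that occur more than once in the actual result list.
    counts = Counter(actual)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    # IDs that were expected but never returned.
    missing = sorted(set(expected) - set(actual))
    return duplicates, missing


duplicates, missing = compare_pages(first_listing, second_listing)
print("duplicated:", duplicates)  # duplicated: [5]
print("missing:", missing)        # missing: [10]
```

So document 5 is returned twice and document 10 is not returned at all.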

So, if the final sort order is determined by the document ID, there
could be a connection between this and the initial bug, which I have
not been able to reproduce in a small example so far.

Regards,
mrks



