[Xapian-discuss] Feature requests
Markus Wörle
mrks at mrks.de
Wed Feb 21 12:23:11 GMT 2007
On 20.02.2007 at 20:01, Olly Betts wrote:
> On Tue, Feb 20, 2007 at 07:07:45PM +0100, Markus Wörle wrote:
>> * I use sorting by_value_then_relevance in some cases. In this
>> condition it happens that if both value and relevance of documents
>> are equal, the sorting between those documents becomes unstable, that
>> is, the order of those documents may differ from query to query.
>
> If so, that is a bug. The final ordering is by docid (or reverse docid)
> if all else is equal.
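To illustrate Olly's point: if docid is the last tie-breaker, then value, relevance and docid together give a total order, so two different documents can never compare equal and the ranking cannot change from query to query. The following is a minimal Python sketch, not Xapian's actual implementation. The `Hit` record and `sort_key` function are hypothetical, and descending value and relevance with ascending docid are assumed directions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    docid: int      # document id (unique per document)
    value: int      # hypothetical sort value from a value slot
    weight: float   # hypothetical relevance weight


def sort_key(hit: Hit):
    # Descending value, then descending relevance, then ascending docid
    # as the final tie-breaker, so no two distinct hits compare equal.
    return (-hit.value, -hit.weight, hit.docid)


def rank(hits):
    # Because the key is a total order, the result does not depend on
    # the order in which the hits were produced.
    return sorted(hits, key=sort_key)
```

For example, three hits with equal value and weight always come back in docid order, however they are fed in.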
That's interesting. I just tried to reproduce the behaviour described
above, but so far I haven't managed to reproduce it in a small example.
I did, however, find some faulty behaviour during my tests, although I
am not sure whether the same bug causes both problems. Here is a small
Perl script that demonstrates it:
http://5nord.org/~mrks/xapian_sortorder1.pl
This program populates a fresh database with 10 documents: 5 with the
term "test" (termcount: 1) and values 0-4, and 5 with the term
"test" (termcount: 2), also with values 0-4. It then runs several
searches on this index, all with the same query ("test"). The first
search requests 10 results starting at offset 0. The following searches
should produce the same result split into pages: first 1 result starting
at 0, then 1 result starting at 1, and finally 8 results starting at 2.
Both methods should give the same output.
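The check described above can be sketched in plain Python without Xapian. This is a hypothetical model, not the Perl script: it assumes "termcount" means the term's within-document frequency and uses it as a stand-in for relevance, and it assumes descending value order to match the output shown. Each page re-runs the full sort and slices it, as a fresh query would; with a total order, concatenating the pages must reproduce the unpaged ranking with no duplicate docids.

```python
def build_docs():
    # Hypothetical model of the test database as (docid, value, wdf):
    # docids 1-5 have wdf 1, docids 6-10 have wdf 2, both with values 0-4.
    docs = [(i + 1, i, 1) for i in range(5)]
    docs += [(i + 6, i, 2) for i in range(5)]
    return docs


def rank(docs):
    # Descending value, then descending wdf (standing in for relevance),
    # then ascending docid as the final tie-breaker.
    return sorted(docs, key=lambda d: (-d[1], -d[2], d[0]))


def page(docs, first, maxitems):
    # Simulate one query that returns maxitems results from offset first.
    return rank(docs)[first:first + maxitems]


def paging_is_consistent(docs, pages):
    # Compare the unpaged docids with the concatenation of all pages,
    # and make sure no docid appears twice.
    full = [d[0] for d in page(docs, 0, len(docs))]
    paged = [d[0] for first, n in pages for d in page(docs, first, n)]
    return paged == full and len(set(paged)) == len(paged)
```

With the pages (0, 1), (1, 1) and (2, 8), the model yields docids 10, 5, 9, 4, 8, 3, 7, 2, 6, 1 both ways. The second listing in the sorted-by-value output fails this check, since docid 5 appears twice.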
If you comment out the set_sort_by_value_then_relevance() call on
line 34, this is indeed the case, but with sorting by value enabled,
the results differ. This is the output of my program:
Without sorting by value (line 34 commented out):
(unpaged)
ID 6 99%
ID 7 99%
ID 8 99%
ID 9 99%
ID 10 99%
ID 1 86%
ID 2 86%
ID 3 86%
ID 4 86%
ID 5 86%
(paged)
ID 6 99%
ID 7 99%
ID 8 99%
ID 9 99%
ID 10 99%
ID 1 86%
ID 2 86%
ID 3 86%
ID 4 86%
ID 5 86%
And with sorting by value (line 34 untouched):
(unpaged)
ID 10 99%
ID 5 86%
ID 9 99%
ID 4 86%
ID 8 99%
ID 3 86%
ID 7 99%
ID 2 86%
ID 6 99%
ID 1 86%
(paged)
ID 5 100% <-- this is where it becomes strange
ID 5 86% <-- the same doc_id twice, with different relevances
ID 9 99%
ID 4 86%
ID 8 99%
ID 3 86%
ID 7 99%
ID 2 86%
ID 6 99%
ID 1 86%
So, if the final sort order is determined by the document ID, this bug
may be related to the original one, which I have not yet been able to
reproduce in a small example.
Regards,
mrks