[Xapian-tickets] [Xapian] #475: Multi XOR can return inconsistent results
Xapian
nobody at xapian.org
Tue May 11 15:29:29 BST 2010
#475: Multi XOR can return inconsistent results
---------------------+------------------------------------------------------
Reporter: richard | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.2.1
Component: Matcher | Version:
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
---------------------+------------------------------------------------------
I've run the following query, on a database containing 1000 random
documents, in which each document contains a sequence of terms starting at
N1, and ending at Nx, where x is a random integer in the range 1 to 100
(inclusive). ie, in this database, if a document contains term Nx, it
also contains N(x-1), for all x > 1. All wdfs are 1.
{{{
Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10 OR
N28) XOR N22 XOR N44))
}}}
When I ask for only the top result (get_mset(0, 1)), a document which
contains all terms to N99 is returned.
When I ask for all the results (get_mset(0, 1000)), this document is not
returned.
I believe this is because, when asking for only the top result, the
postlist decays changing some of the XORs into AND_NOTs.
The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which is
to match if an odd number of the children match). Consider:
{{{
N99 XOR N44 XOR N22
}}}
N99 present implies N44 and N22 present, so a document containing N99 will
be returned.
However, this can decay to
{{{
(N99 XOR N44) AND_NOT N11
}}}
N99 present implies all the other terms are present, so the XOR part of
this never matches a document containing N99.
I think the upshot of this is that either:
- we should disable the decay of XOR to AND_NOT, or
- we should disallow more than 2 children being passed to an XOR query
constructor (and disallow redistribution of terms of XOR queries next to
each other in the tree), or
- we should implement a MultiXORPostList which handles the decay
correctly (though I've not thought hard about what that means).
--
Ticket URL: <http://trac.xapian.org/ticket/475>
Xapian <http://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list