[Xapian-tickets] [Xapian] #475: Multi XOR can return inconsistent results

Xapian nobody at xapian.org
Tue May 11 15:29:29 BST 2010


#475: Multi XOR can return inconsistent results
---------------------+------------------------------------------------------
 Reporter:  richard  |       Owner:  olly 
     Type:  defect   |      Status:  new  
 Priority:  normal   |   Milestone:  1.2.1
Component:  Matcher  |     Version:       
 Severity:  normal   |    Keywords:       
Blockedby:           |    Platform:  All  
 Blocking:           |  
---------------------+------------------------------------------------------
 I've run the following query, on a database containing 1000 random
 documents, in which each document contains a sequence of terms starting at
 N1, and ending at Nx, where x is a random integer in the range 1 to 100
 (inclusive).  ie, in this database, if a document contains term Nx, it
 also contains N(x-1), for all x > 1.  All wdfs are 1.

 {{{
 Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10 OR
 N28) XOR N22 XOR N44))
 }}}

 When I ask for only the top result (get_mset(0, 1)), a document which
 contains all terms to N99 is returned.

 When I ask for all the results (get_mset(0, 1000)), this document is not
 returned.

 I believe this is because, when asking for only the top result, the
 postlist decays changing some of the XORs into AND_NOTs.

 The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which is
 to match if an odd number of the children match).  Consider:

 {{{
 N99 XOR N44 XOR N22
 }}}

 N99 present implies N44 and N22 present, so a document containing N99 will
 be returned.

 However, this can decay to

 {{{
 (N99 XOR N44) AND_NOT N11
 }}}

 N99 present implies all the other terms are present, so the XOR part of
 this never matches a document containing N99.

 I think the upshot of this is that either:

  - we should disable the decay of XOR to AND_NOT, or
  - we should disallow more than 2 children being passed to an XOR query
 constructor (and disallow redistribution of terms of XOR queries next to
 each other in the tree), or
  - we should implement a MultiXORPostList which handles the decay
 correctly (though I've not thought hard about what that means).

-- 
Ticket URL: <http://trac.xapian.org/ticket/475>
Xapian <http://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list