[Xapian-tickets] [Xapian] #475: Multi XOR can return inconsistent results

Xapian nobody at xapian.org
Wed May 12 23:37:56 BST 2010


#475: Multi XOR can return inconsistent results
---------------------+------------------------------------------------------
 Reporter:  richard  |       Owner:  olly     
     Type:  defect   |      Status:  new      
 Priority:  normal   |   Milestone:  1.2.1    
Component:  Matcher  |     Version:  SVN trunk
 Severity:  normal   |    Keywords:           
Blockedby:           |    Platform:  All      
 Blocking:           |  
---------------------+------------------------------------------------------

Old description:

> I've run the following query, on a database containing 1000 random
> documents, in which each document contains a sequence of terms starting
> at N1, and ending at Nx, where x is a random integer in the range 1 to
> 100 (inclusive).  ie, in this database, if a document contains term Nx,
> it also contains N(x-1), for all x > 1.  All wdfs are 1.
>
> {{{
> Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10
> OR N28) XOR N22 XOR N44))
> }}}
>
> When I ask for only the top result (get_mset(0, 1)), a document which
> contains all terms to N99 is returned.
>
> When I ask for all the results (get_mset(0, 1000)), this document is not
> returned.
>
> I believe this is because, when asking for only the top result, the
> postlist decays changing some of the XORs into AND_NOTs.
>
> The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which
> is to match if an odd number of the children match).  Consider:
>
> {{{
> N99 XOR N44 XOR N22
> }}}
>
> N99 present implies N44 and N22 present, so a document containing N99
> will be returned.
>
> However, this can decay to
>
> {{{
> (N99 XOR N44) AND_NOT N11
> }}}
>
> N99 present implies all the other terms are present, so the XOR part of
> this never matches a document containing N99.
>
> I think the upshot of this is that either:
>
>  - we should disable the decay of XOR to AND_NOT, or
>  - we should disallow more than 2 children being passed to an XOR query
> constructor (and disallow redistribution of terms of XOR queries next to
> each other in the tree), or
>  - we should implement a MultiXORPostList which handles the decay
> correctly (though I've not thought hard about what that means).

New description:

 I've run the following query, on a database containing 1000 random
 documents, in which each document contains a sequence of terms starting at
 N1, and ending at Nx, where x is a random integer in the range 1 to 100
 (inclusive).  ie, in this database, if a document contains term Nx, it
 also contains N(x-1), for all x > 1.  All wdfs are 1.

 {{{
 Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10 OR
 N28) XOR N22 XOR N44))
 }}}

 When I ask for only the top result (get_mset(0, 1)), a document which
 contains all terms to N99 is returned.

 When I ask for all the results (get_mset(0, 1000)), this document is not
 returned.

 I believe this is because, when asking for only the top result, the
 postlist decays changing some of the XORs into AND_NOTs.

 The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which is
 to match if an odd number of the children match).  Consider:

 {{{
 N99 XOR N44 XOR N22
 }}}

 N99 present implies N44 and N22 present, so a document containing N99 will
 be returned.

 However, this can decay to

 {{{
 (N99 XOR N44) AND_NOT N22
 }}}

 N99 present implies all the other terms are present, so the XOR part of
 this never matches a document containing N99.

 I think the upshot of this is that either:

  - we should disable the decay of XOR to AND_NOT, or
  - we should disallow more than 2 children being passed to an XOR query
 constructor (and disallow redistribution of terms of XOR queries next to
 each other in the tree), or
  - we should implement a MultiXORPostList which handles the decay
 correctly (though I've not thought hard about what that means).

--

Comment(by richard):

 Corrected typo in description

-- 
Ticket URL: <http://trac.xapian.org/ticket/475#comment:2>
Xapian <http://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list