[Xapian-tickets] [Xapian] #475: Multi XOR can return inconsistent results
Xapian
nobody at xapian.org
Wed May 12 23:37:56 BST 2010
#475: Multi XOR can return inconsistent results
---------------------+------------------------------------------------------
Reporter: richard | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.2.1
Component: Matcher | Version: SVN trunk
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
---------------------+------------------------------------------------------
Old description:
> I've run the following query, on a database containing 1000 random
> documents, in which each document contains a sequence of terms starting
> at N1, and ending at Nx, where x is a random integer in the range 1 to
> 100 (inclusive). ie, in this database, if a document contains term Nx,
> it also contains N(x-1), for all x > 1. All wdfs are 1.
>
> {{{
> Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10
> OR N28) XOR N22 XOR N44))
> }}}
>
> When I ask for only the top result (get_mset(0, 1)), a document which
> contains all terms to N99 is returned.
>
> When I ask for all the results (get_mset(0, 1000)), this document is not
> returned.
>
> I believe this is because, when asking for only the top result, the
> postlist decays changing some of the XORs into AND_NOTs.
>
> The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which
> is to match if an odd number of the children match). Consider:
>
> {{{
> N99 XOR N44 XOR N22
> }}}
>
> N99 present implies N44 and N22 present, so a document containing N99
> will be returned.
>
> However, this can decay to
>
> {{{
> (N99 XOR N44) AND_NOT N11
> }}}
>
> N99 present implies all the other terms are present, so the XOR part of
> this never matches a document containing N99.
>
> I think the upshot of this is that either:
>
> - we should disable the decay of XOR to AND_NOT, or
> - we should disallow more than 2 children being passed to an XOR query
> constructor (and disallow redistribution of terms of XOR queries next to
> each other in the tree), or
> - we should implement a MultiXORPostList which handles the decay
> correctly (though I've not thought hard about what that means).
New description:
I've run the following query, on a database containing 1000 random
documents, in which each document contains a sequence of terms starting at
N1, and ending at Nx, where x is a random integer in the range 1 to 100
(inclusive). ie, in this database, if a document contains term Nx, it
also contains N(x-1), for all x > 1. All wdfs are 1.
{{{
Xapian::Query((N48:(wqf=2) XOR (N70 AND_NOT N42) XOR (N99 OR N68 OR N10 OR
N28) XOR N22 XOR N44))
}}}
When I ask for only the top result (get_mset(0, 1)), a document which
contains all terms to N99 is returned.
When I ask for all the results (get_mset(0, 1000)), this document is not
returned.
I believe this is because, when asking for only the top result, the
postlist decays changing some of the XORs into AND_NOTs.
The decay of XOR to AND_NOT changes the semantics of a multi-XOR (which is
to match if an odd number of the children match). Consider:
{{{
N99 XOR N44 XOR N22
}}}
N99 present implies N44 and N22 present, so a document containing N99 will
be returned.
However, this can decay to
{{{
(N99 XOR N44) AND_NOT N22
}}}
N99 present implies all the other terms are present, so the XOR part of
this never matches a document containing N99.
I think the upshot of this is that either:
- we should disable the decay of XOR to AND_NOT, or
- we should disallow more than 2 children being passed to an XOR query
constructor (and disallow redistribution of terms of XOR queries next to
each other in the tree), or
- we should implement a MultiXORPostList which handles the decay
correctly (though I've not thought hard about what that means).
--
Comment(by richard):
Corrected typo in description
--
Ticket URL: <http://trac.xapian.org/ticket/475#comment:2>
Xapian <http://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list