[Xapian-discuss] Buggy checkatleast?
Silviu-Ionut Ganceanu
silviug at gmail.com
Thu Oct 22 14:56:39 BST 2009
Hi Olly,
I was crazy enough to attempt to debug this by myself. I worked on version
1.0.16.
I think the problem is in the get_mset implementation in
xapian-core/matcher/multimatch.cc. Here is what I understood from analyzing
the Xapian debug logs and a couple of printfs inserted here and there:
1. matches_*_bound is set to docs_matched by the following lines in the
code:

    } else if (docs_matched < check_at_least) {
        // We have seen fewer matches than we checked for, so we must have seen
        // all the matches.
        DEBUGLINE(MATCH, "Setting bounds equal");
        matches_lower_bound = matches_upper_bound = matches_estimated
            = docs_matched;

2. docs_matched ends up too high when you specify a percent_cutoff, because
the new items pushed back onto items are not checked against the cutoff
threshold before docs_matched is incremented.
In other words, some documents are counted as "checked" even though the
cutoff excludes them from the mset.
3. There is a bug related to the collapse key too. In some cases, several
documents with the same collapse_key are counted in docs_matched.
I verified this by maintaining a set of the collapse_keys of the items added
so far. The modified code looks like this:

    // OK, actually add the item to the mset.
    if (pushback) {
        ++docs_matched;
        // Added for debugging: check that no collapse_key is counted twice.
        assert(keys_set.count(new_item.collapse_key) == 0);
        keys_set.insert(new_item.collapse_key);

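To make points 2 and 3 concrete, here is a toy model of the miscount. It is only a sketch and not the code in multimatch.cc: the Candidate struct and the functions count_unchecked and count_checked are hypothetical names, the cutoff is simplified to a fixed minimum weight, and I assume that an empty collapse key means the document is never collapsed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a candidate match; not Xapian's internal type.
struct Candidate {
    double weight;             // relevance weight of the document
    std::string collapse_key;  // assumption: empty means "never collapsed"
};

// Models the suspected behaviour: every pushed-back item is counted,
// with no cutoff check and no collapse check beforehand.
std::size_t count_unchecked(const std::vector<Candidate>& candidates) {
    return candidates.size();
}

// Models the expected behaviour: count only items that survive the cutoff,
// and at most one item per non-empty collapse key.
std::size_t count_checked(const std::vector<Candidate>& candidates,
                          double min_weight) {
    std::set<std::string> seen_keys;
    std::size_t docs_matched = 0;
    for (const Candidate& c : candidates) {
        if (c.weight < min_weight) {
            continue;  // below the (simplified) cutoff threshold
        }
        if (!c.collapse_key.empty() &&
            !seen_keys.insert(c.collapse_key).second) {
            continue;  // this collapse key has already been counted
        }
        ++docs_matched;
    }
    return docs_matched;
}
```

With five candidates of weights 10, 9, 8, 2 and 1, where the first two share a collapse key, and a minimum weight of 5, count_unchecked returns 5 while count_checked returns 2. If the inflated count is still below check_at_least, the branch from point 1 reports it as the exact number of matches.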
I don't know how to reproduce the bug because it manifests on a big flint
index, which is hard to minimize and isolate.
Thanks,
On Thu, Sep 3, 2009 at 2:07 AM, Olly Betts <olly at survex.com> wrote:
> On Wed, Sep 02, 2009 at 04:48:55PM +0300, Silviu-Ionut Ganceanu wrote:
> > I was surprised to see that I get an estimated number of results of 47 after
> > the first call to get_mset (with start=0, count=10) and 17 after the second
> > call to get_mset (start=10, count=10). What looks buggy here is that while I
> > specify that I want at least 501 documents to be checked, Xapian seems to
> > fail to count the exact number of documents (which is lower than 501).
>
> Yes, that sounds like a bug from what you've described.
>
> > I don't know if it's relevant but I'm using boolean filters (location) in
> > the query and python bindings.
>
> That shouldn't matter.
>
> There were some checkatleast fixes in the early 1.0.x releases, but if
> you can reproduce this with 1.0.15, it would be great to have some way
> to reproduce it.
>
> Cheers,
> Olly
>
--
Silviu-Ionuţ Gănceanu