[Xapian-discuss] Buggy checkatleast?

Silviu-Ionut Ganceanu silviug at gmail.com
Thu Oct 22 14:56:39 BST 2009


Hi Olly,

I was crazy enough to attempt to debug this by myself. I worked on version
1.0.16.

I think the problem is in the get_mset implementation in
xapian-core/matcher/multimatch.cc. Here is what I understood from analyzing
xapian-logs and a couple of printfs inserted here and there:

1. matches_*_bound is set to docs_matched in the following lines in the
code:

    } else if (docs_matched < check_at_least) {
        // We have seen fewer matches than we checked for, so we must have
        // seen all the matches.
        DEBUGLINE(MATCH, "Setting bounds equal");
        matches_lower_bound = matches_upper_bound = matches_estimated
            = docs_matched;

2. docs_matched ends up wrong when you specify a percent_cutoff, because
the new_items pushed back onto items are not checked against the cutoff
threshold before docs_matched is incremented.

In other words, some documents are counted as "checked" but they shouldn't
be in the mset because of the cutoff.

3. There is also a bug related to the collapse key: in some cases several
documents with the same collapse_key are each counted in docs_matched.

I verified this by maintaining a set with added items' collapse_keys. The
modified code looks like this:

    // OK, actually add the item to the mset.
    if (pushback) {
        ++docs_matched;
        // Added for debugging: each collapse_key should be seen only once.
        assert(keys_set.count(new_item.collapse_key) == 0);
        keys_set.insert(new_item.collapse_key);
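For illustration, here is a standalone sketch of the same check that can be
compiled outside the matcher. The Item struct and count_duplicate_keys() are
hypothetical names invented for this example, not Xapian types. The sketch
also assumes that documents with an empty collapse key are never collapsed,
so empty keys are skipped rather than reported as duplicates.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the matcher's item type; only the fields
// needed for the check are modelled.
struct Item {
    unsigned did;              // document id
    std::string collapse_key;  // empty means "no key" (assumption)
};

// Count how many of the items that were counted in docs_matched share a
// non-empty collapse key with an item seen earlier. If collapsing works
// as intended, this should be 0.
unsigned count_duplicate_keys(const std::vector<Item>& counted) {
    std::set<std::string> keys_set;
    unsigned duplicates = 0;
    for (const Item& item : counted) {
        // Assumption: items without a collapse key are never collapsed.
        if (item.collapse_key.empty()) continue;
        // insert().second is false when the key was already in the set.
        if (!keys_set.insert(item.collapse_key).second) ++duplicates;
    }
    return duplicates;
}
```

Counting duplicates instead of asserting lets the check run over a whole
query without aborting on the first repeated key.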


I don't know how to reproduce the bug because it manifests on a big flint
index which is hard to minimize and isolate.
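Without a reproducible index, the effect described in point 2 can at least
be shown with a toy model. This is only a sketch of the hypothesis, not the
code in multimatch.cc: count_matches() and bounds_set_equal() are made-up
helpers, and the per-document percentages are invented values.

```cpp
#include <cassert>
#include <vector>

// Toy model: count candidate documents, optionally dropping those below
// the percent cutoff before they are counted.
unsigned count_matches(const std::vector<int>& percents,
                       int percent_cutoff,
                       bool apply_cutoff_before_counting) {
    unsigned docs_matched = 0;
    for (int percent : percents) {
        if (apply_cutoff_before_counting && percent < percent_cutoff) {
            continue;  // below the cutoff, so it cannot be in the mset
        }
        ++docs_matched;
    }
    return docs_matched;
}

// Mirrors the condition quoted in point 1: when fewer documents matched
// than check_at_least, all three bounds are set equal to docs_matched.
bool bounds_set_equal(unsigned docs_matched, unsigned check_at_least) {
    return docs_matched < check_at_least;
}
```

With percentages {100, 90, 60, 40, 30, 10} and a cutoff of 50, counting
before the cutoff is applied gives 6 and counting after gives 3. Both are
below a check_at_least of 501, so either value would be reported as an exact
count even though only one of them can be right.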

Thanks,

On Thu, Sep 3, 2009 at 2:07 AM, Olly Betts <olly at survex.com> wrote:

> On Wed, Sep 02, 2009 at 04:48:55PM +0300, Silviu-Ionut Ganceanu wrote:
> > I was surprised to see that I get an estimated number of results of 47
> > after the first call to get_mset (with start=0, count=10) and 17 after
> > the second call to get_mset (start=10, count=10). What looks buggy here
> > is that while I specify that I want at least 501 documents to be
> > checked, Xapian seems to fail to count the exact number of documents
> > (which is lower than 501).
>
> Yes, that sounds like a bug from what you've described.
>
> > I don't know if it's relevant but I'm using boolean filters (location) in
> > the query and python bindings.
>
> That shouldn't matter.
>
> There were some checkatleast fixes in the early 1.0.x releases, but if
> you can reproduce this with 1.0.15, it would be great to have some way
> to reproduce it.
>
> Cheers,
>     Olly
>


-- 
Silviu-Ionuţ Gănceanu

