[Xapian-devel] Bug and patch for +terms with wildcards

Olly Betts olly at survex.com
Wed Dec 6 06:13:26 GMT 2006


On Wed, Dec 06, 2006 at 12:04:24AM +0000, richard at lemurconsulting.com wrote:
> In current Xapian SVN HEAD, there is a bug in the query parser concerned
> with the handling of wildcard terms with a "+" prefix.  Specifically,
> a query such as "+foo* bar" will be parsed by the query parser into
> Xapian::Query("bar") if there are no terms in the database which start
> "foo".  Instead, since the "+" term cannot be matched, I believe this query
> should return no documents.

Seems reasonable.

> I've put a patch together to fix this issue, but it requires the
> introduction of a new query operator, to mark a query as matching no
> documents (as opposed to a query created with the default query
> constructor, which represents an undefined query).

You can actually get away without this by using "X AND_NOT X" where
X is ideally a term which probably doesn't index anything.  Just an
interesting observation really, I'm not suggesting we should use this
hack (though Omega used to, long ago).

> I've called this operator "OP_MATCH_NOTHING", and it takes no
> subqueries.

I don't think this should be an operator.  It doesn't operate *on*
anything, so I think this is an unnatural place to put this.  I know we
have OP_LEAF internally, but that's an implementation detail and we
deliberately don't expose it externally; instead we have a constructor
to create a leaf Query object.

> I believe this should be public, since it may be useful for people trying
> to write their own query parsers, rather than relying on the builtin query
> parser.

Agreed.

> It's possible that a similar approach would be a neat solution for
> representing "alldocument" queries.  Currently, a special query term can be
> created which matches all documents by creating a leaf query for which the
> term is the null string.  This is a somewhat "magic" and unobvious approach
> - instead, we could introduce an "OP_MATCH_ALL" nullary operator, which
> would be converted to a postlist which matches all documents.  It's not
> clear why an empty term should magically match all documents, rather than
> none, or indeed why it should have any special meaning.

I agree that Xapian::Query("") isn't totally obvious, but I don't think
a "nullary" (ick, what an awful word) operator is any better.

We already have a "query which matches nothing" - Xapian::Query().  You
can run a match on it (and get an empty MSet), but you can't currently
compose it with anything else.

I think a better approach would be to allow Xapian::Query() to be
composed with other queries.  I know we used to allow this and it all
got out of hand, but that was because we tried to make it compose in
magic ways (e.g. Query() AND X would be X).  I'm just proposing that
this works in the obvious way for an empty query, and you've already
implemented this anyway!
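
To make "the obvious way" concrete, here is a toy model (not Xapian code)
in which a query is just the set of document ids it matches, and the empty
set plays the role of a match-nothing query.  The names DocSet, match_and,
match_or, match_and_not, match_and_maybe and demo_compose are made up for
this sketch, and it assumes AND_MAYBE returns exactly the documents matched
by its left-hand side.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Toy model only: a "query" is the set of document ids it matches.
// None of these names are part of the Xapian API.
using DocSet = std::set<int>;

// AND: documents matched by both sides.
DocSet match_and(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// OR: documents matched by either side.
DocSet match_or(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// AND_NOT: documents matched by the left side but not the right.
DocSet match_and_not(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// AND_MAYBE (assumed semantics): the right side never adds or removes
// documents, so the matching set is just the left side.
DocSet match_and_maybe(const DocSet& a, const DocSet& b) {
    (void)b;
    return a;
}

void demo_compose() {
    const DocSet nothing;        // stands in for a match-nothing query
    const DocSet bar{2, 5, 9};   // documents containing "bar"

    assert(match_and(nothing, bar).empty());        // nothing AND bar
    assert(match_and_maybe(nothing, bar).empty());  // nothing AND_MAYBE bar
    assert(match_or(nothing, bar) == bar);          // nothing OR bar
    assert(match_and_not(bar, bar).empty());        // the "X AND_NOT X" hack
}
```

In this model nothing is special-cased: an empty left side of AND or
AND_MAYBE gives an empty result, which is what "+foo* bar" should produce
if it is treated as foo* AND_MAYBE bar (an assumption here), while OR with
the empty set just gives the other side.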

So in omquery.cc, instead of throwing an exception "Can't compose a
query from undefined queries" in various places, we would just call:

    internal->add_subquery(Query::Internal(OP_MATCH_NOTHING));

And then add two standard constant objects to the API:

Xapian::Query::MatchAll
Xapian::Query::MatchNothing

To create these in include/xapian/query.h we'd have:

    namespace Xapian {
	// ...
	class Query {
	    // ...
	    // Defined in the .cc file as Query("") and Query() respectively.
	    static const Query MatchAll;
	    static const Query MatchNothing;
	    // ...
	};
    }

Conceptually these sit much better as constants than operators.
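
As a rough, self-contained illustration of the constants idea: C++ only
lets a static member of the class's own type be declared inside the class,
so the objects themselves need a definition outside it.  The stand-in Query
class below lives in a made-up namespace "sketch" and has hypothetical
matches_all() and matches_nothing() helpers; it is not the real
Xapian::Query, just enough to show the pattern.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Stand-in for a query class; NOT the real Xapian::Query.
class Query {
  public:
    // Default-constructed query: treated here as matching nothing.
    Query() : defined_(false) {}

    // Leaf query for a term; the empty term is treated as "match all".
    explicit Query(const std::string& term) : defined_(true), term_(term) {}

    bool matches_nothing() const { return !defined_; }
    bool matches_all() const { return defined_ && term_.empty(); }

    // Declared here, defined once below (normally in the .cc file).
    static const Query MatchAll;
    static const Query MatchNothing;

  private:
    bool defined_;
    std::string term_;
};

// Out-of-class definitions of the two constants.
const Query Query::MatchAll{std::string()};
const Query Query::MatchNothing{};

}  // namespace sketch

void demo_constants() {
    assert(sketch::Query::MatchAll.matches_all());
    assert(sketch::Query::MatchNothing.matches_nothing());
}
```

Callers then write Query::MatchAll or Query::MatchNothing and never need
to know which constructor call sits behind either name.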

> +	        case '!': {
> +		    return new Xapian::Query::Internal(Xapian::Query::OP_MATCH_NOTHING, 0);
> +	        }

I think the remote protocol version probably should be incremented
because of this change.

I know this is compatible unless you use the new feature, but it's
better to moan up front than fail on particular queries.  Most people
will want to update both ends in step anyway I suspect.
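
A minimal sketch of what "moaning up front" could look like, assuming a
single version number is exchanged when the connection is opened.  The
constant kProtocolVersion, its value, and the function names are all
hypothetical and are not taken from Xapian's remote backend.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical version number, for illustration only.
constexpr int kProtocolVersion = 2;

// Called once at connection time: a mismatched peer is rejected straight
// away, rather than failing later on the first query that happens to use
// an encoding the other end does not understand.
void check_protocol_version(int peer_version) {
    if (peer_version != kProtocolVersion) {
        throw std::runtime_error(
            "protocol version mismatch: peer speaks " +
            std::to_string(peer_version) + ", expected " +
            std::to_string(kProtocolVersion));
    }
}

void demo_handshake() {
    // Same version on both ends: accepted silently.
    check_protocol_version(kProtocolVersion);

    // Older peer: rejected before any query is sent.
    bool rejected = false;
    try {
        check_protocol_version(kProtocolVersion - 1);
    } catch (const std::runtime_error&) {
        rejected = true;
    }
    assert(rejected);
}
```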

Cheers,
    Olly


