[Xapian-discuss] Stopping wildcard expansion at some point

Olly Betts olly at survex.com
Fri Mar 6 01:24:17 GMT 2009


On Thu, Mar 05, 2009 at 04:42:06PM +0100, Adam Sjøgren wrote:
> Now I have finally gotten the time to create a patch that is a little
> bit better, as it allows you to configure the maximum expansion on the
> QueryParser object by calling $qp->set_max_wildcard_expansion($max).

> Is this the right way to go? Should I rather try to do something else? 

One potential problem - it might be good to push the wildcard expansion
into the backend, as it may be able to optimise it better that way (it
should at least be able to avoid generating a Query object per expanded
term):
    
http://trac.xapian.org/ticket/48

But adding this feature to QueryParser means that we have to expand the
wildcard there, or at least calculate how many terms it would expand to,
which requires iterating all the terms at that point if a limit is set.
So setting a limit would incur a cost even if it never kicked in.
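
A rough sketch of the counting step described above, not Xapian code: it
walks a sorted term list and counts the terms sharing the wildcard's
prefix, giving up once the count passes the limit.  The std::set stands in
for the database's term list, `termcount` stands in for Xapian::termcount,
and the function name `count_expansions` is hypothetical.

```cpp
#include <set>
#include <string>

// Stand-in for Xapian::termcount (assumption, so the sketch builds without Xapian).
using termcount = unsigned;

// Count the terms in a sorted term list that start with `prefix`,
// stopping as soon as more than `limit` have been seen.
// The result is at most limit + 1; limit + 1 means "over the limit".
termcount count_expansions(const std::set<std::string>& terms,
                           const std::string& prefix,
                           termcount limit)
{
    termcount count = 0;
    // Terms sharing a prefix are contiguous in sorted order,
    // so start at the first term that is >= prefix.
    for (auto it = terms.lower_bound(prefix); it != terms.end(); ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break;  // left the prefix range
        if (++count > limit) break;                              // over the limit, stop early
    }
    return count;
}
```

Note that the walk over the matching terms happens whether or not the
limit is ever reached, which is the cost mentioned above.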

Perhaps this is fixing the symptom rather than the problem - if these are
actually useful searches we're rejecting, rather than for example
accidental invocation of the wildcard facility, or malicious users, it
would be better to make these cases work more efficiently than impose a
somewhat arbitrary limit.  For example, by storing prefix terms:

http://trac.xapian.org/ticket/207

Or perhaps we should allow a lower limit on the number of characters
before the wildcard rather than a limit on the number of expansions
(so if this limit were 3, a* and ab* wouldn't be allowed, but abc*
would).
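
As an illustration of that alternative, a minimal check might look like
the following.  The function name `wildcard_allowed` is hypothetical, and
the sketch assumes a trailing `*` marks the wildcard and that the prefix
length is counted in bytes (a real implementation would probably want to
count UTF-8 characters instead).

```cpp
#include <cstddef>
#include <string>

// Return true if `pattern` may be expanded: either it is not a trailing-'*'
// wildcard at all, or at least `min_prefix_len` bytes precede the '*'.
bool wildcard_allowed(const std::string& pattern, std::size_t min_prefix_len)
{
    if (pattern.empty() || pattern.back() != '*') return true;  // not a wildcard
    return pattern.size() - 1 >= min_prefix_len;
}
```

With a minimum of 3, `a*` and `ab*` are rejected and `abc*` is accepted,
and no terms need to be iterated to make the decision.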

> Do I need to adjust something?

It should use Xapian::termcount rather than long.

The error class should be QueryParserError - InvalidOperationError
"indicates the API was used in an invalid way", which isn't the case
here.

The error message is likely to be shown to the user, so should really
mention the wildcard expansion which was the problem, in case there's
more than one in the query, or the user doesn't know about the wildcard
syntax and accidentally invoked the feature.  (Hmm, or would the limit
be better per parsed query than per wildcard expansion?)
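
To make the points about the limit's type and the error message concrete,
here is a hedged sketch of a per-wildcard limit.  It is not the patch under
discussion: `WildcardLimitError` is a hypothetical stand-in for
QueryParserError, `termcount` stands in for Xapian::termcount, the std::set
stands in for the database's term list, and treating 0 as "no limit" is an
assumption.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for Xapian::termcount (assumption, so the sketch builds without Xapian).
using termcount = unsigned;

// Hypothetical stand-in for QueryParserError.
struct WildcardLimitError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Expand `prefix`* against a sorted term list.  If max_expansion is non-zero
// and more than that many terms match, throw an error naming the wildcard.
std::vector<std::string> expand_wildcard(const std::set<std::string>& terms,
                                         const std::string& prefix,
                                         termcount max_expansion)
{
    std::vector<std::string> out;
    for (auto it = terms.lower_bound(prefix); it != terms.end(); ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break;  // left the prefix range
        if (max_expansion != 0 && out.size() >= max_expansion) {
            // Name the offending wildcard so the user can tell which one it was.
            throw WildcardLimitError("Wildcard " + prefix + "* expands to more than " +
                                     std::to_string(max_expansion) + " terms");
        }
        out.push_back(*it);
    }
    return out;
}
```

A per-query limit would instead need a running total shared across all the
wildcards in the query, and the message would have less to point at.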

> I know I haven't done tests, all the documentation nor Changes-files,
> but I thought I would fly this by the mailinglist sooner rather than
> later (especially given how long it took me to get to this point...)

Sure.

Cheers,
    Olly


