[Xapian-discuss] Stopping wildcard expansion at some point

Adam Sjøgren asjo at koldfront.dk
Thu Mar 5 15:42:06 GMT 2009


  Hi.

Half a year ago, I ran into this annoying problem: Sometimes a user did
a search containing wildcards, and our server would run out of memory.

I found that the problem was that my index has _a lot_ of terms that
start with the same thing, and the code that expands the wildcards
(Term::as_wildcarded_query() in queryparser/queryparser.lemony) runs
more or less amok in this particular case.

But I would still like to have the wildcard functionality for the
prefixes that do not expand to a gazillion terms...

Back then I looked at the code and popped into the #xapian IRC channel
and talked a little to richardb about it. He suggested that I could try
adding a counter to as_wildcarded_query() and perhaps throwing an
exception if the wildcards were expanding "too much".

I quickly made a rough patch to throw an exception after 1000 terms -
and it made our systems and, therefore, me happy. I simply catch the
error and display a nice error-message to the user about the query being
to broad.

Now I have finally gotten the time to create a patch that is a little
bit better, as it allows you to configure the maximum expansion on the
QueryParser object by calling $qp->set_max_wildcard_expansion($max).

The patch is attached to this email. It was made against trunk as of now
(r12109), but I will happily make a patch against a branch, if need be.

Is this the right way to go? Should I rather try to do something else? 
Do I need to adjust something?

I know some C from way back, but never got around to learn C++, so I
hope you'll bear with any stupidities - any and all feedback is welcome.

I know I haven't done tests, all the documentation nor Changes-files,
but I thought I would fly this by the mailinglist sooner rather than
later (especially given how long it took me to get to this point...)


  Best regards,

    Adam

-- 
 "Soon we'll have spent a whole month at sea,                 Adam Sjøgren
  splitting atoms for no apparent reason"                asjo at koldfront.dk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-set_max_wildcard_expansion-method-to-the-querypa.patch
Type: text/x-diff
Size: 0 bytes
Desc: not available
Url : http://lists.xapian.org/pipermail/xapian-discuss/attachments/20090305/b3fb9548/attachment.patch 


More information about the Xapian-discuss mailing list