[Xapian-discuss] Stopping wildcard expansion at some point
Adam Sjøgren
asjo at koldfront.dk
Thu Mar 5 15:42:06 GMT 2009
Hi.
Half a year ago, I ran into this annoying problem: sometimes a user would
run a search containing wildcards, and our server would run out of memory.
I found that the problem was that my index has _a lot_ of terms that
share the same prefix, and the code that expands the wildcards
(Term::as_wildcarded_query() in queryparser/queryparser.lemony) runs
more or less amok in this particular case.
But I would still like to have the wildcard functionality for the
prefixes that do not expand to a gazillion terms...
Back then I looked at the code and popped into the #xapian IRC channel
and talked a little to richardb about it. He suggested that I could try
adding a counter to as_wildcarded_query() and perhaps throwing an
exception if the wildcards were expanding "too much".
I quickly made a rough patch to throw an exception after 1000 terms -
and it made our systems and, therefore, me happy. I simply catch the
error and display a nice error message to the user about the query being
too broad.
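To illustrate the counter-and-exception idea, here is a minimal standalone sketch. It is not the attached patch and does not use Xapian at all: the names expand_wildcard, describe_search and WildcardTooBroad are made up for this example, the term list is just a sorted std::vector, and treating a limit of 0 as "no limit" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error type for this sketch; not part of Xapian.
struct WildcardTooBroad : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Collect every term in `sorted_terms` that starts with `prefix`, but give up
// with an exception as soon as more than `max_expansion` terms would match.
// A limit of 0 means "no limit" (an assumption made for this sketch).
std::vector<std::string> expand_wildcard(const std::vector<std::string>& sorted_terms,
                                         const std::string& prefix,
                                         std::size_t max_expansion) {
    // lower_bound below requires the term list to be sorted.
    assert(std::is_sorted(sorted_terms.begin(), sorted_terms.end()));

    std::vector<std::string> matches;
    // Jump to the first term >= prefix; all matching terms follow contiguously.
    auto it = std::lower_bound(sorted_terms.begin(), sorted_terms.end(), prefix);
    for (; it != sorted_terms.end(); ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break;  // left the prefix range
        if (max_expansion != 0 && matches.size() == max_expansion) {
            throw WildcardTooBroad("wildcard '" + prefix + "*' expands to more than " +
                                   std::to_string(max_expansion) + " terms");
        }
        matches.push_back(*it);
    }
    return matches;
}

// Caller side: catch the error and turn it into a message for the user.
std::string describe_search(const std::vector<std::string>& sorted_terms,
                            const std::string& prefix,
                            std::size_t max_expansion) {
    try {
        auto terms = expand_wildcard(sorted_terms, prefix, max_expansion);
        return "expanded to " + std::to_string(terms.size()) + " terms";
    } catch (const WildcardTooBroad&) {
        return "query too broad, please be more specific";
    }
}
```

The point of checking the counter inside the loop is that the expansion stops at the limit instead of first building the full list, so memory use stays bounded even when the prefix matches a huge number of terms.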
Now I have finally gotten the time to create a patch that is a little
bit better, as it allows you to configure the maximum expansion on the
QueryParser object by calling $qp->set_max_wildcard_expansion($max).
The patch is attached to this email. It was made against trunk as of now
(r12109), but I will happily make a patch against a branch, if need be.
Is this the right way to go? Should I rather try to do something else?
Do I need to adjust something?
I know some C from way back, but never got around to learning C++, so I
hope you'll bear with any stupidities - any and all feedback is welcome.
I know I haven't written tests, all of the documentation, or the
Changes-files, but I thought I would run this by the mailing list sooner
rather than later (especially given how long it took me to get to this
point...)
Best regards,
Adam
--
 "Soon we'll have spent a whole month at sea,
  splitting atoms for no apparent reason"
 Adam Sjøgren
 asjo at koldfront.dk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-set_max_wildcard_expansion-method-to-the-querypa.patch
Type: text/x-diff
Size: 0 bytes
Desc: not available
Url : http://lists.xapian.org/pipermail/xapian-discuss/attachments/20090305/b3fb9548/attachment.patch