[Xapian-discuss] search queries with less than 3 characters, memory goes nuts
Olly Betts
olly at survex.com
Sat Aug 15 13:36:12 BST 2009
On Sat, Aug 15, 2009 at 12:58:53PM +0200, chris wrote:
> As soon as mongrel hands over the query to xapian, memory usage of the
> webserver-process goes up until the box runs out of RAM, and if I
> give the box 50GB swap, it'll eat that too.
>
> I could narrow the problem down to queries that contain parts, which are
> less than 3 characters.
There's nothing special regarding term length, though shorter terms
tend to match more documents.
> So my questions are:
> - why does xapian use countless gigabytes of ram if i feed it such
> a query?
I've never seen it do so before.
> - is there a need to clean the query before? i mean, could someone do
> something nasty with it? (except the usual html-security things,
> which we take care of by escaping the query before display)
There shouldn't be a need.
> - what can i do to prevent this?
My guess is that acts_as_xapian is asking Xapian to return all possible
matches, is getting a few million, and is storing them in a
space-inefficient way.
The code here seems to show @limit defaults to "-1" which I assume
means "maximum unsigned integer" by the time Xapian sees it:
http://github.com/Overbryd/acts_as_xapian/blob/dc3517c66b18dbf66733aac3ba436c7bf4ffcab8/lib/acts_as_xapian.rb
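To illustrate that assumption, here is a minimal sketch in plain Ruby (no Xapian needed) of what happens when -1 is reinterpreted as a 32-bit unsigned integer. The helper name as_uint32 is made up for this example, and I'm assuming the limit ends up in a 32-bit unsigned type; how the bindings actually convert the value may differ.

```ruby
# Hypothetical helper: reinterpret a signed 32-bit integer as unsigned,
# which is what effectively happens when -1 is handed to a C/C++
# parameter declared as a 32-bit unsigned int.
def as_uint32(n)
  [n].pack('l').unpack1('L')
end

puts as_uint32(-1)  # 4294967295, i.e. "every possible match"
puts as_uint32(10)  # 10, small positive limits pass through unchanged
```

So a "no limit" value of -1 would become a request for up to roughly 4.3 billion matches.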
It would be useful to narrow down which layer is causing this. Can you
try running some of these "bad" queries without the Ruby layers involved?
(examples/quest in xapian-core provides an easy way to run a query
against a database.)
If that works OK, try it using just the Ruby bindings (without
acts_as_xapian) - you may find examples/simplesearch.rb useful for that.
If the problem is in acts_as_xapian, you'll need to talk to its
developers, or just pass a sane limit giving the number of matches you
actually want. It's a good idea to do that anyway since asking for all
possible matches will disable various matcher optimisations and slow
down searches.
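As a rough sketch of "pass a sane limit": the helper below clamps whatever limit the caller supplies before it reaches the matcher. The name bounded_mset and the constants DEFAULT_LIMIT and MAX_LIMIT are invented for illustration and are not part of acts_as_xapian or Xapian. The only binding call it relies on is enquire.mset(first, maxitems), which is the Ruby bindings' name for Enquire::get_mset().

```ruby
# Hypothetical values; pick whatever suits your application.
DEFAULT_LIMIT = 10
MAX_LIMIT = 1000

# Hypothetical helper: never let a negative, missing, or huge limit
# reach Xapian. `enquire` is expected to respond to mset(first, maxitems),
# as Xapian::Enquire does in the Ruby bindings.
def bounded_mset(enquire, offset: 0, limit: DEFAULT_LIMIT)
  # Treat nil or negative (e.g. -1 meaning "no limit") as the default.
  limit = DEFAULT_LIMIT if limit.nil? || limit < 0
  # Cap anything larger than we are prepared to handle.
  limit = [limit, MAX_LIMIT].min
  enquire.mset(offset, limit)
end
```

With the real bindings you would build enquire from Xapian::Enquire.new(database) and set the query on it first, along the lines of examples/simplesearch.rb, then call bounded_mset(enquire, limit: 20).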
Cheers,
Olly