[Xapian-tickets] [Xapian] #679: Memory and speed issues in wildcard searches
Xapian
nobody at xapian.org
Wed May 6 13:00:19 BST 2015
#679: Memory and speed issues in wildcard searches
--------------------+-------------------------
Reporter: dk | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
--------------------+-------------------------
Description changed by dk:
Old description:
> Hello,
>
> I have a problem with some searches that involve wildcards: Xapian uses
> a lot of memory and performs slowly. The problem shows up when a query
> with a trailing * expands to too many matching terms, so the expanded
> query contains a large number of individual words.
>
> The simple attached example easily uses 1.6G on my machine and never
> gives it back. This is a problem in long-running FastCGI processes,
> which grind the server down when users fire lots of these searches. I
> wonder whether something can be done about it.
>
> More interestingly, most of the time I don't even need all of the
> expanded words to match, only the first 100 or so documents. However,
> setting a cap on the number of wildcard expansions doesn't help: Xapian
> throws an exception, "Wildcard expands to more than X terms". Surely
> there's a reason for this (I just don't know what it is), but perhaps a
> flag could be added to QueryParser that forces the expansion to be
> capped? This would also help the speed of the search.
>
> Please find attached code examples and the output of memory use.
>
> Sincerely,
> Dmitry Karasik
> IT System Developer
> Novozymes A/S
> Krogshoejvej 36
> 2880 Bagsvaerd Denmark
New description:
Hello,

I have a problem with some searches that involve wildcards: Xapian uses a
lot of memory and performs slowly. The problem shows up when a query with
a trailing * expands to too many matching terms, so the expanded query
contains a large number of individual words.

The simple attached example easily uses 1.6G on my machine and never gives
it back. This is a problem in long-running FastCGI processes, which grind
the server down when users fire lots of these searches. I wonder whether
something can be done about it.

More interestingly, most of the time I don't even need all of the expanded
words to match, only the first 100 or so documents. However, setting a cap
on the number of wildcard expansions doesn't help: Xapian throws an
exception, "Wildcard expands to more than X terms". Surely there's a
reason for this (I just don't know what it is), but perhaps a flag could
be added to QueryParser that forces the expansion to be capped? This would
also help the speed of the search.

Please find attached code examples and the output of memory use.

Sincerely,
Dmitry Karasik
IT System Developer
Novozymes A/S
Krogshoejvej 36
2880 Bagsvaerd Denmark
PS: tested on the bleeding-edge snapshot
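
The difference between the current behaviour (an exception once the cap is
exceeded) and the requested one (silently stopping at the cap) can be
illustrated with a small standalone sketch. This is not Xapian code and does
not use the Xapian API: the sorted term list stands in for a database's term
dictionary, and the names expand_wildcard, ExpansionLimitError, max_terms and
truncate are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_left
from itertools import islice


class ExpansionLimitError(Exception):
    """Hypothetical error: a prefix expands to more terms than allowed."""


def iter_prefix_terms(sorted_terms, prefix):
    """Lazily yield the terms of a sorted list that start with `prefix`."""
    # All terms sharing a prefix are contiguous in sorted order, so we can
    # binary-search for the first candidate and stop at the first mismatch.
    start = bisect_left(sorted_terms, prefix)
    for i in range(start, len(sorted_terms)):
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break
        yield term


def expand_wildcard(sorted_terms, prefix, max_terms=0, truncate=False):
    """Expand `prefix*` into a list of terms.

    max_terms <= 0 means no cap. With a cap, truncate=False raises when the
    cap is exceeded, while truncate=True keeps only the first max_terms terms.
    """
    matches = iter_prefix_terms(sorted_terms, prefix)
    if max_terms <= 0:
        return list(matches)
    # Pull at most one term beyond the cap, so the full expansion is never
    # materialised even when the prefix matches a huge number of terms.
    expanded = list(islice(matches, max_terms + 1))
    if len(expanded) > max_terms:
        if not truncate:
            raise ExpansionLimitError(
                "Wildcard expands to more than %d terms" % max_terms
            )
        del expanded[max_terms:]
    return expanded


if __name__ == "__main__":
    terms = sorted(["search", "searched", "searcher", "searches",
                    "searching", "seat", "zebra"])
    print(expand_wildcard(terms, "search"))
    print(expand_wildcard(terms, "search", max_terms=2, truncate=True))
```

In the truncating mode the work done is bounded by the cap rather than by the
number of matching terms, which is the property the description asks for.
Which terms should survive the cut (the first in sorted order, as here, or
for example the most frequent ones) is a separate design choice that this
sketch does not try to settle.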
--
Ticket URL: <http://trac.xapian.org/ticket/679#comment:1>
Xapian <http://xapian.org/>