[Xapian-tickets] [Xapian] #679: Memory and speed issues in wildcard searches

Xapian nobody at xapian.org
Wed May 6 13:00:19 BST 2015


#679: Memory and speed issues in wildcard searches
--------------------+-------------------------
 Reporter:  dk      |             Owner:  olly
     Type:  defect  |            Status:  new
 Priority:  normal  |         Milestone:
Component:  Other   |           Version:
 Severity:  normal  |        Resolution:
 Keywords:          |        Blocked By:
 Blocking:          |  Operating System:  All
--------------------+-------------------------
Description changed by dk:

New description:

 Hello,

 I have a problem with some searches that involve wildcards: Xapian eats
 lots of memory and performs slowly. The problem manifests itself when a
 query term with a trailing * expands to too many matches, so the expanded
 query contains lots of individual words.

 The simple attached example easily uses 1.6 GB on my machine and never
 gives the memory back. This is a problem in long-running FastCGI processes,
 which grind the server down when users fire lots of these searches. I
 wonder if something can be done about it.

 What's more, most of the time I don't even need all of the expanded words
 to match, only the first 100 or so documents. However, setting a cap on the
 number of wildcard expansions doesn't help: Xapian throws the exception
 "Wildcard expands to more than X terms". Surely there's a reason behind
 this (I just don't know what it is), but perhaps a flag could be added to
 QueryParser that forces the expansion to be capped instead? This would also
 help the speed of the search.
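
 The difference between throwing once the limit is exceeded and capping
 the expansion can be sketched as below. This is a hypothetical,
 self-contained Python illustration over a sorted in-memory term list, not
 Xapian's QueryParser code or API; `expand_prefix`, `WildcardExpansionError`
 and the `cap` parameter are invented names.

```python
import bisect
from typing import List, Sequence


class WildcardExpansionError(Exception):
    """Hypothetical error raised when a wildcard matches too many terms."""


def expand_prefix(vocabulary: Sequence[str], prefix: str,
                  max_terms: int, cap: bool = False) -> List[str]:
    """Expand `prefix*` against a sorted vocabulary (hypothetical helper).

    If more than `max_terms` terms match, raise WildcardExpansionError
    (the behaviour described in this report) or, when `cap` is True,
    return only the first `max_terms` matches in sorted order (the
    behaviour requested in this report).
    """
    # Binary search for the first term >= prefix; all terms sharing the
    # prefix are contiguous from here on because the list is sorted.
    i = bisect.bisect_left(vocabulary, prefix)
    matches: List[str] = []
    while i < len(vocabulary) and vocabulary[i].startswith(prefix):
        if len(matches) == max_terms:
            if cap:
                break  # stop early instead of scanning every match
            raise WildcardExpansionError(
                f"Wildcard {prefix}* expands to more than {max_terms} terms")
        matches.append(vocabulary[i])
        i += 1
    return matches


vocab = sorted(["apple", "apply", "apt", "banana", "band", "bandana"])
print(expand_prefix(vocab, "ap", 2, cap=True))  # prints ['apple', 'apply']
```

 In capping mode the loop stops as soon as the limit is reached, so it
 never builds the full list of matches. Which terms to keep (the first in
 sorted order here, or perhaps the most frequent ones) is a design choice
 this sketch does not settle.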

 Please find attached the code examples and the memory-usage output.

 Sincerely,
 Dmitry Karasik
 IT System Developer
 Novozymes A/S
 Krogshoejvej 36
 2880 Bagsvaerd Denmark

 PS: tested on the bleeding-edge snapshot.

--
Ticket URL: <http://trac.xapian.org/ticket/679#comment:1>
Xapian <http://xapian.org/>