[Xapian-discuss] Excessive memory use when using FLAG_PARTIAL?

Olly Betts olly at survex.com
Tue Jan 11 12:41:24 GMT 2011


On Tue, Jan 04, 2011 at 11:51:16AM -0800, Sean McCleary wrote:
> I'm using Xapian (tried both versions 1.0.17 and 1.2.4) with the PHP
> bindings on Ubuntu 10.04 (Lucid) and Apache 2.2.14.  I'm using it for an
> "auto-complete" in the search form on a web page.  But whenever I use
> FLAG_PARTIAL on my search, the memory usage of the apache process quickly
> balloons up to almost 100% of the available memory resources, and hangs
> there in "Sending reply" status.
> 
> The execution of the PHP script finishes, but the apache process is stuck,
> and consuming almost all the available memory.
> 
> I've found that when I remove the "FLAG_PARTIAL" flag from my query, this
> problem does not happen.
> 
> Is this expected behavior?  The server this is running on has 512 MB of
> memory.  My Xapian index is only 108 MB in size.

FLAG_PARTIAL currently just expands the partial word at the end of the
query to all the possible completions, so if the partial word is short
this can generate a query with a lot of terms (particularly when the
partial word is just a single common character, such as 's' in English).
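To show why a short partial word is so costly, here is a toy model of the expansion step. It is not Xapian's implementation. It only assumes that completions are found by collecting every term in a sorted term list that starts with the partial word. The vocabulary is made up, and a real index would contain far more terms.

```python
import bisect

def expand_partial(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.

    Toy model of a partial-word expansion: because the list is sorted,
    all completions of a prefix form one contiguous run, which we locate
    with a binary search and then walk forward.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]

# Hypothetical tiny vocabulary (a real index may hold hundreds of thousands).
vocabulary = sorted(["sea", "search", "season", "seat", "second", "see",
                     "sell", "send", "set", "ship", "shop", "sky", "so",
                     "son", "star", "stop", "sun", "table", "tea"])

for prefix in ["s", "se", "sea", "seas"]:
    completions = expand_partial(vocabulary, prefix)
    print(f"{prefix!r}: {len(completions)} terms")
# 's': 17 terms
# 'se': 9 terms
# 'sea': 4 terms
# 'seas': 1 terms
```

Every extra character typed shrinks the expansion sharply. A single letter matches a large fraction of the vocabulary.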

Each term in the query needs a certain amount of memory, regardless of
the size of the database on disk.  Judging by the figures in another
recent post to the list, this is currently something like 55KB per term,
so if the partial word expands to 10,000 or more terms, the query alone
needs upwards of 550MB, which is more than the 512MB of physical memory
on your server.  My guess would be that this is the cause of your problem.
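The arithmetic can be checked quickly. This sketch takes the rough 55KB-per-term figure quoted above and interprets it as 55 KiB (an assumption: a decimal KB gives a similar answer). It also ignores the baseline memory used by Apache and PHP, so the real limit would be somewhat lower.

```python
# Rough figures from the discussion (55KB treated as KiB, an assumption).
PER_TERM_BYTES = 55 * 1024          # approximate memory per query term
RAM_BYTES = 512 * 1024 * 1024       # physical memory on the server

def query_memory_bytes(num_terms, per_term_bytes=PER_TERM_BYTES):
    """Estimated memory for a query whose expansion has num_terms terms."""
    return num_terms * per_term_bytes

# Number of expanded terms that would fill physical memory on their own.
max_terms = RAM_BYTES // PER_TERM_BYTES
print(max_terms)  # 9532

# Memory needed by a 10,000-term expansion, in MiB.
print(query_memory_bytes(10_000) / (1024 * 1024))
```

So an expansion of roughly 9,500 terms is already enough to exhaust 512MB, which is consistent with the "10000 or more" estimate.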

The memory overhead per term could probably be reduced, but really
there's little point in expanding such short partial terms: a search
for all words starting with the same letter is just going to be too
noisy to be useful, regardless of the resources it would need.  So
my thought would be to add a minimum length for the partial words
which will be expanded under FLAG_PARTIAL, and probably a way to
specify this via the API.
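Until something like that exists, an application could apply the same idea itself by deciding per request whether to ask for partial matching at all. Here is a minimal sketch. The function name and the threshold of 3 are hypothetical choices (the post proposes a minimum length but gives no value), and treating the "partial word" as the trailing run of word characters is a simplification.

```python
import re

# Hypothetical threshold; the post does not specify a value.
MIN_PARTIAL_LENGTH = 3

# Trailing run of word characters = the word currently being typed.
_TRAILING_WORD = re.compile(r"(\w+)$")

def should_expand_partial(query_string, min_length=MIN_PARTIAL_LENGTH):
    """Return True if the last word is long enough to expand as partial.

    Returns False when the query ends in whitespace or punctuation (no
    word is being typed) or when the trailing word is shorter than
    min_length.
    """
    match = _TRAILING_WORD.search(query_string)
    if match is None:
        return False
    return len(match.group(1)) >= min_length

print(should_expand_partial("hello s"))    # False
print(should_expand_partial("hello sea"))  # True
```

With Xapian's Python bindings the result would decide whether `xapian.QueryParser.FLAG_PARTIAL` is OR-ed into the flags passed to `parse_query`. The same check carries over directly to the PHP bindings used in the original question.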

Cheers,
    Olly


