[Xapian-discuss] Xapian performance on gmane.org compared
Arjen van der Meijden
acmmailing at tweakers.net
Thu Aug 27 15:48:44 BST 2009
On 27-8-2009 16:06 Henry wrote:
> Using xapian revision 13300 (chert db).
> Test chert database is about 4GB - 320,000 docs.
> Performance for typical one or more keyword searches is quick. For
> example, search for [upload site page] yields the query:
> Xapian::Query((upload:(pos=1) OR site:(pos=2) OR page:(pos=3)))
> Takes a second.
> However, searching for something like [co.uk] is mind-numbingly and
> _alarmingly_ slow.
> Xapian::Query((co:(pos=1) PHRASE 2 uk:(pos=2)))
> Looks like it interprets this search as a phrase.
> Takes over _40_ seconds.
You could look at the size of the result set for the non-phrase version
of the query (i.e. "co AND uk"). We've seen pretty bad performance for
some phrase queries with the flint database, but then again our machine
used to be I/O-bound. The AND query should give you an idea of how many
documents are loaded from disk for the initial selection and how fast
that goes. But since the phrase query also touches another large table,
you can't use it as more than a simple baseline.
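To make that comparison concrete, here is a rough sketch using the Xapian
Python bindings. It assumes the bindings are installed and that the terms
are stored unprefixed and lowercased, as the query descriptions above
suggest. The helper names `timed` and `compare_and_vs_phrase` are made up
for this example.

```python
import time


def timed(fn):
    """Run fn() once and return (seconds_elapsed, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def compare_and_vs_phrase(db_path, terms=("co", "uk"), page_size=10):
    """Time an AND query and a PHRASE query over the same terms.

    Returns a dict mapping "AND" / "PHRASE" to
    (seconds, estimated_match_count).
    """
    # Imported lazily so the timing helper is usable without Xapian.
    import xapian

    db = xapian.Database(db_path)
    results = {}
    for name, op in (("AND", xapian.Query.OP_AND),
                     ("PHRASE", xapian.Query.OP_PHRASE)):
        # Assumption: terms are indexed exactly as given (no prefix, no stemming).
        query = xapian.Query(op, list(terms))
        enquire = xapian.Enquire(db)
        enquire.set_query(query)
        seconds, mset = timed(lambda: enquire.get_mset(0, page_size))
        results[name] = (seconds, mset.get_matches_estimated())
    return results
```

Call `compare_and_vs_phrase()` with the path to your own database. The
first run after a reboot mostly measures disk reads, and a second run
mostly measures the OS cache, so it is worth noting both.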
> I'm trying to get a handle on how best to improve the situation, so
> having something to compare against would be informative. I notice
> that gmane.org has about 70 million articles, yet the same search
> [co.uk] returns in 4s. Yes, these are plain text and relatively small
> docs, but still...
4GB is a "very small" database, i.e. it fits in an amount of RAM that
is now becoming common for desktops. How much memory does your search
machine have? If it has less than 4GB, and you can spare a bit of
money, add more.
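A quick way to check this is to compare the on-disk size of the database
directory with the machine's physical memory. The sketch below uses only
the Python standard library. The function names are invented for
illustration, and the `os.sysconf` names are POSIX-specific, so they may
not exist on every platform.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def physical_memory_bytes():
    """Physical RAM in bytes (assumes a POSIX system such as Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(db_bytes, ram_bytes):
    """True if the whole database could in principle be cached in RAM."""
    return db_bytes <= ram_bytes
```

Passing the results of `directory_size_bytes()` for your database
directory and `physical_memory_bytes()` to `fits_in_ram()` gives a
first-order answer. In practice the OS and other processes need memory
too, so treat a narrow pass as a fail.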
If no other factors are in play, and your query performance is limited
solely or largely by I/O, you could also install an SSD. In our
benchmark, all phrase queries went from I/O-limited to CPU-limited,
simply because both the RAM and the SSDs in our server were easily fast
enough to keep up.