[Xapian-discuss] Xapian performance on gmane.org compared

Henry henka at cityweb.co.za
Thu Aug 27 15:06:06 BST 2009


Greetings,

I'm using Xapian revision 13300 (chert DB).
The test chert database is about 4GB, with 320,000 docs.

Performance for typical searches on one or more keywords is quick.  For
example, a search for [upload site page] yields the query:
Xapian::Query((upload:(pos=1) OR site:(pos=2) OR page:(pos=3)))
It takes about a second.

However, searching for something like [co.uk] is mind-numbingly and
_alarmingly_ slow.  It yields the query:
Xapian::Query((co:(pos=1) PHRASE 2 uk:(pos=2)))
It looks like the search is interpreted as a phrase.
It takes over _40_ seconds.

Typical phrase searches, such as ["your email"] take a few seconds  
longer than normal keyword searches (as expected), but nowhere near as  
slow as 40+s.
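
To illustrate the difference between the two query shapes above, here is
a toy sketch in Python.  It is a hypothetical model only, not Xapian's
actual storage or matcher: build_index, match_or and match_phrase are
made-up names, and the three sample documents are invented.  It shows
that an OR query only needs to know which documents contain each term,
whereas a phrase query must also read and compare word positions for
every document that contains all of the terms.  For very common terms
such as "co" and "uk", that candidate set could plausibly be large.

```python
# Toy positional index: term -> {doc_id: [positions]}.
# Hypothetical illustration only; this is NOT how Xapian stores or matches data.
from collections import defaultdict


def build_index(docs):
    """Map each term to the documents and word positions where it occurs."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def match_or(index, terms):
    """OR query: a document matches if it contains any of the terms."""
    matched = set()
    for term in terms:
        matched.update(index.get(term, {}))
    return matched


def match_phrase(index, terms):
    """Phrase query: terms must appear at consecutive positions, in order."""
    postings = [index.get(term, {}) for term in terms]
    # Only documents containing every term can match.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    matched = set()
    for doc_id in candidates:
        # Extra work per candidate: read and compare position lists.
        later = [set(p[doc_id]) for p in postings[1:]]
        for start in postings[0][doc_id]:
            if all(start + i + 1 in positions for i, positions in enumerate(later)):
                matched.add(doc_id)
                break
    return matched


# Invented sample data; "co.uk" is assumed to be split into "co" and "uk".
docs = {
    1: "visit example co uk today",
    2: "the uk office of acme co",
    3: "upload the site page",
}
index = build_index(docs)
or_hits = match_or(index, ["co", "uk"])          # docs containing either term
phrase_hits = match_phrase(index, ["co", "uk"])  # docs containing "co uk"
```

In this toy data both documents 1 and 2 contain "co" and "uk", but only
document 1 has them adjacent and in order, so the phrase check has to
inspect positions in both before rejecting one.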

I'm trying to get a handle on how best to improve the situation, so  
having something to compare against would be informative.  I notice  
that gmane.org has about 70 million articles, yet the same search  
[co.uk] returns in 4s.  Yes, these are plain text and relatively small  
docs, but still...

I must be doing something wrong.

If I may:
What DB format is gmane.org using (chert/flint)?
What's the DB size on disk?
How many search servers is gmane.org using?  Their approx. spec?

Any comments would be appreciated.

Thanks
Henry





