[Xapian-discuss] xapian performance

Olly Betts olly at survex.com
Tue Dec 5 01:35:29 GMT 2006


On Fri, Dec 01, 2006 at 11:45:01AM +0100, Arjen van der Meijden wrote:
> I don't understand why some of these queries are suddenly much
> faster; perhaps some of the data blocks were in memory now due to
> a different query order. But apparently the query "als platte tekst en
> html verzenden" didn't return any results (it doesn't in the production
> environment either), yet it still took 15 seconds to figure that out.

That probably means that there are plenty of documents with all those
terms in, but none with the phrase itself.  We have to check all the
documents which match an AND query for those terms to find this.
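To illustrate why an empty phrase result can still be expensive, here is a
toy sketch in Python. It is not Xapian's implementation; the index layout
(term -> document id -> list of word positions) and the function name
phrase_match are assumptions made up for this example. It only shows the
shape of the work: first intersect the documents containing every term,
then inspect positions in each candidate.

```python
def phrase_match(index, terms):
    """Return (matching_doc_ids, number_of_candidates_checked).

    index is a hypothetical toy structure: term -> {doc_id: [positions]}.
    """
    # AND stage: keep only documents that contain every term.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    hits = []
    for doc in sorted(candidates):
        # Phrase stage: the terms must occur at consecutive positions.
        starts = index[terms[0]][doc]
        if any(
            all(start + i in index[term][doc] for i, term in enumerate(terms))
            for start in starts
        ):
            hits.append(doc)
    return hits, len(candidates)


# Both documents contain both terms, but never next to each other.
toy_index = {
    "platte": {1: [0], 2: [3]},
    "tekst": {1: [5], 2: [0]},
}
hits, checked = phrase_match(toy_index, ["platte", "tekst"])
print(hits, checked)  # [] 2
```

Every candidate has its positions examined even though the final result is
empty, so the cost scales with the size of the AND match rather than with
the number of phrase hits.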

> But the query @live.nl should return results and it doesn't in quest, so 
> there is a difference in query parsing between quest and omega?

Quest currently always applies English stemming.  I guess you're stemming
in Dutch, or not at all.  Try this patch:

http://www.oligarchy.co.uk/xapian/patches/quest-stemmer-option.patch

> >Just the messages (i.e. --enable-debug-verbose) shouldn't be too bad,
> >especially if you log to a file, though it's certainly slower.  You
> >can select which message types are logged using a bitmap given in the
> >env var XAPIAN_DEBUG_FLAGS, as the "HACKING" document describes.  The
> >categories each cover quite a lot of messages though.
> 
> Getting all the messages was taking a very long time actually; after a
> few minutes and a log file of over 40MB (or so) I gave up. What I
> couldn't understand from the HACKING document is what this bitmap looks
> like and which bit is for which kind of message... The all-output gives
> a number at the start of the line; does that correspond with the bit?

Yes.

> I.e. should I provide XAPIAN_DEBUG_FLAGS=00001 to enable the fifth bit 
> and does that correspond with lines starting with the number 5?

It's just a decimal value with bit 5 set - i.e. 32 (bits are counted
from 0, so bit 5 is worth 2^5).  As a cheat, -1 sets all bits (because
it's all ones in two's complement binary).  I've clarified this in the
documentation now.
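As a quick sanity check of the arithmetic, here is a small Python sketch.
The helper name debug_flags is made up for this example and is not part of
Xapian; it just ORs together the chosen bit numbers to give the decimal
value you would put in XAPIAN_DEBUG_FLAGS.

```python
def debug_flags(*bits):
    """Return the decimal value with the given bit numbers set (bit 0 is lowest)."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


print(debug_flags(5))     # 32
print(debug_flags(0, 5))  # 33

# -1 is all ones in two's complement, so every bit tests as set.
print(all((-1 >> bit) & 1 for bit in range(64)))  # True
```

So to see only the messages whose lines start with 5, the value would be
32, and to combine categories you add (or OR) the individual values.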

> Query: "ie 7"
> 152097 blocks read from /home/acm/xapian-db/db/default/position.
> 13,874s taken

> Query: x-mod
> 49878 blocks read from /home/acm/xapian-db/db/default/position.
> 114,126s taken

It's interesting that the timings don't track the number of blocks
read very closely.  Perhaps that's because some of the blocks were
already cached in some cases.

Cheers,
    Olly


