[Xapian-discuss] b+tree to cdb

Sat Sep 25 08:08:06 BST 2004

i've been performing some tests with xapian on a set of docs numbering 
86801 with average length of 291.895. i've also removed stop words to 
help with speed and size, but i'm noticing some serious performance hits 
when it comes to phrase searches: they sometimes take 14s or more, 
compared to the ~0.01s i'm getting with most boolean searches.

the following quote from the documentation has piqued my interest: 
"implementing an ultra-compact read-only backend which would take a 
quartz b-tree and convert it to something like a cdb hashed file" ... 
"If you need serious performance, implementing such a backend is worth 
considering." firstly, would this help with phrase searches, and 
secondly, what are the details of implementing such a beast? i don't 
need write access once indexed, so this may be something i'd like to try.

one other question if i may. how would one get just the record numbers 
for the entire estimated MSet? my html layout at the moment is two 
frames, a table of contents on the left and search/document on the 
right. in order to show (in the toc) which folder/subfolders/documents 
have hits, i'm having to issue 'enquire.get_mset(0, 1000)' to shelve the 
first 1000 record numbers then issue another 'enquire.get_mset(0, 10)' 
to display the first ten, twenty, etc... of course, the bottleneck is 
the first get_mset so removing that step would help wonderfully.
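
to make the toc step concrete, here is a rough sketch of what i'm after. 
it assumes the record numbers from the single 'enquire.get_mset(0, 1000)' 
call have already been copied into a plain python list, so the display 
page can be sliced from that list instead of issuing the second 
get_mset. the helper names (first_page, folders_with_hits) and the 
doc_paths mapping of record number to document path are made up for 
illustration and are not part of xapian. this only drops the second 
query; it does not make the 1000-hit query itself any cheaper.

```python
from collections import defaultdict
from posixpath import dirname


def first_page(docids, page_size=10):
    """Return the hits to display, taken from the already-fetched list."""
    return docids[:page_size]


def folders_with_hits(docids, doc_paths):
    """Count hits per folder, crediting every ancestor folder as well.

    doc_paths is a hypothetical mapping of record number -> document path.
    Record numbers with no known path are skipped.
    """
    counts = defaultdict(int)
    for docid in docids:
        path = doc_paths.get(docid)
        if path is None:
            continue
        folder = dirname(path)
        # Walk up the tree so parent folders are flagged in the toc too.
        while folder and folder != "/":
            counts[folder] += 1
            folder = dirname(folder)
    return dict(counts)


# Example data (invented): three documents in a small folder tree.
doc_paths = {
    1: "manual/install/linux.html",
    2: "manual/install/windows.html",
    3: "manual/faq.html",
}
hits = [3, 1, 2, 99]  # record numbers in rank order; 99 has no known path

page = first_page(hits, 2)
toc = folders_with_hits(hits, doc_paths)
```

with this, the toc frame reads from 'toc' and the results frame from 
'page', both built from the same 1000 record numbers.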

hardware-wise, i'm running a Pentium III (Katmai) @ 551MHz with 512MB 
RAM and a 7200 RPM disk (hdparm reports 39.45 MB/sec).

thanks,
andrew