[Xapian-discuss] b+tree to cdb
gervin23
gervin23 at fastmail.fm
Sat Sep 25 08:08:06 BST 2004
i've been performing some tests with xapian on a set of 86801 docs with
an average length of 291.895. i've also removed stop words to help with
speed and size, but i'm noticing some serious performance hits when it
comes to phrase searches, which sometimes take over 14s, compared to the
~0.01s i'm getting with most boolean searches.
the following quote from the documentation has piqued my interest:
"implementing an ultra-compact read-only backend which would take a
quartz b-tree and convert it to something like a cdb hashed file" ...
"If you need serious performance, implementing such a backend is worth
considering." firstly, would this help with phrase searches? secondly,
what is involved in implementing such a beast? i don't need write access
once indexed, so this may be something i'd like to try.
one other question, if i may: how would one get just the record numbers
for the entire estimated MSet? my html layout at the moment is two
frames, a table of contents on the left and search/document on the
right. in order to show (in the toc) which folder/subfolders/documents
have hits, i'm having to issue 'enquire.get_mset(0, 1000)' to shelve the
first 1000 record numbers, then issue another 'enquire.get_mset(0, 10)'
to display the first ten, twenty, etc. of course, the bottleneck is
the first get_mset, so removing that step would help wonderfully.
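
a minimal sketch of the two calls described above, folded into a single
query: the page shown to the user is sliced from the larger result
instead of being fetched again. it assumes the python bindings, where
iterating an MSet yields items with a 'docid' attribute; the function
name and its 'toc_limit'/'page_size' parameters are made up for
illustration. this only saves the second call; the large get_mset still
has to run.

```python
def collect_hits(enquire, toc_limit=1000, page_size=10):
    """Run one query and return (docids for the toc, docids for the first page).

    `enquire` is assumed to behave like a xapian Enquire object with a
    query already set; only get_mset(first, maxitems) is used here.
    """
    # Single query: fetch up to toc_limit matches, best first.
    mset = enquire.get_mset(0, toc_limit)

    # Keep only the record numbers (assumes each match exposes .docid).
    docids = [match.docid for match in mset]

    # The displayed page is just the head of the same ranked list.
    return docids, docids[:page_size]
```

usage would be along the lines of 'toc_ids, page_ids =
collect_hits(enquire)', with toc_ids driving the table of contents and
page_ids the result list.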
hardware-wise, i'm running a Pentium III (Katmai) @ 551MHz with 512MB
RAM and a 7200 RPM drive (hdparm reports 39.45 MB/sec).
thanks,
andrew