[Xapian-discuss] Phrase search performance

Arjen van der Meijden acmmailing at tweakers.net
Tue Feb 21 06:49:19 GMT 2006


On our box (dual Xeon 3.8GHz, 8GB memory, 5+1 WD Raptors in RAID 5) we
don't see too many > 1 min queries, but they do happen. Our database is
about 12GB in compacted flint with 1M records.

I've split omega's logfile into normal queries and slow queries, where
the latter took more than 2 seconds. Currently there are 422850 normal
queries (averaging 0.06 seconds/query, 99% done within 0.5 second) and
6824 slow queries (about 1.6% of the total). The slow queries average
about 17 seconds, and about 95% of them are done within one minute. But
there are a few queries taking up to 10 minutes, for instance this one:
"constante download 1.5 en upload 0.7".
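
For anyone who wants to produce the same kind of breakdown, here is a
minimal sketch in Python. It assumes you have already pulled the query
durations (in seconds) out of your log into a plain list; the parsing
step is left out because it depends on your log format. The names
summarize and percentile and the sample numbers are made up for
illustration and are not part of omega or Xapian.

```python
import math

SLOW_THRESHOLD = 2.0  # seconds; the same cut-off as described above


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(durations, threshold=SLOW_THRESHOLD):
    """Split query durations into normal and slow and summarize both."""
    normal = sorted(d for d in durations if d <= threshold)
    slow = sorted(d for d in durations if d > threshold)

    def stats(values, fraction):
        if not values:
            return None
        return {
            "count": len(values),
            "mean": sum(values) / len(values),
            "percentile": percentile(values, fraction),
        }

    return {
        # Share of all queries that took longer than the threshold.
        "slow_share": len(slow) / len(durations) if durations else 0.0,
        "normal": stats(normal, 0.99),  # 99th percentile of normal queries
        "slow": stats(slow, 0.95),      # 95th percentile of slow queries
    }


# Made-up durations, only to show the shape of the result.
sample = [0.03, 0.05, 0.06, 0.1, 0.4, 3.0, 17.0, 55.0]
summary = summarize(sample)
```

The "about 1.6%" above is simply 6824 / (422850 + 6824); slow_share is
the same ratio computed on whatever list you feed in.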

Anyway, it really depends on your configuration, but "main channel"
should probably be feasible within a few seconds. Our database of
currently 1081588 documents was searched in just under 5 seconds. "main"
and "channel" occur 16378 and 15321 times respectively in our database.
A refresh of the search page took 0.053 seconds... so a bit of RAM does
help ;)

Phrase searches are mainly I/O-dependent though (searching for I/O is a
nice one too, btw). We upgraded from dual Xeon 2.8GHz, 4GB RAM with two
15k rpm SCSI disks in RAID 0 to the above configuration and saw major
performance improvements. The doubled (and faster) RAM, the faster
system bus and the faster disk setup each played their part, of course.
Still, the easiest way to improve your setup is adding RAM. I'm not sure
how fast your SAN is compared to some of the faster local disks, but I
imagine a single SATA WD Raptor locally may be able to beat it in terms
of throughput and response times, let alone a few in RAID.

If you can get 4GB of memory in your box, you'll likely see some more
improvements. Keep in mind that the initial query will still be slow if
the SAN isn't very fast.

Best regards,

Arjen van der Meijden

On 20-2-2006 22:53, Alex Deucher wrote:
> On 2/20/06, Olly Betts <olly at survex.com> wrote:
>> On Mon, Feb 20, 2006 at 04:14:50PM -0500, Alex Deucher wrote:
>>> my query was: Xapian::Query((FTEXT:main PHRASE 2 FTEXT:channel))
>>> FTEXT:main term frequency: 37983
>>> FTEXT:channel term frequency: 16106
>> OK, so pretty common terms if the database is ~100000 documents.  This
>> is likely to be a slow case, but I'd hope for a few seconds at worst,
>> not a few minutes.
>>
>> I have some speed ups for phrase searching planned, but from your other
>> message I think the main issue here is that you need more RAM.
> 
> I tried it on a box with 2 GB of RAM and it's down to about 1 minute,
> so RAM definitely helps.
> 
>>>> For Perl, see the "new_term" method of "Search::Xapian::Query" - added
>>>> in 0.9.2.3.
>>> Is there any documentation on that Perl code anywhere?  The stuff on
>>> CPAN is pretty limited.
>> It's improved a lot in the last few releases (and will improve in the
>> next too).  Most classes and methods now have POD documentation.
>>
> 
> Excellent!
> 
> Thanks Olly!
> 
>> Cheers,
>>     Olly
>>
> 