[Xapian-discuss] Phrase search performance
Arjen van der Meijden
acmmailing at tweakers.net
Tue Feb 21 06:49:19 GMT 2006
On our box (dual Xeon 3.8 GHz, 8 GB memory, 5+1 WD Raptors in RAID 5) we
don't see many queries taking over 1 minute, but they do happen. Our
database is about 12 GB in compacted flint format, with 1M records.
I've split Omega's log file into normal queries and slow queries, where
the latter took more than 2 seconds. Currently there are 422850 normal
queries (averaging 0.06 seconds/query, 99% done within 0.5 seconds) and
6824 slow queries (about 1.6% of the total). The slow queries average
about 17 seconds, and about 95% of them finish within one minute. But a
few queries take up to 10 minutes, for instance this one:
"constante download 1.5 en upload 0.7".
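A minimal sketch of the normal/slow split described above, in Python. It
assumes the per-query durations (in seconds) have already been extracted
from the log; the Omega log format and the parsing step are not shown,
and the function names and sample timings are hypothetical.

```python
from statistics import mean

# The post counts a query as "slow" if it took more than 2 seconds.
SLOW_THRESHOLD_S = 2.0


def split_queries(durations, threshold=SLOW_THRESHOLD_S):
    """Partition per-query durations (seconds) into (normal, slow) lists."""
    normal = [d for d in durations if d <= threshold]
    slow = [d for d in durations if d > threshold]
    return normal, slow


def fraction_within(durations, limit):
    """Fraction of the given queries that finished within `limit` seconds."""
    if not durations:
        return 0.0
    return sum(1 for d in durations if d <= limit) / len(durations)


def summarize(durations, threshold=SLOW_THRESHOLD_S):
    """Counts, slow share, means and 'done within' fractions like those quoted above."""
    normal, slow = split_queries(durations, threshold)
    total = len(normal) + len(slow)
    return {
        "normal_count": len(normal),
        "slow_count": len(slow),
        "slow_share": len(slow) / total if total else 0.0,
        "normal_mean": mean(normal) if normal else 0.0,
        "slow_mean": mean(slow) if slow else 0.0,
        "normal_within_0_5s": fraction_within(normal, 0.5),
        "slow_within_60s": fraction_within(slow, 60.0),
    }


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not data from the post.
    sample = [0.02, 0.05, 0.04, 0.3, 3.5, 17.0, 600.0]
    print(summarize(sample))
```

As a sanity check on the figures above, 6824 / (422850 + 6824) is about
0.0159, which matches the "about 1.6%" slow share.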
Anyway, it really depends on your configuration, but "main channel"
should probably be feasible within a few seconds. On our database of
currently 1081588 documents it took just under 5 seconds. "main" and
"channel" occur 16378 and 15321 times, respectively, in our database. A
refresh of the search page took 0.053 seconds... so a bit of RAM does
help ;)
Phrase searches are mainly I/O-bound, though (searching for "I/O" is a
nice one too, btw). We upgraded from dual Xeon 2.8 GHz, 4 GB RAM and two
15k rpm SCSI disks in RAID 0 to the above configuration and saw major
performance improvements; the doubled (and faster) RAM, the faster
system bus and the faster disk setup each played their part, of course.
Still, the easiest way to improve your setup is adding RAM.
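To illustrate why a phrase query over two common terms is I/O-heavy,
here is a simplified Python sketch of two-term phrase matching over a
toy in-memory positional index. It is not Xapian's implementation; the
index layout and function names are hypothetical, and it assumes the
window counts the span of positions the phrase may occupy, so a window
of 2 for two terms means they must be adjacent (as in the quoted query
FTEXT:main PHRASE 2 FTEXT:channel). The point is that every document
containing both terms needs its position lists read and checked, and
with on-disk position data that means disk reads unless it is cached in
RAM.

```python
from bisect import bisect_left

# Toy positional index (hypothetical layout, not Xapian's on-disk format):
# term -> {document id: sorted list of word positions in that document}.
Index = dict[str, dict[int, list[int]]]


def phrase_match(positions_a: list[int], positions_b: list[int], window: int) -> bool:
    """True if term B occurs after term A with both inside a span of `window` positions."""
    for pa in positions_a:
        # First occurrence of B strictly after this occurrence of A.
        i = bisect_left(positions_b, pa + 1)
        # The phrase spans positions pa..pb, i.e. pb - pa + 1 positions.
        if i < len(positions_b) and positions_b[i] - pa + 1 <= window:
            return True
    return False


def phrase_search(index: Index, term_a: str, term_b: str, window: int = 2):
    """Return (matching doc ids, candidate count, positions read)."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    # Step 1: only documents containing both terms can match.
    candidates = sorted(postings_a.keys() & postings_b.keys())
    # Step 2: each candidate's position lists must be read and compared.
    # On a real disk-backed index this is where the I/O goes when both
    # terms are common, because the candidate set is large.
    matches = []
    positions_read = 0
    for doc in candidates:
        pa, pb = postings_a[doc], postings_b[doc]
        positions_read += len(pa) + len(pb)
        if phrase_match(pa, pb, window):
            matches.append(doc)
    return matches, len(candidates), positions_read


if __name__ == "__main__":
    index: Index = {
        "main": {1: [0, 5], 2: [3], 3: [7]},
        "channel": {1: [1], 2: [9], 4: [0]},
    }
    print(phrase_search(index, "main", "channel"))      # ([1], 2, 5)
    print(phrase_search(index, "main", "channel", 10))  # ([1, 2], 2, 5)
```

With a window of 2, "main" must be immediately followed by "channel"; a
larger window lets other words sit in between. Either way the number of
candidate documents, and so the amount of positional data read, grows
with the frequency of the rarer term.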
I'm not sure how fast your SAN is compared to some of the faster local
disks, but I imagine a single local SATA WD Raptor may be able to beat
it in terms of throughput and response times, let alone a few in RAID.
If you can get 4 GB of memory in your box, you'll likely see further
improvements. Keep in mind that the initial (uncached) query time will
still be slow if the SAN isn't very fast.
Arjen van der Meijden
On 20-2-2006 22:53, Alex Deucher wrote:
> On 2/20/06, Olly Betts <olly at survex.com> wrote:
>> On Mon, Feb 20, 2006 at 04:14:50PM -0500, Alex Deucher wrote:
>>> my query was: Xapian::Query((FTEXT:main PHRASE 2 FTEXT:channel))
>>> FTEXT:main term frequency: 37983
>>> FTEXT:channel term frequency: 16106
>> OK, so pretty common terms if the database is ~100000 documents. This
>> is likely to be a slow case, but I'd hope for a few seconds at worst,
>> not a few minutes.
>> I have some speed ups for phrase searching planned, but from your other
>> message I think the main issue here is that you need more RAM.
> I tried it on a box with 2 GB of RAM and it's down to about 1 minute,
> so RAM definitely helps.
>>>> For Perl, see the "new_term" method of "Search::Xapian::Query" - added
>>>> in 0.9.2.3.
>>> Is there any documentation on that Perl code anywhere? The stuff on
>>> CPAN is pretty limited.
>> It's improved a lot in the last few releases (and will improve in the
>> next too). Most classes and methods now have POD documentation.
> Thanks Olly!