Is there a large variance in Xapian search times?

Mon Jul 2 11:08:40 BST 2018

Dear Xapian developers,

I am using Xapian to index more than 13 million Q&A documents (similar
to Quora). I will share some performance data on indexing and
searching, and I would like some help improving search performance.

My computer has an 8-core i7 CPU at 3.4 GHz and 16 GB of memory,
running Ubuntu 16.04. The dataset contains about 13M documents, and
each document is segmented into 35 terms (Chinese words) on average.

I adopted the split-and-merge approach as well: I built separate
indexes of 500K documents each and then merged them into one database.
Building each smaller database took 2 min 40 s on average. Compacting
them into one database took about 2 hr 12 min.
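For reference, the batch plan above works out as follows. This is a
small sketch assuming exactly 13,000,000 documents (the real count is
slightly higher); `plan_shards` is a hypothetical helper, not part of
Xapian.

```python
from typing import List, Tuple


def plan_shards(total_docs: int, shard_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) document ranges, one per shard."""
    return [(start, min(start + shard_size, total_docs))
            for start in range(0, total_docs, shard_size)]


# Figures from the message (13M is an approximation).
TOTAL_DOCS = 13_000_000
SHARD_SIZE = 500_000
BUILD_SECONDS_PER_SHARD = 2 * 60 + 40  # 2 min 40 s average per shard

shards = plan_shards(TOTAL_DOCS, SHARD_SIZE)
serial_build = len(shards) * BUILD_SECONDS_PER_SHARD  # shards built one after another

print(len(shards), "shards")
print("serial build time: %d min %d s" % divmod(serial_build, 60))
```

This prints 26 shards and a serial build time of 69 min 20 s, so the
2 hr 12 min compaction step accounts for most of the total indexing
time.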

I found that the first query against this database (e.g. right after
booting the computer), and occasionally later ones, take several
seconds (around 5 s), even though a query (using QueryParser) takes
0.8 s on average. What is the reason for this? Or how can I debug it,
i.e. where can I add some LOGLINE calls to measure these times?
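Before adding LOGLINE calls inside Xapian, one way to quantify the
variance is to time queries from the application side. The sketch below
records wall-clock latency per query for several passes, so the first
pass over a freshly opened database (roughly a cold cache, which is only
a hypothesis to check) can be compared with later passes. `run_query` is
whatever function you already use to search; `fake_query` is a
hypothetical placeholder.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(run_query: Callable[[str], object],
                      queries: List[str], passes: int = 3) -> List[List[float]]:
    """Return one list of wall-clock latencies (seconds) per pass."""
    results = []
    for _ in range(passes):
        timings = []
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            timings.append(time.perf_counter() - start)
        results.append(timings)
    return results


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Mean, nearest-rank p50/p99 and max of a list of latencies."""
    ordered = sorted(latencies)
    n = len(ordered)

    def pct(p: int) -> float:
        # Nearest-rank percentile with integer maths: rank = ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {"n": n, "mean": statistics.mean(ordered),
            "p50": pct(50), "p99": pct(99), "max": ordered[-1]}


if __name__ == "__main__":
    # Hypothetical stand-in: replace with a function that parses q with
    # xapian.QueryParser and calls Enquire.get_mset() on your database.
    def fake_query(q: str) -> None:
        time.sleep(0.001)

    per_pass = measure_latencies(fake_query, ["q1", "q2", "q3"], passes=2)
    for i, timings in enumerate(per_pass):
        print("pass", i, summarize(timings))
```

Reporting `max` alongside the percentiles matters here: with 99 queries
at 0.8 s and one at 5 s, p50 and p99 both stay at 0.8 s and only `max`
shows the outlier.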

If I want to shorten the query time, what should I do or try? BTW, I
don't think splitting into more databases and querying them in
parallel is a good idea, unless Xapian can ensure that each query
finishes within an expected time (actually, this 13M database is
'small' :P).

-- 
One of my most productive days was throwing away 1000 lines of code.