[Xapian-discuss] CPU high usage

Andrey alpha04 at netvigator.com
Sun Jun 21 10:54:16 BST 2009


Hi

I am using Xapian 1.0.12 here and trying to optimise search performance.

My test setup has 10M docs of forum threads, in two databases:

DB indexes only the thread title, author name, category name, and 1 optional 
serialised value (slot 0) holding the Unix dateline.
DB_full indexes all the DB terms plus the thread contents.

After a couple of tests, I decided to remove ALL anchor terms such as 
SHOW_PUBLIC and MORE_IMPORTANT.
Before, I used AND_MAYBE (MORE_IMPORTANT) in the query to add weight to more 
important docs.
Before, I used AND (SHOW_PUBLIC) to search only public threads.
I removed these switches because sometimes the CPU usage jumps to 20% for one 
query (especially when the result set is big).
I also decided to separate the DB into 2 sets, 1 with contents and 1 
without contents.

Now I have removed all the switches from the docs.
I also manually sorted all documents in a MySQL InnoDB table from lower -> higher 
importance, older -> newer date.
After sorting them into the proper order, I added them to Xapian 
one by one, docid=1, docid=2, docid=3....
I did it this way because I don't want to use any sorting by value in 
Xapian, just the plain sort by docid DESC in my BoolWeight query.
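
To make the ordering idea concrete, here is a rough sketch in plain Python (no Xapian involved; the Thread fields and the sample data are made up for illustration). It sorts the records from lower to higher importance and older to newer date, numbers them from 1, and then returns boolean matches in descending docid order. In Xapian itself I believe the matching side corresponds to BoolWeight together with Enquire::set_docid_order(Enquire::DESCENDING), but treat that mapping as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    thread_id: int   # hypothetical primary key from the MySQL table
    importance: int  # hypothetical score, higher means more important
    dateline: int    # Unix timestamp of the thread
    title: str


def assign_docids(threads):
    """Sort lower -> higher importance, older -> newer, then number from 1."""
    ordered = sorted(threads, key=lambda t: (t.importance, t.dateline))
    return {docid: t for docid, t in enumerate(ordered, start=1)}


def bool_search(index, term, limit=10):
    """Unweighted match on title words, returned in descending docid order."""
    hits = [d for d, t in index.items() if term in t.title.lower().split()]
    return sorted(hits, reverse=True)[:limit]


# Made-up sample rows standing in for the sorted MySQL result set.
threads = [
    Thread(101, importance=1, dateline=1245500000, title="Old movie thread"),
    Thread(102, importance=2, dateline=1245400000, title="Important movie news"),
    Thread(103, importance=1, dateline=1245578056, title="New movie thread"),
    Thread(104, importance=2, dateline=1245570000, title="Off topic"),
]

index = assign_docids(threads)
top = [index[d].thread_id for d in bool_search(index, "movie")]
print(top)
```

The most important matching thread comes back first, and within the same importance the newer one wins, with no sort key evaluated at query time.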

OK, my question is: after this setup, most (90%) of my queries now use 0.3-0.7% 
CPU per request (using the PHP bindings).
But once in a while, for some terms, I still see 6% CPU for a very simple 
query (using the PHP bindings),
e.g. Xapian::Query(movie:(pos=1,wqf=12))
in a 10M docs DB with only a few terms indexed (8.6G size):
Matches Estimated 421,057 Time: 0.1850
This one uses 6.3% CPU.

I wonder, what is the cause of this CPU usage? Is it the ranker?
I have already done all I can to minimise costs; what else can I do to prevent or 
load-balance the situation?
Would I be better off using another binding, e.g. Python?
Would I be better off using distributed search?
My goal is to optimise the search while the doc count grows very 
large, e.g. 100M+.


My test machine:
Quad CPU    Q6600  @ 2.40GHz
8G ram
1x 10krpm WD HD

My live servers:
Dell R710
2x E5530 2.4G
24G RAM 1333MHz
8x 73G 15K RPM SAS RAID 0

Cheers
Andrey 




