[Xapian-discuss] Search performance issues and profiling/debugging search?

Ron Kass ron at pidgintech.com
Tue Oct 23 20:09:25 BST 2007


Hi All..

We are running a small search-test script to test speed of searches.
We iterate through a list of 2500 searches, executing each. After a 
while we started noticing that the speed or recurring searches is not 
always faster than the original.
We started running each search, three times in a row.
We were surprised to see that many times, sequentially identical 
searches take as much time as the original or even more, which doesn't 
make much sense considering things should be cached in such cases.

Some example:
16/7500: Searching for: no recip...************************* : [Time: 
1.053 sec] [Results: 25 of 18924]
17/7500: Searching for: no recip...************************* : [Time: 
1.053 sec] [Results: 25 of 18862]
18/7500: Searching for: no recip...************************* : [Time: 
1.655 sec] [Results: 25 of 17197]  
...
52/7500: Searching for: coupons...************************* : [Time: 
0.106 sec] [Results: 25 of 78205]
53/7500: Searching for: coupons...************************* : [Time: 
0.106 sec] [Results: 25 of 79775]
54/7500: Searching for: coupons...************************* : [Time: 
0.688 sec] [Results: 25 of 72030]          
...
172/7500: Searching for: microsoft...************************* : [Time: 
0.368 sec] [Results: 25 of 245689]
173/7500: Searching for: microsoft...************************* : [Time: 
0.29 sec] [Results: 25 of 235657]
174/7500: Searching for: microsoft...************************* : [Time: 
1.453 sec] [Results: 25 of 233288]        
...
478/7500: Searching for: map of the world...************************* : 
[Time: 15.703 sec] [Results: 25 of 1355]
479/7500: Searching for: map of the world...************************* : 
[Time: 10.004 sec] [Results: 25 of 1230]
480/7500: Searching for: map of the world...************************* : 
[Time: 8.902 sec] [Results: 25 of 1117]    

Especially with the first and last examples, we notice that search for - 
no recip - takes more than one second, regardless of how many times we 
execute it, and - map of the world - takes over 8 seconds constantly.
Some notes about the environment of the search test:
* we open the database only once, so save time.
* hardware, quad xeon, 16GB mem
* data across 10 databases over 5xSATA2 drives
* Database contains 150M docs, each about 3K
* The server is not doing anything beside the search testing.
Also:
* the Astrix signs in the debug info represent fetching document data, 
score, matching words and other details from the mset items.
* Estimates vary, although its exactly the same search done right one 
after the other with no changes to the DB (no data added). This is not 
really a big issue.

My questions/concerns:
1. Why would a sequential search take as much time? Shouldn't all data 
be cached as the query is "warmed up" already? Why would a third search 
for "microsoft" take so long, not to mention "map of the world"?
2. Even in the fastest queries, we very rarely see repeated search 
taking less than 0.1 seconds, even those with the simplest of terms. 
Does that make sense?
3. For such a hardware we use for testing, 16GB ram, quad xeon, 5xSATA2 
[500GB each] disks.. How much would you expect it to "carry" assuming a 
typical document is 3K and we store positional, stemmed information with 
12-16 values per item and a short integer number in the document data?
4. Anyone has ideas how we can profile things to see why we have such 
performance issues? We would be happy to dig deeper to find out where 
the problems arise from.

Looking forward to hearing your feedback..
Ron



More information about the Xapian-discuss mailing list