[Xapian-tickets] [Xapian] #326: Searches with chert are slow, due to slow doclen access

Xapian nobody at xapian.org
Thu Feb 5 09:49:37 GMT 2009

#326: Searches with chert are slow, due to slow doclen access
 Reporter:  richard        |       Owner:  richard  
     Type:  defect         |      Status:  new      
 Priority:  normal         |   Milestone:  1.1.0    
Component:  Backend-Chert  |     Version:  SVN trunk
 Severity:  normal         |   Blockedby:           
 Platform:  All            |    Blocking:           
 I've built a benchmark database of slightly over 100,000 documents from
 Wikipedia, and indexed it with both flint and chert.  Running 10,000
 single-term searches against each database, with the database fully
 cached, flint completes all searches in 1.78 seconds, whereas chert takes
 12.58 seconds, i.e. chert is about 7 times slower than flint.

 Note that the chert database is considerably smaller than the flint
 database, which hopefully means that in the uncached case chert might
 perform better.  However, with databases under 1 million documents, the
 database is likely to be fully cached, so we're likely to be CPU bound,
 and performance will be much worse with chert.

 Profiling the code with callgrind revealed that around 85% of the CPU time
 is being spent in get_doclength() calls, and around 90% of that time is
 spent in ChertPostList::move_forward_in_chunk_to_at_least().  This method
 calls next_in_chunk() repeatedly to find the appropriate doclen: on
 average, it calls next_in_chunk() around 30 times per call.
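
 To make the cost described above concrete, here is a minimal sketch of a
 forward-only cursor that finds a doclen by stepping entry by entry.  The
 names DoclenChunk, ChunkCursor, move_forward_to_at_least() and
 make_dense_chunk() are hypothetical, and the in-memory vector is an
 assumption made for illustration; this is not Xapian's actual ChertPostList
 code or its on-disk chunk format.  It only shows why each lookup costs one
 next_in_chunk() step per skipped entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one chunk of a doclen list: (docid, doclen)
// pairs sorted by docid. This is NOT Xapian's real chunk encoding.
struct DoclenChunk {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> entries;
};

// Hypothetical forward-only cursor over a chunk. The chunk must outlive it.
class ChunkCursor {
  public:
    explicit ChunkCursor(const DoclenChunk& chunk) : chunk_(chunk) {}

    bool at_end() const { return pos_ >= chunk_.entries.size(); }

    std::uint32_t docid() const {
        assert(!at_end());
        return chunk_.entries[pos_].first;
    }

    std::uint32_t doclen() const {
        assert(!at_end());
        return chunk_.entries[pos_].second;
    }

    // Step to the next entry, counting how many steps have been taken.
    void next_in_chunk() {
        ++pos_;
        ++steps_;
    }

    // Linear scan: keep stepping until docid >= target or the chunk ends.
    // Returns false if the chunk was exhausted first.
    bool move_forward_to_at_least(std::uint32_t target) {
        while (!at_end() && docid() < target) next_in_chunk();
        return !at_end();
    }

    std::size_t steps() const { return steps_; }

  private:
    const DoclenChunk& chunk_;
    std::size_t pos_ = 0;
    std::size_t steps_ = 0;
};

// Build a chunk holding docids 1..n, each with a made-up doclen of
// 100 + docid (illustrative data only).
DoclenChunk make_dense_chunk(std::uint32_t n) {
    DoclenChunk chunk;
    chunk.entries.reserve(n);
    for (std::uint32_t id = 1; id <= n; ++id)
        chunk.entries.emplace_back(id, 100 + id);
    return chunk;
}
```

 With a chunk built by make_dense_chunk(1000), moving a fresh cursor to
 docid 31 takes 30 steps, and the step count keeps growing linearly with
 the distance between the docids a search asks for.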

 I don't think this degree of slowdown is acceptable, so we need to either
 find a way to make the code faster with the existing data structure, or
 find a way to allow faster seeking in the doclen list.
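
 As one illustration of what "faster seeking" could look like, the sketch
 below adds a small skip table that records the docid of every stride-th
 entry, so a seek does a binary search over the table followed by a short
 linear scan.  This is only an assumption about one possible direction,
 not the approach chosen for chert; Entry, SkipIndex, build_skip_index(),
 seek_at_least() and make_dense_entries() are all hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Entry = std::pair<std::uint32_t, std::uint32_t>;  // (docid, doclen)

// Hypothetical skip table: first_docids[i] is the docid of
// entries[i * stride].
struct SkipIndex {
    std::size_t stride;
    std::vector<std::uint32_t> first_docids;
};

SkipIndex build_skip_index(const std::vector<Entry>& entries,
                           std::size_t stride) {
    assert(stride > 0);
    SkipIndex idx{stride, {}};
    for (std::size_t i = 0; i < entries.size(); i += stride)
        idx.first_docids.push_back(entries[i].first);
    return idx;
}

struct SeekResult {
    std::size_t pos;           // index of first entry with docid >= target
    std::size_t linear_steps;  // entries stepped over after the skip
};

// Find the first entry at or after position `from` whose docid is >= target.
// Returns pos == entries.size() if there is no such entry.
SeekResult seek_at_least(const std::vector<Entry>& entries,
                         const SkipIndex& idx, std::size_t from,
                         std::uint32_t target) {
    // Locate the last block whose first docid is <= target.
    auto it = std::upper_bound(idx.first_docids.begin(),
                               idx.first_docids.end(), target);
    std::size_t pos = from;
    if (it != idx.first_docids.begin()) {
        std::size_t block =
            static_cast<std::size_t>(it - idx.first_docids.begin()) - 1;
        pos = std::max(from, block * idx.stride);  // never move backwards
    }
    // Finish with a linear scan of at most one block.
    std::size_t steps = 0;
    while (pos < entries.size() && entries[pos].first < target) {
        ++pos;
        ++steps;
    }
    return {pos, steps};
}

// Illustrative data only: docids 1..n with a made-up doclen of 100 + docid.
std::vector<Entry> make_dense_entries(std::uint32_t n) {
    std::vector<Entry> entries;
    entries.reserve(n);
    for (std::uint32_t id = 1; id <= n; ++id)
        entries.emplace_back(id, 100 + id);
    return entries;
}
```

 The trade-off in a design like this is extra space for the skip table
 against fewer entries stepped over per lookup; whether that pays off for
 a real compressed chunk format would need to be measured.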

 I've done some experiments about this, which I'll detail in subsequent

Ticket URL: <http://trac.xapian.org/ticket/326>
Xapian <http://xapian.org/>
