[Xapian-tickets] [Xapian] #543: excessive postlist.DB reads on flush

Xapian nobody at xapian.org
Tue Apr 12 23:33:05 BST 2011


#543: excessive postlist.DB reads on flush
---------------------------+------------------------------------------------
 Reporter:  fulltext       |       Owner:  olly    
     Type:  defect         |      Status:  assigned
 Priority:  normal         |   Milestone:  1.2.x   
Component:  Backend-Chert  |     Version:  1.2.5   
 Severity:  normal         |    Keywords:          
Blockedby:                 |    Platform:  All     
 Blocking:                 |  
---------------------------+------------------------------------------------
Changes (by olly):

  * keywords:  iowait, flush, postlist =>
  * status:  new => assigned
  * milestone:  => 1.2.x


Comment:

 Thanks for your report.

 Xapian is merging the new postlist entries with the existing ones, so it
 has to read some of the old ones to do this.  The best way to make this
 more efficient (if you have spare RAM) is to increase the batch size - set
 XAPIAN_FLUSH_THRESHOLD in the environment to control this (and make sure
 it is exported if setting it in the shell).  The threshold is counted in
 documents changed, and the default is 10000.
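
 As a rough illustration of the above, here is a minimal Python sketch that
 launches an indexer with XAPIAN_FLUSH_THRESHOLD set in its environment.
 The helper names and the indexer command are hypothetical; only the
 variable name and its meaning come from the comment above.

```python
import os
import subprocess


def indexer_env(flush_threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    Hypothetical helper: only the variable name comes from the ticket.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings; the threshold counts documents changed.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(int(flush_threshold))
    return env


def run_indexer(argv, flush_threshold):
    """Run an indexer command (argv is an assumption) with the threshold set."""
    # Passing env= hands the variable to the child process, which is what
    # "export" achieves when setting it in a shell.
    return subprocess.call(argv, env=indexer_env(flush_threshold))
```

 For example, run_indexer(["./my-indexer", "/path/to/db"], 100000) would run
 a (hypothetical) indexer with ten times the default batch size, assuming
 there is enough spare RAM for it.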

 You can also build a number of smaller databases and merge them with
 xapian-compact.
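
 A sketch of that merge step driven from Python, assuming xapian-compact is
 on PATH and is given the source databases followed by the destination.
 The function names and database paths are made up for illustration.

```python
import subprocess


def compact_command(sources, destination):
    """Build the argv for merging several databases into one."""
    sources = list(sources)
    if not sources:
        raise ValueError("need at least one source database")
    # Source databases come first, the destination database last.
    return ["xapian-compact"] + sources + [destination]


def merge_databases(sources, destination):
    """Run xapian-compact; raises CalledProcessError if it fails."""
    subprocess.check_call(compact_command(sources, destination))
```

 Each small database would be built by a separate indexing run, e.g.
 merge_databases(["part1.db", "part2.db"], "merged.db").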

 You don't say anything about the indexing you're doing.  If you're only
 appending new documents, i.e. only calling add_document(), that's
 significantly more efficient than replacing existing documents, as when
 appending we only need to read the first and last block of each postlist.
 Also, looking up a unique term will require some additional reads.
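
 To make the two indexing patterns concrete, here is a minimal sketch.  It
 assumes db behaves like a xapian.WritableDatabase from the Python bindings,
 i.e. it has add_document(doc) and replace_document(unique_term, doc); the
 wrapper function names are hypothetical.

```python
def append_documents(db, docs):
    """Append-only indexing: each add_document() call gets a new docid."""
    return [db.add_document(doc) for doc in docs]


def upsert_documents(db, docs_by_unique_term):
    """Replace-by-unique-term indexing.

    Each call has to look up the unique term first, which is the extra
    read mentioned above, and may replace an existing document.
    """
    return [
        db.replace_document(term, doc)
        for term, doc in docs_by_unique_term.items()
    ]
```

 If the data being indexed is known to contain no documents already in the
 database, the first form avoids the unique term lookup entirely.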

 I don't have a good explanation for the times where it reads significantly
 more than it writes though.  Unless you're causing significant postlist
 reads in your indexing code, it sounds like a bug in Xapian, but perhaps
 the system was running something else during that batch which read from
 sdb, e.g. a background job from cron like the locate db updater or the
 tracker file search indexer?

 Measuring I/O for just the Xapian process would rule out other processes
 reading from sdb as the cause.  If it's a Xapian bug, it's probably going
 to be hard to track down without more information - ideally some code
 which reproduces this.
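
 One way to get per-process figures, sketched here on the assumption that
 the indexer runs on Linux, is to sample /proc/<pid>/io before and after a
 batch.  The function names are hypothetical, and reading another process's
 io file may need the same user or extra privileges.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def process_io(pid):
    """Read the current I/O counters for one process (Linux only)."""
    with open("/proc/%d/io" % pid) as f:
        return parse_proc_io(f.read())


def io_delta(before, after):
    """Bytes actually read from and written to storage between two samples."""
    return {key: after[key] - before[key] for key in ("read_bytes", "write_bytes")}
```

 Calling process_io(pid) just before and just after a flush and passing
 both results to io_delta() gives the reads and writes attributable to that
 process alone.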

 If that's difficult, checking with the flint backend might be informative.
 If flint doesn't exhibit this behaviour, it's related to something which
 changed between the two, which would narrow it down quite a lot.

-- 
Ticket URL: <http://trac.xapian.org/ticket/543#comment:1>
Xapian <http://xapian.org/>