[Xapian-tickets] [Xapian] #543: excessive postlist.DB reads on flush
Xapian
nobody at xapian.org
Tue Apr 12 23:33:05 BST 2011
#543: excessive postlist.DB reads on flush
---------------------------+------------------------------------------------
   Reporter:  fulltext     |       Owner:  olly
       Type:  defect       |      Status:  assigned
   Priority:  normal       |   Milestone:  1.2.x
  Component:  Backend-Chert|     Version:  1.2.5
   Severity:  normal       |    Keywords:
  Blockedby:               |    Platform:  All
   Blocking:               |
---------------------------+------------------------------------------------
Changes (by olly):
* keywords: iowait, flush, postlist =>
* status: new => assigned
* milestone: => 1.2.x
Comment:
Thanks for your report.
Xapian is merging the new postlist entries with the existing ones, so it
has to read some of the old ones to do this. The best way to make this
more efficient (if you have spare RAM) is to increase the batch size - set
XAPIAN_FLUSH_THRESHOLD in the environment to control this (and make sure
it is exported if you set it in the shell). The threshold is counted in
documents changed, and the default is 10000.
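As a rough sketch of the above, the variable can be set for an indexer that
is launched from Python; passing it in the child's environment has the same
effect as exporting it in the shell. The helper name indexer_env, the value
100000 and the "./my_indexer" command are illustrative assumptions, not
recommendations from this ticket.

```python
import os


def indexer_env(flush_threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    flush_threshold is counted in documents changed (the default is 10000).
    A suitable value depends on the workload and on how much RAM is spare.
    """
    if flush_threshold <= 0:
        raise ValueError("flush_threshold must be a positive integer")
    # Copy so the caller's (or this process's) environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(int(flush_threshold))
    return env


# Hypothetical usage: "./my_indexer" stands in for your own indexing program.
# import subprocess
# subprocess.run(["./my_indexer"], env=indexer_env(100000), check=True)
```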
You can also build a number of smaller databases and merge them with
xapian-compact.
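A minimal sketch of the merge step, assuming the smaller databases have
already been built: xapian-compact takes the source database paths followed
by the destination path. The helper name compact_command and the paths in
the usage comment are hypothetical.

```python
def compact_command(sources, destination):
    """Build the argument list for merging databases with xapian-compact.

    The sources come first and the destination database path comes last.
    """
    sources = list(sources)
    if not sources:
        raise ValueError("need at least one source database")
    return ["xapian-compact", *sources, destination]


# Hypothetical usage, with made-up database paths:
# import subprocess
# subprocess.run(compact_command(["part1.db", "part2.db"], "merged.db"),
#                check=True)
```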
You don't say anything about the indexing you're doing. If you're only
appending new documents, i.e. only calling add_document(), that's
significantly more efficient than replacing existing documents (as we only
need to read the first and last block of each postlist). Also, looking up
a unique term will require some additional reads.
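The two indexing styles can be sketched as follows. This assumes db behaves
like a xapian.WritableDatabase, using its add_document(document) and
replace_document(unique_term, document) methods; the helper name index_batch
and the pairing of documents with unique terms are assumptions made for
illustration.

```python
def index_batch(db, items, append_only):
    """Index (unique_term, document) pairs into a writable database.

    With append_only=True every document is added as a new one, which is
    the cheaper case described above.  Otherwise each document replaces any
    existing document indexed by unique_term, which requires looking that
    term up first.
    """
    for unique_term, doc in items:
        if append_only:
            db.add_document(doc)
        else:
            db.replace_document(unique_term, doc)
```

If the data is known to contain only new documents, choosing the append-only
path avoids the unique-term lookups.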
I don't have a good explanation for the times where it reads significantly
more than it writes though. Unless you're causing significant postlist
reads in your indexing code, it sounds like a bug in Xapian, but perhaps
the system was running other stuff during that batch which looked at sdb,
e.g. background stuff from cron like the locate db updater or the tracker
file search thing?
Measuring I/O for just the Xapian process would rule out other processes
reading from sdb. If it's a Xapian bug, it's
probably going to be hard to track down without more information - ideally
some code which reproduces this.
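One way to measure I/O for a single process, assuming a Linux system with
per-process I/O accounting enabled, is to sample /proc/<pid>/io before and
after a batch and compare the read_bytes and write_bytes counters. The
function names below are hypothetical; this is a sketch, not part of Xapian.

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip anything that is not a "name: number" line.
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_process_io(pid):
    """Return the current I/O counters for the given process ID (Linux only)."""
    with open("/proc/%d/io" % pid) as f:
        return parse_proc_io(f.read())


# Hypothetical usage inside the indexer itself:
# import os
# before = read_process_io(os.getpid())
# ... index one batch and flush ...
# after = read_process_io(os.getpid())
# print(after["read_bytes"] - before["read_bytes"],
#       after["write_bytes"] - before["write_bytes"])
```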
If that's difficult, checking with the flint backend might be informative.
If flint doesn't exhibit this behaviour, the cause is related to something
which changed between flint and chert, which would narrow it down quite a
lot.
--
Ticket URL: <http://trac.xapian.org/ticket/543#comment:1>
Xapian <http://xapian.org/>