[Xapian-tickets] [Xapian] #444: xapian-compact --multipass should use flat intermediate files
Xapian
nobody at xapian.org
Sun Feb 7 22:58:37 GMT 2010
#444: xapian-compact --multipass should use flat intermediate files
---------------------------+------------------------------------------------
Reporter: olly | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone: 1.2.x
Component: Backend-Chert | Version: SVN trunk
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
---------------------------+------------------------------------------------
Currently --multipass creates temporary intermediate B-trees, but we write
these in sorted key order and then reread them in the same order, so we
could just use a flat file with a format something like:
{{{
<length of key><key><length of tag><tag>
}}}
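A minimal Python sketch of writing and rereading such a flat file, to make the record layout concrete. The ticket doesn't say how the lengths would be encoded, so the base-128 variable-length integers here are an assumption, and the names write_flat and read_flat are hypothetical, not anything in the Xapian source:

```python
import io


def write_varint(out, n):
    # Assumed length encoding: little-endian base-128, high bit set
    # means "more bytes follow".
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_varint(inp):
    # Returns None only on a clean end of file (nothing left to read).
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated length")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def write_flat(out, items):
    # items: (key, tag) byte strings, already in sorted key order.
    for key, tag in items:
        write_varint(out, len(key))
        out.write(key)
        write_varint(out, len(tag))
        out.write(tag)


def read_flat(inp):
    # Yields (key, tag) pairs in the order they were written.
    while True:
        key_len = read_varint(inp)
        if key_len is None:
            return
        key = read_exact(inp, key_len)
        tag_len = read_varint(inp)
        if tag_len is None:
            raise EOFError("truncated record")
        yield key, read_exact(inp, tag_len)
```

Both functions touch the file strictly front to back, which is the access pattern the intermediate passes need.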
Using a flat file means less I/O (and we'll be I/O bound here), the I/O
will be linear (which is easier for the OS, FS, and hardware to handle
efficiently), and also less CPU.
Probably worth prefix-compressing the keys, since we're I/O bound here,
and using less intermediate disk space is also a bonus:
{{{
<length of previous key to reuse><length of key tail><key tail><length of tag><tag>
}}}
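A rough Python sketch of the prefix-compressed variant, assuming the reader keeps the previous key in memory and rebuilds each key from the reused prefix plus the stored tail. As before, the variable-length integer encoding is an assumption, and write_compressed and read_compressed are hypothetical names:

```python
import io


def write_varint(out, n):
    # Assumed length encoding: little-endian base-128 varint.
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_varint(inp, at_record_start=False):
    # Returns None on a clean end of file, but only at a record boundary.
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if at_record_start and shift == 0:
                return None
            raise EOFError("truncated record")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def common_prefix_len(a, b):
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def write_compressed(out, items):
    # items: (key, tag) byte strings in sorted key order, so adjacent
    # keys tend to share a long prefix.
    prev = b""
    for key, tag in items:
        reuse = common_prefix_len(prev, key)
        tail = key[reuse:]
        write_varint(out, reuse)
        write_varint(out, len(tail))
        out.write(tail)
        write_varint(out, len(tag))
        out.write(tag)
        prev = key


def read_compressed(inp):
    prev = b""
    while True:
        reuse = read_varint(inp, at_record_start=True)
        if reuse is None:
            return
        tail = read_exact(inp, read_varint(inp))
        tag = read_exact(inp, read_varint(inp))
        prev = prev[:reuse] + tail
        yield prev, tag
```

Each record costs one extra length field compared with the uncompressed layout, so the saving depends on how much of each key is shared with its predecessor.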
A quick estimate suggests that the dump file will probably be 8-9% smaller
than the equivalent intermediate table, so assuming I/O is the only
factor, we'd save about that proportion of the time on the intermediate
compacting stages for the postlist table. In fact there's CPU time too,
and we'll save on that as well (so assuming I/O is the only factor is
probably OK). We'll also save a bit by doing purely linear I/O. Reading
the source databases and writing the final databases wouldn't be sped up,
and neither would the other tables (which we can just copy over in turn).
No ABI or API changes required, so marking for 1.2.x.
--
Ticket URL: <http://trac.xapian.org/ticket/444>
Xapian <http://xapian.org/>