[Xapian-tickets] [Xapian] #444: xapian-compact --multipass should use flat intermediate files

Xapian nobody at xapian.org
Sun Feb 7 22:58:37 GMT 2010


#444: xapian-compact --multipass should use flat intermediate files
---------------------------+------------------------------------------------
 Reporter:  olly           |       Owner:  olly     
     Type:  enhancement    |      Status:  new      
 Priority:  normal         |   Milestone:  1.2.x    
Component:  Backend-Chert  |     Version:  SVN trunk
 Severity:  normal         |    Keywords:           
Blockedby:                 |    Platform:  All      
 Blocking:                 |  
---------------------------+------------------------------------------------
 Currently --multipass creates temporary intermediate B-trees, but we write
 these in sorted key order, and then reread them in the same order, so we
 could just use a flat file with a format something like:

 {{{
 <length of key><key><length of tag><tag>
 }}}
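
 As a rough illustration of the record layout above, here is a minimal
 sketch in Python (chosen for brevity; it is not Xapian code).  The ticket
 does not say how lengths are encoded, so the 7-bits-per-byte variable-length
 integer used here is an assumption, and the names encode_uint, decode_uint,
 write_flat and read_flat are hypothetical:

```python
import io


def encode_uint(n):
    """Encode a non-negative int, 7 bits per byte, low bits first (assumed encoding)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uint(f):
    """Read one encoded int from f; return None on a clean end of file."""
    shift = 0
    result = 0
    while True:
        b = f.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated length")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_flat(f, items):
    """Append <length of key><key><length of tag><tag> for each (key, tag) pair."""
    for key, tag in items:
        f.write(encode_uint(len(key)) + key + encode_uint(len(tag)) + tag)


def read_flat(f):
    """Yield (key, tag) pairs back in the order they were written."""
    while True:
        key_len = decode_uint(f)
        if key_len is None:
            return
        key = f.read(key_len)
        tag_len = decode_uint(f)
        if tag_len is None:
            raise EOFError("truncated record")
        yield key, f.read(tag_len)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_flat(buf, [(b"apple", b"1"), (b"apply", b"2")])
    buf.seek(0)
    for key, tag in read_flat(buf):
        print(key, tag)
```

 Both the writer and the reader only ever move forwards through the file,
 which is the linear access pattern the ticket is relying on.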

 Using a flat file means less I/O (and we'll be I/O bound here), the I/O
 will be linear (which is easier for the OS, FS, and hardware to handle
 efficiently), and also less CPU.

 Probably worth prefix-compressing the keys, since we're I/O bound here,
 and using less intermediate disk space is also a bonus:

 {{{
 <length of previous key to reuse><length of key tail><key tail><length of tag><tag>
 }}}
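
 The prefix-compressed layout could be sketched along the same lines.  This
 is again Python for brevity rather than Xapian code; the variable-length
 integer encoding is an assumption, and the names common_prefix_len,
 write_compressed and read_compressed are hypothetical.  Each record stores
 how many leading bytes of the previous key to reuse, followed by only the
 differing tail:

```python
import io


def encode_uint(n):
    """Encode a non-negative int, 7 bits per byte, low bits first (assumed encoding)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uint(f):
    """Read one encoded int from f; return None on a clean end of file."""
    shift = 0
    result = 0
    while True:
        b = f.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated length")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def common_prefix_len(a, b):
    """Number of leading bytes that a and b share."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def write_compressed(f, items):
    """Write records as <reuse><tail length><tail><tag length><tag>."""
    prev = b""
    for key, tag in items:
        reuse = common_prefix_len(prev, key)
        tail = key[reuse:]
        f.write(encode_uint(reuse) + encode_uint(len(tail)) + tail
                + encode_uint(len(tag)) + tag)
        prev = key


def read_compressed(f):
    """Rebuild each full key from the previous key plus the stored tail."""
    prev = b""
    while True:
        reuse = decode_uint(f)
        if reuse is None:
            return
        tail_len = decode_uint(f)
        if tail_len is None:
            raise EOFError("truncated record")
        tail = f.read(tail_len)
        tag_len = decode_uint(f)
        if tag_len is None:
            raise EOFError("truncated record")
        tag = f.read(tag_len)
        key = prev[:reuse] + tail
        yield key, tag
        prev = key
```

 Because the entries are written in sorted key order, neighbouring keys tend
 to share long prefixes, which is what makes this worthwhile.  The reader
 needs only the previous key to decode the next one, so access stays purely
 sequential.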

 A quick estimate suggests that the dump file will probably be 8-9% smaller
 than the equivalent intermediate table, so assuming I/O is the only
 factor, we'd save about that proportion of the time spent on the
 intermediate compacting stages for the postlist table.  In fact there's CPU
 time too, and we'll save on that as well (so assuming I/O is the only factor
 is probably OK).  We'll also save a bit by doing purely linear I/O.  Reading
 the source databases and writing the final databases wouldn't be sped up,
 and neither would the other tables (which we can just copy over in turn).

 No ABI or API changes required, so marking for 1.2.x.

-- 
Ticket URL: <http://trac.xapian.org/ticket/444>
Xapian <http://xapian.org/>