[Xapian-discuss] Optimal usage of xapian-compact for merging
Henry C.
henka at cityweb.co.za
Wed Feb 3 09:12:44 GMT 2010
On Wed, February 3, 2010 07:40, Olly Betts wrote:
> Or just merge all the databases in a single invocation.
Merging several hundred thousand dbs in a single invocation presents a
spot of bother :)
> I don't have figures to compare these, and it may vary according to your
> data, OS, FS, and/or hardware, so all I can really suggest is to try the
> different approaches and see. Do report if you find anything
> interesting.
Looks like I've found a sweet spot with merging batches of 50 - but will
try more.
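
A minimal sketch of the batching approach, assuming the databases are merged in rounds of at most 50 until a single database remains. The function name `plan_rounds`, the `work_prefix` directory and the intermediate database names are hypothetical; only the general `xapian-compact [options] SOURCE... DESTINATION` form and the `-m` option come from the tool itself. The sketch only builds the command lines and does not run them.

```python
from typing import List, Sequence


def plan_rounds(sources: Sequence[str], work_prefix: str,
                batch_size: int = 50) -> List[List[List[str]]]:
    """Plan repeated batch merges until only one database is left.

    Returns a list of rounds; each round is a list of xapian-compact
    command lines (argument lists). Nothing is executed here.
    """
    if batch_size < 2:
        # A batch of one never reduces the number of databases.
        raise ValueError("batch_size must be at least 2")

    rounds = []
    current = list(sources)
    round_no = 0
    while len(current) > 1:
        commands = []
        outputs = []
        for i in range(0, len(current), batch_size):
            batch = current[i:i + batch_size]
            # Hypothetical naming scheme for intermediate databases.
            dest = f"{work_prefix}/r{round_no}-{i // batch_size:06d}"
            commands.append(["xapian-compact", "-m", *batch, dest])
            outputs.append(dest)
        rounds.append(commands)
        current = outputs  # the next round merges this round's outputs
        round_no += 1
    return rounds
```

Each command list could be passed to `subprocess.run(cmd, check=True)`; changing `batch_size` is then the only thing needed to compare batches of 50 against larger ones.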
> Currently the grouping under -m is fairly crude - postlists are just
> merged in pairs (plus a three if there are an odd number), and then the
> merged lists are remerged in the same way until we have just one, but that
> may be reasonable even for mismatched sizes.
>
> It would probably be significantly faster not to use a Btree for the
> intermediate stages, but just serialise it to a flat file - we will end up
> rereading it in order. That would only make a difference when merging
> more than 3 databases though.
>
> I should file a ticket for it - it would make a fairly self-contained
> project for someone wanting to hack on Xapian without needing to
> understand much of the internals.
What kind of improvement do you think we'll see?
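
As a rough illustration of the grouping described above (pairs, plus one group of three when the count is odd, repeated until one list is left), the sketch below counts how many merge passes that scheme needs. It is only a model of the description in this thread, not Xapian's actual code, and the names `group_pairs` and `merge_passes` are made up for the example.

```python
from typing import List, Sequence


def group_pairs(items: Sequence[str]) -> List[List[str]]:
    """Split items into pairs; an odd item out joins the last pair."""
    n = len(items)
    if n == 0:
        return []
    if n == 1:
        return [list(items)]
    groups = [list(items[i:i + 2]) for i in range(0, n - n % 2, 2)]
    if n % 2:
        groups[-1].append(items[-1])  # the "three" for odd counts
    return groups


def merge_passes(n: int) -> int:
    """Number of passes until n lists are reduced to one."""
    passes = 0
    while n > 1:
        n //= 2  # every group (pair or three) becomes one merged list
        passes += 1
    return passes
```

Under this model up to 3 inputs need a single pass, so there are no intermediate results at all, which matches the remark that a flat-file intermediate only matters when merging more than 3 databases. A batch of 50 needs 5 passes, so 4 of them write intermediate output.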
>
>> Finally, presumably it's best to use the same blocksize (-b) as the
>> underlying filesystem? I see the default is 8K, but the default
>> blocksize on (eg) ext3 is 4k... or am I way off here?
>
> It should certainly not be smaller than the hardware blocksize (or else
> you need to read the existing disk-block in order to write a
> Xapian-block). A multiple is fine though, and larger blocks are a bit
> more efficient. I did some tests a year or so ago which suggested 16KB
> might be slightly better than 8KB, but it is sufficiently close that it
> didn't seem to justify changing the default.
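
The blocksize rule above can be sketched as a small check. This assumes a Unix-like system, where `os.statvfs` reports the filesystem's preferred block size as `f_bsize` (which may differ from the underlying hardware block size), and it assumes the documented `-b` limits of a power of 2 between 2K and 64K. The function names are hypothetical.

```python
import os


def fs_blocksize(path: str = ".") -> int:
    """Preferred block size of the filesystem holding path (Unix only)."""
    return os.statvfs(path).f_bsize


def valid_xapian_blocksize(block: int, fs_block: int) -> bool:
    """True if block is a usable -b value for the given filesystem block.

    Requires a power of 2 in the 2K..64K range that is at least the
    filesystem block size and a whole multiple of it.
    """
    power_of_two = block > 0 and block & (block - 1) == 0
    in_range = 2048 <= block <= 65536
    return (power_of_two and in_range
            and block >= fs_block and block % fs_block == 0)
```

With a 4K filesystem block, both the 8K default and 16K pass this check, while 2K does not.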
Thanks for the considered response.
Regards
Henry