[Xapian-discuss] Optimal usage of xapian-compact for merging

Olly Betts olly at survex.com
Sun Feb 7 23:01:20 GMT 2010


On Wed, Feb 03, 2010 at 11:12:44AM +0200, Henry C. wrote:
> On Wed, February 3, 2010 07:40, Olly Betts wrote:
> > Or just merge all the databases in a single invocation.
> 
> Merging several hundred thousand dbs in a single invocation presents a
> spot of bother :)

What goes wrong?

If it is trying to open them all simultaneously, that should be fixable for
--multipass mode.
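Until that is fixed, one workaround is to merge in bounded batches and then merge the results, so that no single invocation opens more than a fixed number of sources. This is a minimal sketch under stated assumptions, not something xapian-compact does itself. The helper name `plan_batched_compaction`, the batch size and the temporary directory layout are all hypothetical. The only real interface assumed is xapian-compact's usage of one or more source databases followed by a destination, plus the `--multipass` option discussed here.

```python
import os
from typing import List


def plan_batched_compaction(sources: List[str], dest: str, batch_size: int = 100,
                            tmp_dir: str = "tmp-merge") -> List[List[str]]:
    """Return xapian-compact command lines (hypothetical plan) that merge
    `sources` into `dest` without passing more than `batch_size` inputs
    to any one invocation.

    Each round merges groups of up to `batch_size` databases into temporary
    databases under `tmp_dir`; rounds repeat until at most `batch_size`
    databases remain, and those are merged into `dest`.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not sources:
        raise ValueError("need at least one source database")
    commands = []
    current = list(sources)
    round_no = 0
    while len(current) > batch_size:
        next_round = []
        for i in range(0, len(current), batch_size):
            group = current[i:i + batch_size]
            out = os.path.join(tmp_dir, f"r{round_no}-{i // batch_size}")
            commands.append(["xapian-compact", "--multipass", *group, out])
            next_round.append(out)
        current = next_round
        round_no += 1
    # Final merge of the remaining (possibly temporary) databases.
    commands.append(["xapian-compact", "--multipass", *current, dest])
    return commands
```

The commands could then be run in order, for example with `subprocess.run(cmd, check=True)`, and the temporary databases removed once the final merge succeeds. A larger batch size means fewer rounds but more files open at once.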

> > Currently the grouping under -m is fairly crude - postlists are just
> > merged in pairs (plus a group of three if there is an odd number), and
> > then the merged lists are remerged in the same way until we have just one,
> > but that may be reasonable even for mismatched sizes.
> >
> > It would probably be significantly faster not to use a Btree for the
> > intermediate stages, but just serialise it to a flat file - we will end up
> > rereading it in order.  That would only make a difference when merging
> > more than 3 databases though.
> >
> > I should file a ticket for it - it would make a fairly self-contained
> > project for someone wanting to hack on Xapian without needing to
> > understand much of the internals.

http://trac.xapian.org/ticket/444
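The -m grouping described in the quoted text (merge in pairs, fold a leftover list into a group of three, repeat until one list remains) can be sketched as follows. This is an illustration only, assuming postlists are modelled as sorted lists of hypothetical `(docid, wdf)` tuples with docids already disjoint across sources; it is not Xapian's internal format, and `heapq.merge` stands in for the real merging code.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple, TypeVar

# Hypothetical posting representation: (docid, wdf), sorted by docid.
# Docids are assumed to be disjoint across sources (as after renumbering).
Posting = Tuple[int, int]
T = TypeVar("T")


def group_for_round(items: Sequence[T]) -> List[List[T]]:
    """Split items into pairs; with an odd count the last group has three.

    Expects at least two items.
    """
    n = len(items)
    groups = [list(items[i:i + 2]) for i in range(0, n - n % 2, 2)]
    if n % 2:
        # Fold the leftover item into the final pair, making a group of three.
        groups[-1].append(items[-1])
    return groups


def merge_postlists(postlists: Iterable[List[Posting]]) -> Tuple[List[Posting], int]:
    """Merge sorted postlists round by round; return (merged, rounds)."""
    lists = [list(pl) for pl in postlists]
    if not lists:
        return [], 0
    rounds = 0
    while len(lists) > 1:
        # Each group is merged by a single sequential pass over its inputs.
        lists = [list(heapq.merge(*group)) for group in group_for_round(lists)]
        rounds += 1
    return lists[0], rounds
```

Since each round halves the number of lists, n sources take roughly log2(n) rounds, and every round rereads each intermediate list from start to finish. That sequential access pattern is why a plain flat file could replace a Btree for the intermediate stages.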

> What kind of improvement do you think we'll see?

I did a quick estimate - see the ticket for details.
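The flat-file idea from the quoted text amounts to writing each intermediate merged postlist out sequentially and streaming it back in the same order, with no Btree or index in between. A minimal sketch, assuming a hypothetical fixed-width little-endian `(docid, wdf)` record; Xapian's real postlist encoding is compressed and different, and the ticket's proposal may differ in detail.

```python
import os
import struct
import tempfile
from typing import Iterable, Iterator, Tuple

# Hypothetical fixed-width record: two unsigned 32-bit ints (docid, wdf).
RECORD = struct.Struct("<II")


def write_flat(path: str, postings: Iterable[Tuple[int, int]]) -> None:
    """Write already-sorted postings sequentially; no Btree, no index."""
    with open(path, "wb") as f:
        for docid, wdf in postings:
            f.write(RECORD.pack(docid, wdf))


def read_flat(path: str) -> Iterator[Tuple[int, int]]:
    """Stream postings back in the order they were written."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if not chunk:
                break
            if len(chunk) < RECORD.size:
                raise ValueError("truncated postings file")
            yield RECORD.unpack(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage.postings")
        write_flat(path, [(1, 2), (3, 1), (7, 5)])
        print(list(read_flat(path)))  # [(1, 2), (3, 1), (7, 5)]
```

Because the next merge round only ever reads each intermediate list front to back, nothing is lost by giving up the Btree's random access, while the cost of building and updating Btree blocks is avoided.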

Cheers,
    Olly


