[Xapian-discuss] merge database and maintain order

Mark Clarkson mark.clarkson at smorg.co.uk
Sun Mar 25 06:16:20 BST 2007


On Sun, 2007-03-25 at 00:02 +0000, Olly Betts wrote: 
> Hmm, actually I see a neat hack.  If you add the first document to db2
> with a document id at least one more than the last document id of db1
> then the merged document ids will preserve the order within each db
> but put all the documents in db1 before those in db2.  

Many thanks for such a prompt reply. I've now implemented this
workaround and it works perfectly - thanks very much!

> Currently
> xapian-compact preserves spans of unused document ids at the start and
> end of the database, but that would be easy to fix.

Again, thank you for this important piece of information. I didn't see
why it mattered at first, but after testing it I can see that I could
run out of document ids in a relatively short space of time, depending on
the size of the collection.

I've hacked xapian-compact so that it no longer adds document id offsets,
and it seems to work, but now it will break horribly if I try to merge
databases whose document ids overlap.

I guess I'll have to be careful ;-)

Cheers
Mark.

--- bin/xapian-compact.cc.orig  2007-03-25 04:13:36.000000000 +0000
+++ bin/xapian-compact.cc       2007-03-25 04:13:39.000000000 +0000
@@ -152,7 +152,7 @@
        if (in->get_entry_count()) {
            // PostlistCursor takes ownership of FlintTable in and
            // is responsible for deleting it.
-           PostlistCursor * cur = new PostlistCursor(in, *offset);
+           PostlistCursor * cur = new PostlistCursor(in, 0);//*offset);
            // Merge the METAINFO tags from each database into one.
            // They have a key with a single zero byte, which will
            // always be the first key.
@@ -322,9 +322,9 @@
            Xapian::Database db(srcdir);
            // No point trying to merge empty databases!
            if (db.get_doccount() != 0) {
-               Xapian::docid last = db.get_lastdocid();
-               offset.push_back(tot_off);
-               tot_off += last;
+               //Xapian::docid last = db.get_lastdocid();
+               //offset.push_back(tot_off);
+               //tot_off += last;
                // FIXME: prune unused docids off the start and end of each range...
                sources.push_back(string(srcdir) + '/');
            }