[Xapian-discuss] xapian-compact and -F option (Enable fuller compaction)

Olly Betts olly at survex.com
Sat Jun 17 17:29:19 BST 2006


On Sat, Jun 17, 2006 at 06:47:53AM -0800, oscaruser at programmer.net wrote:
> Is there any way to achieve document updating using the URL as the key?

Not using xapian-compact.  It works by copying the raw Btree table
entries, rewriting the keys so that the docids are shifted by a
constant offset for each source database.  Since it doesn't look at the
documents themselves, that doesn't fit well with trying to eliminate
duplicates.
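
To make the offset idea concrete, here is a minimal stand-alone sketch.
It does not use the Xapian API at all: Source and merge_with_offsets are
hypothetical names, each source database is modelled as a list of
(docid, URL) pairs, and the rule that each offset is the highest docid
used by the earlier sources is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using DocId = std::uint32_t;
// Hypothetical stand-in for one source database: (docid, URL) pairs.
using Source = std::vector<std::pair<DocId, std::string>>;

// Merge sources by shifting every docid by a constant per-source offset.
// Assumption: the offset is the highest docid used by earlier sources.
std::vector<std::pair<DocId, std::string>>
merge_with_offsets(const std::vector<Source>& sources) {
    std::vector<std::pair<DocId, std::string>> merged;
    DocId offset = 0;
    for (const Source& src : sources) {
        DocId highest = 0;
        for (const std::pair<DocId, std::string>& entry : src) {
            // The URL is copied through untouched and never compared
            // with anything, so duplicates survive the merge.
            merged.emplace_back(entry.first + offset, entry.second);
            if (entry.first > highest) highest = entry.first;
        }
        offset += highest;
    }
    return merged;
}
```

If the same URL sits at docid 2 in the first source and docid 1 in the
second, the merged result holds it twice, at docids 2 and 3.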

The best approach to eliminating duplicates is usually not to create
them in the first place.  Sure, it's easier not to worry about it at
indexing time, but that just moves the work and complexity elsewhere in
the system, and it typically ends up being more work to eliminate them
later.

But if you don't want to do that and want to use xapian-compact, your
best bet is probably to walk the allterms list of the merged database
looking at each URL term, and delete all but the first (or last)
document when any term indexes more than one.
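
The selection logic for that walk might look roughly like the sketch
below.  It is an in-memory model, not Xapian code: PostingLists and
duplicates_to_delete are hypothetical names, and the "U" prefix for URL
terms used in the example is an assumption about how the database was
indexed.  Against a real database the same loop would presumably be
driven by allterms_begin() and postlist_begin(), with delete_document()
called on each docid returned.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using DocId = std::uint32_t;
// Hypothetical stand-in for the merged database's term index:
// term -> docids indexed by that term, in ascending order.
using PostingLists = std::map<std::string, std::vector<DocId>>;

// Collect the docids to delete so that each URL term ends up indexing
// exactly one document (the first by default, or the last).
std::vector<DocId> duplicates_to_delete(const PostingLists& postings,
                                        const std::string& url_prefix,
                                        bool keep_last = false) {
    std::vector<DocId> doomed;
    for (const auto& entry : postings) {
        const std::string& term = entry.first;
        const std::vector<DocId>& docids = entry.second;
        // Skip terms that are not URL terms.
        if (term.compare(0, url_prefix.size(), url_prefix) != 0) continue;
        // A URL term indexing a single document has no duplicates.
        if (docids.size() < 2) continue;
        if (keep_last)
            doomed.insert(doomed.end(), docids.begin(), docids.end() - 1);
        else
            doomed.insert(doomed.end(), docids.begin() + 1, docids.end());
    }
    return doomed;
}
```

Collecting the docids first and deleting afterwards keeps the walk
separate from the modification, which avoids changing the index while
iterating over it.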

Alternatively, modify copy-database.cc to iterate through the URL terms
for each source database (instead of iterating document ids) and call
replace_document with the URL term instead of add_document.
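
The effect of that change can be modelled without Xapian as follows.
TinyDb, UrlDoc and copy_by_url are hypothetical names, and TinyDb only
imitates the behaviour relied on here: replace_document() with a unique
term overwrites the document already stored under that term, or adds a
new one if there is none.  It is a sketch of the idea, not a patch to
copy-database.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using DocId = std::uint32_t;

// Hypothetical in-memory stand-in for a writable database.
class TinyDb {
  public:
    // Always creates a new document with the next free docid.
    DocId add_document(const std::string& body) {
        docs_[++last_docid_] = body;
        return last_docid_;
    }
    // Overwrites the document keyed by unique_term, or adds one.
    DocId replace_document(const std::string& unique_term,
                           const std::string& body) {
        std::map<std::string, DocId>::iterator it = by_term_.find(unique_term);
        if (it == by_term_.end()) {
            DocId id = add_document(body);
            by_term_[unique_term] = id;
            return id;
        }
        docs_[it->second] = body;
        return it->second;
    }
    std::size_t doc_count() const { return docs_.size(); }
    const std::string& body(DocId id) const { return docs_.at(id); }

  private:
    DocId last_docid_ = 0;
    std::map<DocId, std::string> docs_;
    std::map<std::string, DocId> by_term_;
};

using UrlDoc = std::pair<std::string, std::string>;  // (URL term, body)

// Copy every source into dest keyed by URL term, so a URL seen in a
// later source replaces the earlier copy instead of duplicating it.
void copy_by_url(const std::vector<std::vector<UrlDoc>>& sources,
                 TinyDb& dest) {
    for (const std::vector<UrlDoc>& src : sources)
        for (const UrlDoc& doc : src)
            dest.replace_document(doc.first, doc.second);
}
```

With this scheme the source processed last wins for any URL that occurs
in more than one source, so the order in which the sources are passed
decides which copy is kept.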

Cheers,
    Olly


