[Xapian-tickets] [Xapian] #763: Track unique term bounds for documents in the collection
Xapian
nobody at xapian.org
Thu Jul 26 00:40:30 BST 2018
#763: Track unique term bounds for documents in the collection
-------------------------+---------------------------
Reporter: gp1308 | Owner: gp1308
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Library API | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+---------------------------
Comment (by olly):
You definitely should not be calling `write()` there - the new version
file should only get written once, after everything else in the compaction
run is complete. Writing it earlier is (a) wasted effort and (b) makes it
appear there's a working database there when it's actually incomplete -
especially bad if the run is interrupted at that point (e.g. run
`xapian-compact` and hit Ctrl+C).
You could pass in a reference or pointer to `version_file_out`. Or pass
the min and max bound variables by reference.
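
A minimal sketch of the by-reference option. The names `UniqueTermBounds`, `note_document()` and `scan_documents()` are hypothetical and not taken from the honey backend, and treating a count of zero as "ignore this document" is an assumption. The scanning code only updates the caller's bounds; the caller stays in charge of when the version file is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for the stats; not Xapian's actual type.
struct UniqueTermBounds {
    uint32_t lower = 0;  // 0 means "no document with terms seen yet"
    uint32_t upper = 0;
};

// Hypothetical helper: fold one document's unique term count into the
// caller's bounds. Documents with no terms are ignored (assumption).
void note_document(uint32_t unique_terms, UniqueTermBounds& bounds) {
    if (unique_terms == 0) return;
    if (bounds.lower == 0 || unique_terms < bounds.lower) {
        bounds.lower = unique_terms;
    }
    bounds.upper = std::max(bounds.upper, unique_terms);
    assert(bounds.lower <= bounds.upper);
}

// Stand-in for the compaction pass over one table: it updates the bounds
// through the reference and never writes anything out itself.
void scan_documents(const std::vector<uint32_t>& unique_term_counts,
                    UniqueTermBounds& bounds) {
    for (uint32_t count : unique_term_counts) {
        note_document(count, bounds);
    }
}
```

The caller would declare one `UniqueTermBounds`, pass it to each pass, and store the result in the version file just before the single final write.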
`HoneyWritableDatabase` (or similar) will also need to track these stats,
but the fix here isn't just temporary - it's useful to be able to
efficiently convert a database from the old format to the new one, so I
think we will keep that feature.
You can compact a honey database to another honey database (currently
there are bugs in some cases, but I think a simple compaction works).
There it should just copy the bounds from the source database (or merge
the bounds if there are multiple sources - i.e. take the minimum of the
lower bounds and the maximum of the upper bounds, taking care to skip any
sources where unique_terms is always zero).
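
The merge rule above could be sketched roughly as follows. `SourceBounds` and `merge_bounds()` are hypothetical names, and using an upper bound of zero to detect a source where unique_terms is always zero is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-source stats; field names are illustrative only.
struct SourceBounds {
    uint32_t lower;
    uint32_t upper;
};

// Merge the bounds of several source databases: minimum of the lower
// bounds, maximum of the upper bounds. A source whose upper bound is 0
// is taken to have unique_terms == 0 for every document and is skipped,
// so it cannot drag the merged lower bound down to zero.
SourceBounds merge_bounds(const std::vector<SourceBounds>& sources) {
    SourceBounds merged{0, 0};
    bool seen = false;
    for (const SourceBounds& s : sources) {
        if (s.upper == 0) continue;
        assert(s.lower <= s.upper);
        if (!seen) {
            merged = s;
            seen = true;
        } else {
            merged.lower = std::min(merged.lower, s.lower);
            merged.upper = std::max(merged.upper, s.upper);
        }
    }
    return merged;  // stays {0, 0} if every source was skipped
}
```

With a single source this reduces to a plain copy of that source's bounds.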
--
Ticket URL: <https://trac.xapian.org/ticket/763#comment:6>
Xapian <https://xapian.org/>