[Xapian-tickets] [Xapian] #763: Track unique term bounds for documents in the collection

Xapian nobody at xapian.org
Thu Jul 26 00:40:30 BST 2018


#763: Track unique term bounds for documents in the collection
-------------------------+---------------------------
 Reporter:  gp1308       |             Owner:  gp1308
     Type:  enhancement  |            Status:  new
 Priority:  normal       |         Milestone:
Component:  Library API  |           Version:
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+---------------------------

Comment (by olly):

 You definitely should not be calling `write()` there - the new version
 file should only get written once, after everything else in the compaction
 run is complete.  Writing it earlier is (a) wasted effort and (b) makes it
 appear there's a working database there when it's actually incomplete -
 especially bad if the run is interrupted at that point (e.g. run
 `xapian-compact` and hit Ctrl+C).

 You could pass in a reference or pointer to `version_file_out`.  Or pass
 the min and max bound variables by reference.
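
 As a rough sketch of the second option (all the names below are
 hypothetical stand-ins, not the actual honey compaction code): the merge
 step reports the bounds through reference parameters and never writes
 anything itself, and the caller writes the version file exactly once as the
 final step.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the output version file; not the real class.
struct VersionFileSketch {
    unsigned unique_terms_lower_bound = 0;
    unsigned unique_terms_upper_bound = 0;
    int writes = 0;
    // The real thing would write to disk; here we only count the calls.
    void write() { ++writes; }
};

// Hypothetical merge step.  It reports the bounds via the reference
// parameters and deliberately does not touch the version file.
void merge_termlists_sketch(const std::vector<unsigned>& unique_terms_per_doc,
                            unsigned& lower, unsigned& upper) {
    lower = upper = 0;
    bool first = true;
    for (unsigned n : unique_terms_per_doc) {
        if (first) {
            lower = upper = n;
            first = false;
        } else {
            lower = std::min(lower, n);
            upper = std::max(upper, n);
        }
    }
}

// Hypothetical top-level driver for a compaction run.
void compact_sketch(const std::vector<unsigned>& unique_terms_per_doc,
                    VersionFileSketch& version_file_out) {
    merge_termlists_sketch(unique_terms_per_doc,
                           version_file_out.unique_terms_lower_bound,
                           version_file_out.unique_terms_upper_bound);
    assert(version_file_out.unique_terms_lower_bound <=
           version_file_out.unique_terms_upper_bound);
    // ... any other tables would be merged here ...
    // Single write, only once everything else has completed.
    version_file_out.write();
}
```

 If the run is interrupted before the last line of `compact_sketch()`, no
 version file has been written, so the partial output can't be mistaken for
 a working database.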

 `HoneyWritableDatabase` (or similar) will also need to track these stats,
 but the fix here isn't just temporary - it's useful to be able to
 efficiently convert a database from the old format to the new one, so I
 think we will keep that feature.

 You can compact a honey database to another honey database (currently
 there are bugs in some cases, but I think a simple compaction works).
 There it should just copy the bounds from the source database (or merge
 the bounds if there are multiple sources - i.e. take the minimum of the
 lower bounds and the maximum of the upper bounds, taking care to skip any
 sources where unique_terms is always zero).
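
 The multi-source merge could look something like the sketch below.  The
 struct and function names are hypothetical, and it assumes a source where
 unique_terms is always zero can be recognised by its upper bound being
 zero.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-source stats; not an actual Xapian type.
struct UniqueTermBounds {
    unsigned lower;
    unsigned upper;
};

// Merge the bounds from several source databases: the minimum of the lower
// bounds and the maximum of the upper bounds.
UniqueTermBounds merge_bounds(const std::vector<UniqueTermBounds>& sources) {
    UniqueTermBounds merged{0, 0};
    bool seen = false;
    for (const auto& s : sources) {
        // Assumption: upper == 0 means unique_terms is zero for every
        // document in this source, so skip it.  Otherwise it would drag
        // the merged lower bound down to zero.
        if (s.upper == 0) continue;
        assert(s.lower <= s.upper);
        if (!seen) {
            merged = s;
            seen = true;
        } else {
            merged.lower = std::min(merged.lower, s.lower);
            merged.upper = std::max(merged.upper, s.upper);
        }
    }
    // If every source was skipped, both bounds stay at zero.
    return merged;
}
```

 With a single source this reduces to copying that source's bounds
 unchanged.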

--
Ticket URL: <https://trac.xapian.org/ticket/763#comment:6>
Xapian <https://xapian.org/>