[Xapian-tickets] [Xapian] #451: Add option to compaction to rebuild postlist chunks
Xapian
nobody at xapian.org
Tue Jan 28 05:09:53 GMT 2020
#451: Add option to compaction to rebuild postlist chunks
-----------------------------+------------------------------------
Reporter: Richard Boulton | Owner: Richard Boulton
Type: enhancement | Status: new
Priority: normal | Milestone: 1.5.0
Component: Library API | Version: git master
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-----------------------------+------------------------------------
Changes (by Olly Betts):
* version: SVN trunk => git master
* milestone: 1.4.x => 1.5.0
Comment:
> If document IDs are being preserved (via the --no-renumber option),
> xapian-compact cannot merge databases with overlapping document ID ranges
> (even if no documents occur in both databases).
I wonder if this one is really an unreasonable limitation. Nobody has
complained about it since, as far as I can recall. Did you have a use case
for it?
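For readers unfamiliar with the limitation, the condition being discussed
amounts to a disjointness test on the used document ID range of each input
database. The following is only a minimal sketch under stated assumptions,
not xapian-compact's actual code: `DocIdRange` and `ranges_overlap` are
hypothetical names, and a 32-bit unsigned document ID is assumed.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type: the lowest and highest document ID in use in one
// input database (assumes 32-bit unsigned document IDs).
struct DocIdRange {
    std::uint32_t first;
    std::uint32_t last;
};

// Returns true if any two ranges share at least one document ID.
// Sorts a copy by starting ID, then compares each range's start with
// the highest end seen so far.
bool ranges_overlap(std::vector<DocIdRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const DocIdRange& a, const DocIdRange& b) {
                  return a.first < b.first;
              });
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i].first <= ranges[i - 1].last) return true;
        // Carry the running maximum end forward so that a range nested
        // inside an earlier one does not hide a later overlap.
        ranges[i].last = std::max(ranges[i].last, ranges[i - 1].last);
    }
    return false;
}
```

Note that this test rejects ranges that interleave even when no individual
document ID is used by both databases, which is exactly the case the quoted
text describes.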
> Modifications to a database can result in many small chunks; recombining
> these chunks into larger chunks should result in faster searches.
> Xapian-compact doesn't currently do this.
Ideally these would get combined in the normal course of operations, but
even then there's still the case of merging several databases and a term
occurring a small number of times in each - then we potentially have one
small postlist chunk per input database.
Commit 217a67f792a93ceb085749c42a66c8829f1a9573 improves this for honey on
git master: adjacent input chunks are now spliced together until doing so
would exceed HONEY_POSTLIST_CHUNK_MAX. We don't currently try to split input
chunks, so it's not a full version of what's proposed here, but this
splicing can be done without decoding the chunks, so it's faster.
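The splicing described above can be pictured as a greedy pass over the
input chunks. This is only an illustrative sketch, not the code from that
commit: `splice_chunks` and `max_bytes` are hypothetical names, chunks are
treated as opaque byte strings, and any per-chunk header handling that the
real format needs is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: greedily append each encoded chunk to the
// previous output chunk while the combined size stays within max_bytes
// (standing in for a limit such as HONEY_POSTLIST_CHUNK_MAX).
// Chunks are never decoded and never split, so an input chunk that is
// already larger than max_bytes is passed through unchanged.
std::vector<std::string>
splice_chunks(const std::vector<std::string>& chunks, std::size_t max_bytes)
{
    assert(max_bytes > 0);
    std::vector<std::string> out;
    for (const std::string& chunk : chunks) {
        if (!out.empty() && out.back().size() + chunk.size() <= max_bytes) {
            out.back() += chunk;   // splice onto the previous chunk
        } else {
            out.push_back(chunk);  // start a new output chunk
        }
    }
    return out;
}
```

Because nothing is split, a large chunk followed by a small one still
yields two output chunks, which is why this falls short of a full rebuild.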
At this point I don't think we'd do this for glass or 1.4.x, but rather
for honey in the next release series.
--
Ticket URL: <https://trac.xapian.org/ticket/451#comment:5>
Xapian <https://xapian.org/>