[Xapian-tickets] [Xapian] #763: Track unique term bounds for documents in the collection
Xapian
nobody at xapian.org
Mon Jul 23 08:03:50 BST 2018
#763: Track unique term bounds for documents in the collection
-------------------------+---------------------------
Reporter: gp1308 | Owner: gp1308
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Library API | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+---------------------------
Comment (by olly):
The tracking should look a lot like the tracking for the document length
bounds. These are stored in the "version" file - i.e. `iamglass` for
glass. See `doclen_lbound` and `doclen_ubound` in `glass_version.cc`.
Unfortunately that code checks that there's no undecoded data after it has
decoded the stats we know about, so we can't just add new stats and have
older versions ignore them. In hindsight we should have omitted that
check so we could add new stats.
So we probably won't implement this for glass. I'd suggest just not
worrying about it being an incompatible change for now - we'll probably
not merge this change for glass, but instead apply it to honey, which is
the next-generation backend but still in development. However, honey
doesn't yet support updating databases - currently you have to compact a
glass database to create a honey one. So implementing this for honey
without implementing it for glass means that the compaction code which
converts from glass to honey needs to calculate these bounds as it loops
over all the documents - probably when it processes the termlist table.
That code is in `backends/honey/honey_compact.cc`, line 1866 currently.
That loop needs to count how many terms have a non-zero wdf to get the
number of unique terms in each document, and then track lower and upper
bounds on that as we work through the table (the lower bound should ignore
0, since such documents won't be involved in weighted queries). Those
bounds then need to be stored in the `iamhoney` file.
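A rough sketch of the counting and bound tracking described above, in case
it helps. It is standalone C++ rather than code from the Xapian tree:
`UniqueTermBounds`, `add_document()` and `count_unique_terms()` are
hypothetical names, and the vector of wdf values stands in for whatever the
compaction loop actually iterates over for each document's termlist.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Xapian: tracks lower and upper bounds
// on the number of unique terms per document.
struct UniqueTermBounds {
    // 0 means "no document with at least one unique term seen yet".
    std::uint32_t lower = 0;
    std::uint32_t upper = 0;

    void add_document(std::uint32_t unique_terms) {
        // Documents with no unique terms are ignored, so they never
        // drag the lower bound down to 0.
        if (unique_terms == 0) return;
        if (lower == 0 || unique_terms < lower) lower = unique_terms;
        if (unique_terms > upper) upper = unique_terms;
    }
};

// Count the terms in one document which have a non-zero wdf.
// Assumption: the caller supplies one wdf value per term in the document.
std::uint32_t count_unique_terms(const std::vector<std::uint32_t>& wdfs) {
    std::uint32_t count = 0;
    for (std::uint32_t wdf : wdfs) {
        if (wdf != 0) ++count;
    }
    return count;
}
```

The compaction loop would call `add_document(count_unique_terms(...))` once
per document, and the final `lower` and `upper` values are what would get
written to the version file. Using 0 as the "unset" value for the lower
bound works here only because 0 is never a valid tracked value.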
--
Ticket URL: <https://trac.xapian.org/ticket/763#comment:2>
Xapian <https://xapian.org/>