[Xapian-tickets] [Xapian] #423: Document termlist get_termfreq() method behaviour depends on whether terms are cached
Xapian
nobody at xapian.org
Thu Apr 16 11:02:47 BST 2015
#423: Document termlist get_termfreq() method behaviour depends on whether terms are cached
-------------------------+------------------------------
Reporter: richard | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone: 1.3.x
Component: Library API | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+------------------------------
Comment (by olly):
This would actually be fairly easy to fix by changing
`Document::termlist_count()` to open the termlist and call
`get_approx_size()` on it (which is exact for a leaf termlist) to get this
info from the termlist table, or by calling `get_freqs()` on the database
to get the term frequency from the postlist table. The advantage of the
former is that it is more cache-friendly if we then go on to read the
termlist, while the latter works even if there's no termlist table.
Once you modify a document read from a database, I think it makes most
sense for a request for the termfreq to fail, as the document is no longer
one that's actually in the database (though this might seem surprising at
first encounter).
However, just to add to the fun, the termfreq reported here is only for
the current subdatabase, and to fix that I think we'd need to keep a
reference to the whole `Xapian::Database` rather than just the current
subdb. Perhaps `TermIterator::get_termfreq()` should just be deprecated:
all it really does is effectively call `Database::get_termfreq(*it)`.
--
Ticket URL: <http://trac.xapian.org/ticket/423#comment:4>
Xapian <http://xapian.org/>