[Xapian-tickets] [Xapian] #423: Document termlist get_termfreq() method behaviour depends on whether terms are cached

Xapian nobody at xapian.org
Thu Apr 16 11:02:47 BST 2015


#423: Document termlist get_termfreq() method behaviour depends on whether terms
are cached
-------------------------+------------------------------
 Reporter:  richard      |             Owner:  olly
     Type:  defect       |            Status:  new
 Priority:  normal       |         Milestone:  1.3.x
Component:  Library API  |           Version:  SVN trunk
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------

Comment (by olly):

 This would be fairly easy to fix, either by changing
 `Document::termlist_count()` to open the termlist and call
 `get_approx_size()` on it (which is exact for a leaf termlist) to get this
 information from the termlist table, or by calling `get_freqs()` on the
 database to get the term frequency from the postlist table.  The advantage
 of the former is that it is more cache-friendly if we then go on to read
 the termlist, while the latter works even if there's no termlist table.

 Once you modify a document obtained from a database, I think it makes most
 sense for a request for the termfreq to fail, as the document is no longer
 one that's in the database (though this might seem surprising at first
 encounter).

 However, just to add to the fun, the termfreq reported here is only for
 the current subdatabase, and to fix that I think we'd need to keep a
 reference to the whole `Xapian::Database` rather than just the current
 subdb.  Perhaps `TermIterator::get_termfreq()` should just be deprecated -
 all it does is effectively call `Database::get_termfreq(*it)`.
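
 To illustrate the subdatabase point, here is a minimal sketch using a toy
 model rather than Xapian's real classes.  The `Shard` type and both
 function names are hypothetical, and the sketch assumes that the termfreq
 for a combined database is the sum of the per-subdatabase values.  It
 shows how a lookup confined to one subdb can report a smaller number than
 a lookup across the whole database.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, NOT Xapian's API: each shard (subdatabase) maps a
// term to the number of documents in that shard which contain it.
using Shard = std::map<std::string, unsigned>;

// Term frequency as seen from a single subdatabase only. This models a
// lookup made through a reference to just the current subdb.
unsigned shard_termfreq(const std::vector<Shard>& shards, std::size_t i,
                        const std::string& term) {
    assert(i < shards.size());
    auto it = shards[i].find(term);
    return it == shards[i].end() ? 0 : it->second;
}

// Term frequency across the whole combined database, assumed here to be
// the sum of the per-shard frequencies.
unsigned database_termfreq(const std::vector<Shard>& shards,
                           const std::string& term) {
    unsigned total = 0;
    for (std::size_t i = 0; i < shards.size(); ++i)
        total += shard_termfreq(shards, i, term);
    return total;
}
```

 In this model, a document living in one shard would need access to the
 full vector of shards, not just its own entry, to report the
 whole-database figure.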

--
Ticket URL: <http://trac.xapian.org/ticket/423#comment:4>
Xapian <http://xapian.org/>