[Xapian-tickets] [Xapian] #423: Document termlist get_termfreq() method behaviour depends on whether terms are cached

Xapian nobody at xapian.org
Tue Dec 29 22:25:01 GMT 2009


#423: Document termlist get_termfreq() method behaviour depends on whether terms
are cached
-------------------------+--------------------------------------------------
 Reporter:  richard      |       Owner:  olly     
     Type:  defect       |      Status:  new      
 Priority:  normal       |   Milestone:           
Component:  Library API  |     Version:  SVN trunk
 Severity:  normal       |    Keywords:           
Blockedby:               |    Platform:  All      
 Blocking:               |  
-------------------------+--------------------------------------------------
 The TermIterator objects returned by Document.termlist_begin(), for a
 document obtained from a database, can sometimes be used to obtain the
 term frequency and sometimes cannot.  Which behaviour you get is
 unpredictable unless you know how Document caches its terms internally.

 For example, the following Python code reads the term frequency from the
 iterator, and works fine:

 {{{
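 import xapian

 # Create a database containing one document with a single term.
 db = xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
 doc = xapian.Document()
 doc.add_term('foo')
 db.add_document(doc)

 # Fetch the document back and read the term frequency from its termlist.
 doc = db.get_document(1)
 t = doc.termlist()
 item = next(t)
 print(item.termfreq)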
 }}}

 However, the following code (with one added line) doesn't:

 {{{
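 import xapian

 # Create a database containing one document with a single term.
 db = xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
 doc = xapian.Document()
 doc.add_term('foo')
 db.add_document(doc)

 # Fetch the document back, then count its terms before iterating.
 doc = db.get_document(1)
 doc.termlist_count()   # Added line
 t = doc.termlist()
 item = next(t)
 print(item.termfreq)   # Raises InvalidOperationError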
 }}}

 For me, this code raises: "InvalidOperationError: Can't get term frequency
 from a document termlist which is not associated with a database."

 This happens because the termlist_count() method causes the terms to be
 loaded into the document, after which Document uses a MapTermList to
 return the term iterator, and that termlist is not associated with a
 database.
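
 Given that cause, a caller can avoid depending on which termlist backs
 the iterator by asking the database for the frequency instead of the
 termlist item.  The helper below is only a sketch: document_termfreqs is
 a made-up name, and it assumes the objects passed in behave like the
 Python bindings' Database (get_termfreq()) and Document (termlist()
 yielding items with a .term attribute).

```python
def document_termfreqs(db, doc):
    """Map each term in doc to its term frequency in db.

    The frequency is read with db.get_termfreq() rather than from the
    termlist item, so the result does not depend on whether the
    document's terms have already been cached.
    """
    return {item.term: db.get_termfreq(item.term)
            for item in doc.termlist()}
```

 With the failing example above, document_termfreqs(db, doc) should
 return the frequency of 'foo' whether or not doc.termlist_count() has
 been called first.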

 I'm not sure of the easiest way to fix this.  We could let MapTermList
 keep a reference to a database and pass such requests off to the
 database if one is set (or, better, subclass MapTermList for documents
 which are connected to a database).
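
 The subclassing idea can be sketched in Python as follows.  This is a
 toy model, not Xapian's C++ code: the MapTermList here is a simplified
 stand-in, and DatabaseMapTermList and FakeDatabase are hypothetical
 names invented for the illustration.

```python
class MapTermList(object):
    """Toy stand-in for a termlist over a document's cached terms."""

    def __init__(self, terms):
        self._terms = sorted(terms)
        self._pos = 0

    def get_termname(self):
        return self._terms[self._pos]

    def next(self):
        # Advance to the following term.
        self._pos += 1

    def get_termfreq(self):
        # No database to ask, mirroring the error quoted in this ticket.
        raise RuntimeError("Can't get term frequency from a document "
                           "termlist which is not associated with a "
                           "database.")


class DatabaseMapTermList(MapTermList):
    """Hypothetical subclass for documents connected to a database."""

    def __init__(self, terms, db):
        super(DatabaseMapTermList, self).__init__(terms)
        self._db = db  # kept so frequency requests can be passed on

    def get_termfreq(self):
        # Pass the request off to the database for the current term.
        return self._db.get_termfreq(self.get_termname())


class FakeDatabase(object):
    """Hypothetical database exposing only get_termfreq()."""

    def __init__(self, freqs):
        self._freqs = freqs

    def get_termfreq(self, term):
        return self._freqs.get(term, 0)


db = FakeDatabase({'foo': 1})
termlist = DatabaseMapTermList(['foo'], db)
print(termlist.get_termfreq())
```

 In this model a plain MapTermList still raises, which matches the
 current behaviour for documents that were never stored in a database,
 while the subclass answers from the database it was given.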

-- 
Ticket URL: <http://trac.xapian.org/ticket/423>
Xapian <http://xapian.org/>
Xapian


