[Xapian-tickets] [Xapian] #423: Document termlist get_termfreq() method behaviour depends on whether terms are cached
Xapian
nobody at xapian.org
Tue Dec 29 22:25:44 GMT 2009
#423: Document termlist get_termfreq() method behaviour depends on whether terms
are cached
-------------------------+--------------------------------------------------
 Reporter:  richard      |      Owner:  olly
     Type:  defect       |     Status:  new
 Priority:  normal       |  Milestone:
Component:  Library API  |    Version:  SVN trunk
 Severity:  normal       |   Keywords:
Blockedby:               |   Platform:  All
 Blocking:               |
-------------------------+--------------------------------------------------
Description changed by richard:
Old description:
> The TermIterator objects returned by Document.termlist_begin(), for a
> document obtained from a database, can sometimes be used to obtain the
> term frequency, and sometimes can't be. It's unpredictable which
> behaviour is obtained, unless you know the details of the implementation
> of caching of terms in Documents.
>
> For example, the following python code uses this to obtain the frequency,
> and works fine:
>
> {{{
> import xapian
> db=xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
> doc=xapian.Document()
> doc.add_term('foo')
> db.add_document(doc)
> doc=db.get_document(1)
> t=doc.termlist()
> item=t.next()
> item.termfreq
> }}}
>
> However, the following code (with one added line) doesn't:
>
> {{{
> import xapian
> db=xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
> doc=xapian.Document()
> doc.add_term('foo')
> db.add_document(doc)
> doc=db.get_document(1)
> doc.termlist_count() # Added line
> t=doc.termlist()
> item=t.next()
> item.termfreq
> }}}
>
> For me, this code raises: "InvalidOperationError: Can't get term
> frequency from a document termlist which is not associated with a
> database."
>
> This behaviour is because the termlist_count() method causes the terms to
> be loaded into the document, and Document then uses a MapTermList to
> return the term iterator.
>
> Not sure of the easiest way to fix this - we could make MapTermList be
> able to keep a reference to a database, and pass off such requests to the
> database if set (or, better, subclass MapTermList for documents which are
> connected to a database).
New description:
The !TermIterator objects returned by Document.termlist_begin(), for a
document obtained from a database, can sometimes be used to obtain the
term frequency and sometimes cannot. Which behaviour you get is
unpredictable unless you know how Documents cache their terms.
For example, the following Python code obtains the term frequency this way,
and works fine:
{{{
import xapian
db=xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
doc=xapian.Document()
doc.add_term('foo')
db.add_document(doc)
doc=db.get_document(1)
t=doc.termlist()
item=t.next()
item.termfreq
}}}
However, the following code (with one added line) doesn't:
{{{
import xapian
db=xapian.WritableDatabase('foo', xapian.DB_CREATE_OR_OVERWRITE)
doc=xapian.Document()
doc.add_term('foo')
db.add_document(doc)
doc=db.get_document(1)
doc.termlist_count() # Added line
t=doc.termlist()
item=t.next()
item.termfreq
}}}
For me, this code raises: "!InvalidOperationError: Can't get term
frequency from a document termlist which is not associated with a
database."
This happens because termlist_count() loads the terms into the
document, after which Document uses a !MapTermList, which has no link to
the database, to return the term iterator.
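
The mechanism can be illustrated with a small pure-Python toy model. This is only a sketch of the caching behaviour described above, not Xapian's implementation: every class and attribute name here (ToyDatabase, ToyDocument, CachedItem and so on) is hypothetical, and the exception is a stand-in for xapian.InvalidOperationError.

```python
class InvalidOperationError(Exception):
    """Stand-in for xapian.InvalidOperationError in this toy model."""


class DatabaseBackedItem(object):
    """Term list entry that can still ask the database for statistics."""

    def __init__(self, db, term):
        self.db = db
        self.term = term

    @property
    def termfreq(self):
        return self.db.get_termfreq(self.term)


class CachedItem(object):
    """Term list entry built from the document's in-memory cache."""

    def __init__(self, term):
        self.term = term

    @property
    def termfreq(self):
        raise InvalidOperationError(
            "Can't get term frequency from a document termlist "
            "which is not associated with a database.")


class ToyDocument(object):
    def __init__(self, db, docid):
        self.db = db
        self.docid = docid
        self.cached_terms = None  # filled in lazily

    def termlist_count(self):
        # Side effect: pulls the terms into the document's own cache.
        if self.cached_terms is None:
            self.cached_terms = list(self.db.docs[self.docid])
        return len(self.cached_terms)

    def termlist(self):
        if self.cached_terms is None:
            # Terms not cached yet: entries are backed by the database.
            return [DatabaseBackedItem(self.db, t)
                    for t in self.db.docs[self.docid]]
        # Terms cached: entries lose their link to the database.
        return [CachedItem(t) for t in self.cached_terms]


class ToyDatabase(object):
    def __init__(self):
        self.docs = {}  # docid -> list of terms

    def add_document(self, terms):
        docid = len(self.docs) + 1
        self.docs[docid] = list(terms)
        return docid

    def get_termfreq(self, term):
        # Number of documents indexed by the term.
        return sum(1 for terms in self.docs.values() if term in terms)

    def get_document(self, docid):
        return ToyDocument(self, docid)
```

In this model, reading termfreq from the first entry of termlist() succeeds on a freshly fetched ToyDocument and raises once termlist_count() has been called, which mirrors the two snippets above.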
I'm not sure of the easiest way to fix this. We could let !MapTermList
keep an optional reference to a database and pass such requests on to
it when set, or, better, subclass !MapTermList for documents which are
connected to a database.
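
The subclassing option might look roughly like the following. This is a pure-Python sketch of the design only; the real !MapTermList is a C++ class inside the library, and the names ToyMapTermList, ToyDatabaseMapTermList and ToyDatabase are hypothetical stand-ins.

```python
class InvalidOperationError(Exception):
    """Stand-in for xapian.InvalidOperationError in this sketch."""


class ToyMapTermList(object):
    """Iterates over terms held in a document; knows nothing of a database."""

    def __init__(self, terms):
        self.terms = dict(terms)  # term -> within-document frequency

    def __iter__(self):
        return iter(sorted(self.terms))

    def get_termfreq(self, term):
        raise InvalidOperationError(
            "Can't get term frequency from a document termlist "
            "which is not associated with a database.")


class ToyDatabaseMapTermList(ToyMapTermList):
    """Variant for documents that came from a database."""

    def __init__(self, terms, db):
        super(ToyDatabaseMapTermList, self).__init__(terms)
        self.db = db

    def get_termfreq(self, term):
        # Hand the request to the database instead of failing.
        return self.db.get_termfreq(term)


class ToyDatabase(object):
    """Minimal database stand-in that only answers termfreq queries."""

    def __init__(self, termfreqs):
        self.termfreqs = dict(termfreqs)

    def get_termfreq(self, term):
        return self.termfreqs.get(term, 0)
```

A document fetched from a database would then hand out the subclass, so get_termfreq() gives the same answer whether or not the terms have been cached, while documents built from scratch keep the current behaviour.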
--
Ticket URL: <http://trac.xapian.org/ticket/423#comment:1>
Xapian <http://xapian.org/>