[Xapian-discuss] More than one Index?

Olly Betts olly at survex.com
Sat Feb 4 04:12:05 GMT 2006


On Mon, Jan 30, 2006 at 12:58:10PM +0000, James Aylett wrote:
> You can get an iterator to all terms in the database using
> Xapian::Database::allterms_begin() (db.allterms_begin() in
> Python). The TermIterator that returns *may* be able to give you a
> frequency in the database via its get_termfreq() method, but I don't
> know if that actually works for the all terms iterator.

It does work.  TermIterator::get_termfreq() is implemented wherever it
is meaningful (it isn't always, e.g. if you create a Xapian::Document,
add some terms, and iterate over them, there's no real meaning for
get_termfreq).

> If not, you can use the iterator to get the terms, and use
> Xapian::Database::get_collection_freq() to get the frequency.

You mean Xapian::Database::get_termfreq() here.

The "termfreq" is the number of documents in the database indexed by a
given term, while the "collection_freq" is the total number of
occurrences of a given term - i.e. the sum of the wdf in each document
it occurs in.
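
To make the difference concrete, here is a toy model in plain Python. It is
not Xapian's API: the three documents are invented, and the helper functions
termfreq() and collection_freq() are hypothetical names chosen to mirror the
definitions above. Each document is just a map from term to wdf
(within-document frequency).

```python
from collections import Counter

# Toy corpus, invented for illustration: each document maps term -> wdf
# (the number of times the term occurs within that document).
docs = [
    Counter("the cat sat on the mat".split()),
    Counter("the dog sat".split()),
    Counter("a cat and a dog".split()),
]


def termfreq(term):
    """Number of documents that contain the term at least once."""
    return sum(1 for wdfs in docs if term in wdfs)


def collection_freq(term):
    """Total occurrences of the term: the sum of its wdf over all documents."""
    return sum(wdfs[term] for wdfs in docs)


# Print each term with its termfreq and its collection frequency.
for term in ("the", "cat", "a"):
    print(term, termfreq(term), collection_freq(term))
```

In this model "the" appears in two documents but three times in total, so the
two numbers differ whenever a term is repeated within a document.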

It's possible that John is more interested in the collection frequency,
in which case he'll need to ask the database for it by calling
Xapian::Database::get_collection_freq() for each term.  TermIterator
should probably support get_collection_freq() where it's meaningful.
When iterating all the terms, the backend code already reads the
collection frequency (for quartz and flint at least), but there's no
API method to get at it!

Cheers,
    Olly


