[Xapian-discuss] Counting and statistics

Richard Boulton richard at lemurconsulting.com
Wed Mar 28 09:30:50 BST 2007


Andreas Marienborg wrote:
> I was wondering if it is possible to figure out "popular terms" in a 
> given set of documents (not the entire database, but let's say the 
> last 1000 articles).

You want to read the documentation for the Enquire::get_eset() method.
This takes a list of "relevant documents" (as an RSet object) and 
returns a list of terms (as an ESet object).  The terms returned are 
ordered by a weighting function which rewards terms that occur more 
frequently in the documents in the RSet than in the corpus as a whole.
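
To illustrate the ranking idea only, here is a minimal sketch in plain
Python.  It does not use the Xapian API, and the scoring is a simplified
stand-in chosen for illustration, not the weighting function Xapian
actually applies.  The function name popular_terms and the sample data
are hypothetical.

```python
import math
from collections import Counter


def popular_terms(relevant_docs, all_docs, max_terms=10):
    """Rank terms that are common in relevant_docs but rare in all_docs.

    Each document is a list of term strings.  This is an illustrative
    scoring scheme, NOT the formula Xapian uses for get_eset().
    """
    n_corpus = len(all_docs)
    # Document frequency: in how many documents does each term occur?
    corpus_df = Counter(t for doc in all_docs for t in set(doc))
    rset_df = Counter(t for doc in relevant_docs for t in set(doc))

    scored = []
    for term, r in rset_df.items():
        # Guard against relevant docs that are missing from all_docs.
        n = max(corpus_df[term], r)
        # Reward high frequency in the relevant set, penalise terms
        # that are frequent across the whole corpus.
        score = r * math.log(1 + n_corpus / n)
        scored.append((term, score))

    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_terms]


# Hypothetical corpus; the last two documents play the role of the RSet.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "xapian", "index"],
    ["the", "xapian", "query"],
]
recent = corpus[2:]

for term, score in popular_terms(recent, corpus):
    print(term, round(score, 3))
```

A term such as "the", which appears in every document, ends up at the
bottom of the list even though it occurs in every relevant document,
while "xapian" ranks first because it is concentrated in the relevant
set.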

In a sense, this method is the dual of the get_mset() method: where 
get_mset() returns a list of documents given a query made up of terms, 
get_eset() returns a list of terms given a list of documents.

If you want to dig into the code of omega, you'll find that the 
implementation of the topterms functionality there uses this method.

-- 
Richard


