[Xapian-discuss] Counting and statistics
Richard Boulton
richard at lemurconsulting.com
Wed Mar 28 09:30:50 BST 2007
Andreas Marienborg wrote:
> I was wondering if it is possible to figure out "popular terms" in a
> given set of documents (not the entire database, but lets say the 1000
> last articles).
You want to read the documentation for the Enquire::get_eset() method.
This takes a list of "relevant documents" (as an RSet object), and
returns a list of terms. The terms returned are ordered by a
weighting function, which rewards terms that occur more frequently in
the documents in the RSet than in the corpus as a whole.
In a sense, this method is the dual of the get_mset() method - it
returns a list of terms given a list of documents.
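To make the weighting idea concrete, here is a rough pure-Python sketch. It is not Xapian's code and does not call the Xapian API: the function name top_terms, the toy documents, and the particular formula (a Robertson/Sparck Jones style relevance weight multiplied by the number of relevant documents containing the term) are assumptions chosen for illustration. get_eset() applies its own weighting inside the library.

```python
import math
from collections import Counter


def top_terms(docs, relevant_ids, max_terms=5):
    """Rank terms that are over-represented in the relevant documents.

    docs is a list of term lists (one per document); relevant_ids is the
    set of indexes playing the role of the RSet. Hypothetical helper,
    not part of Xapian.
    """
    N = len(docs)           # documents in the whole collection
    R = len(relevant_ids)   # documents in the relevant set
    n = Counter()           # documents containing each term
    r = Counter()           # relevant documents containing each term
    for doc_id, terms in enumerate(docs):
        unique = set(terms)
        n.update(unique)
        if doc_id in relevant_ids:
            r.update(unique)

    scores = {}
    for term, r_t in r.items():
        n_t = n[term]
        # Assumed weighting: positive when the term is relatively more
        # common in the relevant set than in the rest of the collection.
        w = math.log(
            ((r_t + 0.5) * (N - R - n_t + r_t + 0.5))
            / ((n_t - r_t + 0.5) * (R - r_t + 0.5))
        )
        scores[term] = r_t * w

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms]


docs = [
    ["xapian", "index", "search"],
    ["xapian", "eset", "search"],
    ["cooking", "recipe", "search"],
    ["cooking", "pasta", "search"],
    ["travel", "search"],
    ["travel", "map", "search"],
]
for term, score in top_terms(docs, {0, 1}):
    print(term, round(score, 2))
```

With the first two documents as the relevant set, "xapian" ranks first because it appears in both of them and nowhere else, while "search" gets a negative score because it appears in every document and so says nothing about the chosen set.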
If you want to dig into the code of omega, you'll find that the
implementation of the topterms functionality there uses this method.
--
Richard