[Xapian-discuss] related documents

Olly Betts olly at survex.com
Tue Jul 27 05:13:27 BST 2010


On Mon, Jul 26, 2010 at 05:05:16PM +0100, Tim Brody wrote:
> I would like to take a doc in the xapian DB and find all related
> documents by relevance e.g. so when you view one document it says
> "Related entries X Y Z".
> 
> I'm aware of the "Morelikethis" Lucene plugin that is supposed to do
> something like this, by generating a query from a document based on term
> frequency.
> 
> Has anyone developed a tool to generate a query from a document?
> Is there a short-cut one can make with RSets?

Omega's MORELIKE feature is implemented like so:

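    // Treat the single document as the entire relevance set.
    Xapian::RSet tmprset;
    tmprset.add_document(docid);
    // The decider filters which candidate terms are acceptable.
    OmegaExpandDecider decider(db);
    // Ask for up to 40 expansion terms drawn from that document.
    Xapian::ESet eset(enquire->get_eset(40, tmprset, &decider));
    for (Xapian::ESetIterator i = eset.begin(); i != eset.end(); ++i) {
        // Handle term *i
    }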

If you want a query object, then you can just do:

    Xapian::RSet tmprset;
    tmprset.add_document(docid);
    OmegaExpandDecider decider(db);
    Xapian::ESet eset(enquire->get_eset(40, tmprset, &decider));
    // OR together all the suggested terms to form the query.
    Xapian::Query query(Xapian::Query::OP_OR, eset.begin(), eset.end());

This picks up to 40 terms, favouring those which are relatively more common
in the document than in the collection in general.
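To make the idea of "relatively more common in the document than in the
collection" concrete, here is a toy sketch using only the standard library.
It is not the weighting formula Xapian uses: the score is simply a term's
count in the document divided by the number of documents containing it.
The function name pick_terms and both input maps are made up for
illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy illustration only: this is NOT Xapian's expand weighting.
// doc_wdf maps each term to its count within the one document;
// collection_termfreq maps each term to the number of documents
// containing it. Both are hypothetical inputs for this sketch.
std::vector<std::string> pick_terms(
    const std::map<std::string, unsigned>& doc_wdf,
    const std::map<std::string, unsigned>& collection_termfreq,
    std::size_t max_terms)
{
    typedef std::pair<double, std::string> Scored;
    std::vector<Scored> scored;
    for (std::map<std::string, unsigned>::const_iterator e = doc_wdf.begin();
         e != doc_wdf.end(); ++e) {
        std::map<std::string, unsigned>::const_iterator it =
            collection_termfreq.find(e->first);
        // Skip terms with no collection statistics (an arbitrary choice).
        if (it == collection_termfreq.end() || it->second == 0) continue;
        double score = static_cast<double>(e->second) / it->second;
        scored.push_back(Scored(score, e->first));
    }
    // Highest score first; ties are broken alphabetically.
    std::sort(scored.begin(), scored.end(),
              [](const Scored& a, const Scored& b) {
                  if (a.first != b.first) return a.first > b.first;
                  return a.second < b.second;
              });
    // Keep at most max_terms entries, like the 40 passed to get_eset().
    if (scored.size() > max_terms)
        scored.erase(scored.begin() + max_terms, scored.end());
    std::vector<std::string> terms;
    for (std::size_t i = 0; i < scored.size(); ++i)
        terms.push_back(scored[i].second);
    return terms;
}
```

With such a score, a word like "the" that appears often in the document
but also in nearly every other document ranks below a rarer word that
appears only a few times.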

The OmegaExpandDecider class decides which candidate terms are kept. You
can find it here:

http://trac.xapian.org/browser/trunk/xapian-applications/omega/query.h#L37
http://trac.xapian.org/browser/trunk/xapian-applications/omega/query.cc#L2242
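If you would rather write your own filter than reuse Omega's, a decider is
essentially a predicate on a term string. The sketch below is standalone
and does not include xapian.h; to pass it to get_eset() you would derive
the class from Xapian::ExpandDecider, whose operator() takes a
const std::string & and returns bool. The class name ExampleTermFilter and
the particular rules are assumptions for illustration, not a copy of what
OmegaExpandDecider does.

```cpp
#include <cctype>
#include <string>

// Hypothetical filter with the same call shape as
// Xapian::ExpandDecider::operator(). The rules are illustrative only.
struct ExampleTermFilter {
    bool operator()(const std::string& term) const {
        if (term.empty()) return false;
        unsigned char first = static_cast<unsigned char>(term[0]);
        // Assumes the usual convention that prefixed terms (field
        // prefixes, stemmed forms) start with an uppercase letter.
        if (std::isupper(first)) return false;
        // Reject short numbers such as "7" or "42".
        if (term.size() < 4 && std::isdigit(first)) return false;
        // Reject terms containing a space.
        if (term.find(' ') != std::string::npos) return false;
        return true;
    }
};
```

Returning true keeps the term as a candidate; returning false drops it
before the best terms are chosen.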

Cheers,
    Olly


