[Xapian-discuss] Document and folder suggestions

Olly Betts olly at survex.com
Sun Jan 27 23:01:09 GMT 2008


On Sun, Jan 27, 2008 at 04:01:54PM -0500, Serkan Cabi wrote:
> Currently, to find related documents, I get a document, create a
> one-item rset, get an eset of max size 100 from it, and search those
> terms to get a list of documents. Here is the code:

I suspect 100 is too many.  Omega uses 40 for this (raised from 6 after
someone reported that gave better results), but it's certainly worth
experimenting.

> 1) Is there a better way to get similar documents for a given document?

You could take all the terms from the given document and combine them
with OP_ELITE_SET to select the best discriminators and run a query with
those.  I'm not sure which would give better results, but they're likely
to involve a similar amount of work.
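
Since it's hard to say in advance which approach works better, here is a
rough sketch of both using the Python bindings, so they can be compared on
the same database.  The function names, the default sizes, and the choice
of OP_OR for searching the expansion terms are assumptions made for
illustration, not something taken from the original code.  `db` is assumed
to be an already opened xapian.Database.

```python
def without_self(docids, docid):
    """Drop the source document, which normally matches itself best."""
    return [d for d in docids if d != docid]


def related_by_expansion(db, docid, max_terms=40, max_docs=10):
    """One-document RSet -> ESet -> query built from the expansion terms."""
    import xapian  # imported lazily; needs the Xapian Python bindings

    rset = xapian.RSet()
    rset.add_document(docid)
    enquire = xapian.Enquire(db)
    # 40 is the value Omega uses; worth tuning by experiment.
    eset = enquire.get_eset(max_terms, rset)
    terms = [item.term for item in eset]
    if not terms:
        return []
    # Assumption: the expansion terms are combined with OP_OR.
    enquire.set_query(xapian.Query(xapian.Query.OP_OR, terms))
    hits = [m.docid for m in enquire.get_mset(0, max_docs + 1)]
    return without_self(hits, docid)[:max_docs]


def related_by_elite_set(db, docid, max_docs=10):
    """All terms of the document, combined with OP_ELITE_SET."""
    import xapian  # imported lazily; needs the Xapian Python bindings

    doc = db.get_document(docid)
    terms = [item.term for item in doc.termlist()]
    if not terms:
        return []
    enquire = xapian.Enquire(db)
    # The elite set size is left at the library default here.
    enquire.set_query(xapian.Query(xapian.Query.OP_ELITE_SET, terms))
    hits = [m.docid for m in enquire.get_mset(0, max_docs + 1)]
    return without_self(hits, docid)[:max_docs]
```

Both functions return a list of document ids, so running them for a
handful of documents you know well should show fairly quickly which one
suits your data.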

> 2) Is there a way to suggest a folder for a given document to be
> classified in?

Assuming you add a boolean term for the folder to each document in the
database, you could run the given document as a query, mark the top few
results as relevant, and then expand, selecting only the folder terms.
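
Those steps could look something like the sketch below with the Python
bindings.  The "XFOLDER" prefix, the function names, and the default sizes
are hypothetical, and using OP_ELITE_SET to "run the document as a query"
is just one possible choice.  The folder terms are kept out of the query
itself, since get_eset() excludes query terms from the expansion by
default.

```python
FOLDER_PREFIX = "XFOLDER"  # hypothetical boolean prefix for folder terms


def folder_from_term(term, prefix=FOLDER_PREFIX):
    """Return the folder name if `term` is a folder term, else None."""
    if isinstance(term, bytes):  # the Python 3 bindings return bytes
        term = term.decode("utf-8")
    return term[len(prefix):] if term.startswith(prefix) else None


def suggest_folders(db, docid, top_docs=5, max_folders=3):
    """Suggest folders for `docid` from the folders of similar documents."""
    import xapian  # imported lazily; needs the Xapian Python bindings

    class FolderTermsOnly(xapian.ExpandDecider):
        # Accept only folder terms as expansion candidates.
        def __call__(self, term):
            return folder_from_term(term) is not None

    # Run the given document as a query, using its non-folder terms.
    doc = db.get_document(docid)
    terms = [t.term for t in doc.termlist()
             if folder_from_term(t.term) is None]
    if not terms:
        return []
    enquire = xapian.Enquire(db)
    enquire.set_query(xapian.Query(xapian.Query.OP_ELITE_SET, terms))

    # Mark the top few results (other than the document itself) as relevant.
    hits = [m.docid for m in enquire.get_mset(0, top_docs + 1)]
    rset = xapian.RSet()
    for hit in [d for d in hits if d != docid][:top_docs]:
        rset.add_document(hit)

    # Expand, keeping only folder terms; the best-weighted come first.
    eset = enquire.get_eset(max_folders, rset, FolderTermsOnly())
    return [folder_from_term(item.term) for item in eset]
```

The first entry in the returned list is the strongest suggestion; how many
top results to mark as relevant is worth experimenting with.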

Cheers,
    Olly


