[Xapian-devel] Omega and MORELIKE

Olly Betts olly at survex.com
Mon Jun 28 00:02:19 BST 2004


On Wed, Nov 26, 2003 at 08:31:00PM +0000, Olly Betts wrote:
> On Fri, Nov 21, 2003 at 11:52:44AM +0100, Franck Meunier wrote:
> > I indexed internal docs, and then asked Omega to "MORELIKE" a document.
> > It gives me only a link at 43% to the previous revision of this
> > document (I also index my archives)...
> > 
> > The problem is that there were only two or three differences between them.
> > 
> > I found that to do a MORELIKE, Omega constructs an RSet containing the
> > document, extracts the first 6 terms of the corresponding ESet, and
> > creates a new query from them.
> > 
> > 6 seems too low... I extended this value to 40, and the results look
> > much better (99% for my two documents).
> > 
> > Have you ever experienced a problem with this parameter?
> 
> The MORELIKE functionality was originally written as an experimental
> feature for EuroFerret:
> 
> http://web.archive.org/web/19991013083615/http://euroferret.com/
> 
> EuroFerret indexed each page by the 60 best terms, and I suspect that
> the choice of 6 is based on tuning for that database.
> 
> As you're presumably indexing all the terms in each page, it's not at
> all surprising that a larger number gives better results.  I wonder if we
> can set a better pick threshold, either by looking at the expand weights
> or perhaps just as a function of the number of terms indexing the
> document we're trying to find more like.  I'll take a look.

Sorry, I've only just noticed this mail sitting awaiting attention.
For now, I've simply raised the limit to 40 as you suggested.  I'll make
a note to investigate a more dynamic approach.  Feedback on this change
is welcome.
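
A minimal Python sketch of the MORELIKE procedure discussed above: treat one document as the relevance set, rank its terms, keep the best `limit` of them and OR them together.  The function names, the sample data and the wdf * idf scoring are assumptions for illustration only; Xapian's real expand weight and Omega's actual code are not reproduced here.

```python
import math


def pick_expand_terms(doc_terms, doc_freq, n_docs, limit):
    """Rank the terms of the single relevant document; keep the best `limit`.

    doc_terms: {term: wdf} for the document we want "more like".
    doc_freq:  {term: number of documents in the collection containing it}.
    Assumption: a plain wdf * idf score stands in for Xapian's expand weight.
    """
    def score(term):
        idf = math.log(n_docs / doc_freq[term])
        return doc_terms[term] * idf

    # Highest score first; ties broken alphabetically so the order is stable.
    ranked = sorted(doc_terms, key=lambda t: (-score(t), t))
    return ranked[:limit]


def morelike_query(doc_terms, doc_freq, n_docs, limit=40):
    """Build an OR query string from the top-`limit` terms (previously 6)."""
    terms = pick_expand_terms(doc_terms, doc_freq, n_docs, limit)
    return " OR ".join(terms)


if __name__ == "__main__":
    # Hypothetical document and collection statistics.
    doc = {"xapian": 3, "omega": 2, "morelike": 2, "rset": 1,
           "eset": 1, "query": 1, "terms": 1, "the": 5}
    df = {"xapian": 2, "omega": 3, "morelike": 1, "rset": 4,
          "eset": 4, "query": 10, "terms": 20, "the": 100}
    print(morelike_query(doc, df, 100, limit=6))
    print(morelike_query(doc, df, 100, limit=40))
```

With `limit=6` only part of the document's vocabulary reaches the query; with 40 every term in this small example does.  For a fully indexed document with many terms, that is consistent with a near-identical revision matching far more strongly once the limit is raised.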

Cheers,
    Olly



