[Xapian-discuss] word pair indexing and querying

Mark Hagger mark.hagger at m-spatial.com
Thu Sep 21 17:42:38 BST 2006


Except that's not really going to work very well. Here's an example query
on one of our development databases:

http://staging.gjm.info/cgi-bin/omega?P=centre&DB=Business52-GB&FMT=xml

This gives a 100% relevance hit against "job centre", so there's not much
scope for a cut-off there. For the record, in this application I'd need
this query to produce either nothing or, at worst, a low relevance hit
against "job centre".
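
To illustrate why the cut-off doesn't help in this case, here is a minimal
sketch in plain Python. It is not the Xapian API and not Xapian's actual
percentage formula: it assumes, as a simplification, that each hit's
percentage is its weight relative to the best hit's weight. The function
name and the weights are made up for illustration.

```python
def apply_percent_cutoff(weights, cutoff):
    """Keep hits scoring at least `cutoff` percent of the best hit.

    `weights` maps a record name to a hypothetical match weight.
    Simplified model only; Xapian computes percentages differently.
    """
    if not weights:
        return {}
    best = max(weights.values())
    if best <= 0:
        return {}
    # Scale every weight so that the best hit is 100%.
    percents = {doc: round(100 * w / best) for doc, w in weights.items()}
    return {doc: p for doc, p in percents.items() if p >= cutoff}


# Single-term query "centre": the only hit is its own best match,
# so it scores 100% and no cut-off can reject it.
single = apply_percent_cutoff({"job centre": 1.7}, cutoff=80)

# With a stronger hit in the result set, the weak hit drops out.
multi = apply_percent_cutoff(
    {"garden centre": 4.2, "job centre": 1.7}, cutoff=80
)
```

Under this model a cut-off only rejects hits when something better is in
the same result set, which is exactly what is missing for the query above.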

(I would point out that this database has very little in it, just under
100 records.)

It is starting to look suspiciously as if Xapian just isn't going to be
the way to go here. In truth, even the biggest dataset that I'd be
playing with here won't be more than about 100k records.

Back to pondering.

Mark




On Thu, 2006-09-21 at 17:14 +0100, Olly Betts wrote:
> On Thu, Sep 21, 2006 at 02:17:40PM +0100, Mark Hagger wrote:
> > In essence there are a number of cases where I'd like to add boolean
> > keywords to the index for a record that are actually multi-word
> > keywords, ie any of the individual words in isolation of the multi-word
> > sequence are not enough to give a (good) match, but still allow an
> > overall OR type query.
> 
> I'd suggest just using an OR query with a percentage weight cut-off.
> Then anything significantly less good than the best match will get
> rejected.
> 
> Cheers,
>     Olly
> 
> ________________________________________________________________________
> This email has been scanned for all known viruses by the MessageLabs SkyScan service.




