[Xapian-discuss] word pair indexing and querying
Mark Hagger
mark.hagger at m-spatial.com
Thu Sep 21 14:17:40 BST 2006
Ah, no, I see why you suggested that, but on reflection perhaps I didn't
explain my requirements very well.
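The problem with the DEFAULTOP=AND approach is that a query for
"garden centres bristol", or indeed "bristol garden centres that sell red
plants", will then not match the "garden centres" record at all: AND
requires every query word to be present, and "bristol" (like "sell",
"red" and "plants") doesn't appear in that record.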
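The difference between the two operators can be modelled in a few lines. This is a toy model in plain Python, not Xapian's API: a record is reduced to a set of single-word terms, and `matches_or` / `matches_and` are hypothetical helpers that only decide whether the record would be matched, ignoring relevance weighting.

```python
# Toy model (not Xapian code): which records an OR or AND query would match.
# Records are sets of single-word terms; relevance weighting is ignored.

def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def matches_or(record_terms: set[str], query: str) -> bool:
    """OR semantics: at least one query term must occur in the record."""
    return any(term in record_terms for term in tokenize(query))


def matches_and(record_terms: set[str], query: str) -> bool:
    """AND semantics: every query term must occur in the record."""
    return all(term in record_terms for term in tokenize(query))


record = set(tokenize("garden centres"))

for q in ["garden centres", "garden centres bristol", "garden"]:
    print(f"{q!r}: OR={matches_or(record, q)} AND={matches_and(record, q)}")
# 'garden centres': OR=True AND=True
# 'garden centres bristol': OR=True AND=False
# 'garden': OR=True AND=True
```

The extra word "bristol" is enough to make the AND query miss, while the single word "garden" still matches under both operators. Neither behaviour is what's wanted here.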
In essence, there are a number of cases where I'd like to add boolean
keywords to a record's index entries that are actually multi-word
keywords: the individual words, taken in isolation from the multi-word
sequence, should not be enough to give a (good) match, yet the query as
a whole should still behave like an OR query.
Consider the example of a "wifi hotspot" record. Notionally, I'd like it
to have these 5 keywords:
wifi
wi fi
wi fi hotspot
wi fi hot spot
wifi hot spot
But clearly it would be less than useful for a query for "wi cake
sellers" to match this record, or indeed for a search for "red spot on
chin" to match it.
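One way to get that behaviour is to make each multi-word keyword a single indivisible term and compare it against every contiguous run of words (n-gram) in the query, ORing the results. The sketch below is plain Python, not Xapian code; joining words with "_" and the helper names (`keyword_term`, `query_ngram_terms`, `matching_keywords`) are assumptions made for illustration only.

```python
# Hypothetical sketch (not Xapian's API): each multi-word keyword becomes one
# indivisible term (words joined with "_"), and the query is expanded into
# every contiguous run of words (its n-grams). A record matches if any query
# n-gram equals one of its keyword terms, so the query as a whole is an OR.

def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def keyword_term(keyword: str) -> str:
    """Collapse a (possibly multi-word) keyword into a single term."""
    return "_".join(tokenize(keyword))


def query_ngram_terms(query: str, max_n: int) -> set[str]:
    """All contiguous word sequences of length 1..max_n, joined like keywords."""
    words = tokenize(query)
    terms: set[str] = set()
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            terms.add("_".join(words[start:start + n]))
    return terms


def matching_keywords(keywords: list[str], query: str) -> set[str]:
    """Return the record's keyword terms that some query n-gram hits exactly."""
    record_terms = {keyword_term(k) for k in keywords}
    # No keyword is longer than max_n words, so longer n-grams cannot match.
    max_n = max(len(tokenize(k)) for k in keywords)
    return record_terms & query_ngram_terms(query, max_n)


wifi_keywords = ["wifi", "wi fi", "wi fi hotspot", "wi fi hot spot", "wifi hot spot"]

print(sorted(matching_keywords(wifi_keywords, "free wi fi hot spot in bristol")))
# ['wi_fi', 'wi_fi_hot_spot']
print(sorted(matching_keywords(wifi_keywords, "wi cake sellers")))
# []
print(sorted(matching_keywords(wifi_keywords, "red spot on chin")))
# []
```

Because "wi" and "spot" never exist as terms on their own, the two unwanted queries find nothing, while extra words such as "bristol" don't stop a match. How the query-side n-gram expansion would be wired in front of scriptindex/omega is not shown and is an assumption of this sketch.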
Mark
On Thu, 2006-09-21 at 13:01 +0100, James Aylett wrote:
> On Thu, Sep 21, 2006 at 12:24:01PM +0100, Mark Hagger wrote:
>
> > Is it possible, preferably using the simplistic scriptindex and
> > cgi-bin/omega approach, to create and query a database so that I can
> > force matches to only occur for word pairs?
>
> You can set the default operator in omega to AND instead of OR. (Set
> DEFAULTOP as an argument to the omega CGI.)
>
> You can also use a phrase search, although this will be slower.
>
> > For example I would want a match for "garden centre" but no match at
> > all, or perhaps just a low relevance match, for the query "garden" or
> > "centre". Whereas my current approach using an indexscript with
> > something like the following:
> >
> > name : truncate=100 field=caption boolean=name index
> >
> > and a data file with:
> >
> > name=garden centre
> >
> > means that I get 100% relevance matches for any of "garden", "centre" or
> > "garden centre", which is rather unfortunate in my case.
> >
> > Any thoughts/ideas/cunning plans would be appreciated.
>
> Have you tried this with real data? Working with short test documents
> often won't give you a realistic idea of what will actually happen.
>
> Out of interest, why are you doing boolean=name? boolean=S would be
> more usual, particularly if you want to use omega to search it.
>
> J
>