[Xapian-discuss] Matching exact phrases only

Chris Good chris at g2.nu
Tue Aug 8 19:49:32 BST 2006


James Aylett wrote:
> I'm not entirely certain what you /are/ trying to achieve, but I'm
> guessing some kind of location taxonomy is in play

Absolutely not, this is a simple plaintext search.  All of the clever
stuff related to location is abstracted away and in this case we're just
doing a simple match.  Perhaps a non-location example would help illustrate
what we're after.

Let's say we have records containing the following:

big red bus
red letter day
sent red bill
blue
red sky at night shepherds delight
blue taxi
empty blue taxi
sky consulting
blue sky consulting
sky blue


A search for "Red" would match 'big red bus', 'red letter day' and
'sent red bill', all of which would yield a 100% match.

A search for "blue", meanwhile, would have a range of scores, from 100% down.
Likewise "sky consulting" would yield results from 'sky consulting' downwards.

The root problem is that we can't differentiate between a partial match
such as the "red" example, the results for which we'd want to discard,
and the "blue" or "sky consulting" ones, where the top match for each
would be perfect.

If we ignored Xapian then we'd do our own stemming and look in a stemmed
flat file of records for the maximum length substring that matches and do
something like strlen(query_string)/strlen(match) to give a score.  However
that's not a terribly nice way of doing it.
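
To make the flat-file idea above concrete, here is a rough sketch in Python
rather than anything Xapian-specific.  It is only an illustration under some
assumptions: stemming is skipped (text is just lowercased), the query has to
appear as a contiguous run of whole words, and the names normalise,
coverage_score and search are made up for this example.

```python
# Sketch of the strlen(query_string)/strlen(match) scoring idea.
# Assumptions: no real stemming (lowercasing only), whole-word phrase match.

RECORDS = [
    "big red bus",
    "red letter day",
    "sent red bill",
    "blue",
    "red sky at night shepherds delight",
    "blue taxi",
    "empty blue taxi",
    "sky consulting",
    "blue sky consulting",
    "sky blue",
]


def normalise(text):
    """Lowercase and collapse whitespace; a stand-in for proper stemming."""
    return " ".join(text.lower().split())


def coverage_score(query, record):
    """Return len(query)/len(record) if the query's words occur as a
    contiguous run in the record, otherwise 0.0."""
    q_words = normalise(query).split()
    r_words = normalise(record).split()
    if not q_words:
        return 0.0
    n = len(q_words)
    for i in range(len(r_words) - n + 1):
        if r_words[i:i + n] == q_words:
            return len(" ".join(q_words)) / len(" ".join(r_words))
    return 0.0


def search(query, records=RECORDS):
    """Return (score, record) pairs for matching records, best first."""
    scored = [(coverage_score(query, rec), rec) for rec in records]
    hits = [(score, rec) for score, rec in scored if score > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for q in ("Red", "blue", "sky consulting"):
        print(q)
        for score, rec in search(q):
            print("  %3.0f%%  %s" % (score * 100, rec))
```

With scoring like this, only a record that consists of nothing but the query
reaches 100%, so the "red" results could be discarded by applying a threshold
to the top score.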



