[Xapian-devel] FLAG_PARTIAL and subset words

Greg freediving at gmail.com
Mon May 16 12:14:27 BST 2011


On Mon, May 16, 2011 at 12:53 PM, Greg <freediving at gmail.com> wrote:
> On Mon, May 16, 2011 at 12:47 PM, Greg <freediving at gmail.com> wrote:
>> On Mon, May 16, 2011 at 12:08 PM, Olly Betts <olly at survex.com> wrote:
>>> On Mon, May 16, 2011 at 09:34:39AM +0200, Greg wrote:
>>>> I'll try and create a new database containing only 2 items, "volvo v90"
>>>> and "volvo s60". As far as I understand, searching for "volvo v" should
>>>> return one result and "volvo" two.
>>>
>>> The "v" would expand to "v90 SYNONYM volvo", so "volvo v" would return 2
>>> results.  SYNONYM is OR, but with a different weighting, so ignoring the
>>> weights, "volvo v" is "volvo AND (v90 OR volvo)" which (again ignoring
>>> weights) is the same as just "volvo".
>>>
>>> The only difference you might see between "volvo v" and "volvo" is in
>>> the order in which the documents are returned.
>>>
>>> If the "v" expands to a lot of terms with a high combined frequency,
>>> then the synonym operator will tend to result in "v" contributing rather
>>> less weight than "volvo" does, so "volvo" will tend to dominate the
>>> ordering.
>>>
>>> Cheers,
>>>    Olly
>> That however is not what's happening. In the database I provided,
>> "volvo v" returns two results {volvo 1998 v90, volvo 2001 s60}, while
>> "volvo s" returns one {volvo 2001 s60}. Searching for "volvo" alone
>> returns the same two results as "volvo v", only with different
>> percentages: for "volvo v" it's [volvo 1998 v90] 66% and [volvo
>> 2001 s60] 60%, and for "volvo" it's [volvo 1998 v90] 100% and
>> [volvo 2001 s60] 100%.
> What I'm assuming is happening is that the trailing "v" matches the
> "volvo" in [volvo 2001 s60] although "volvo" was already matched by
> "volvo".
Apologies for the multitude of messages. Anyway, that's exactly what's happening:

"volvo v"::
Xapian::Query((Zvolvo:(pos=1) AND ((v90:(pos=2) SYNONYM volvo:(pos=2))
OR Zv:(pos=2))))
As you can see, the trailing "v" also expands to "volvo", so it gets
matched against [volvo 2001 s60] as well.

"volvo s"::
Xapian::Query((Zvolvo:(pos=1) AND ((s60:(pos=2) SYNONYM
series:(pos=2)) OR Zs:(pos=2))))
Here it works as expected.

I'm not sure if that's intended, i.e. having both "volvo" and "v" match the same term.
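
To make the behaviour easier to reason about, here is a rough toy model
in plain Python. It is only a sketch and does not use the Xapian API:
the helper names (expand_prefix, docs_with, partial_search) and the
document ids are made up for illustration, the index holds just the two
documents named in this thread, stemming and the extra "OR Zv" branch
are ignored, and SYNONYM is treated as a plain OR (i.e. weights are
ignored, as Olly describes).

```python
# Toy model only: plain Python sets, NOT the Xapian API.
# Documents are the two mentioned in this thread; the ids are invented.
DOCS = {
    1: ["volvo", "1998", "v90"],
    2: ["volvo", "2001", "s60"],
}


def all_terms(docs):
    """Every distinct term in the toy index, sorted."""
    return sorted({term for terms in docs.values() for term in terms})


def expand_prefix(prefix, docs):
    """Terms starting with `prefix`, mimicking partial-word expansion."""
    return [term for term in all_terms(docs) if term.startswith(prefix)]


def docs_with(term, docs):
    """Ids of the documents that contain `term`."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


def partial_search(words, docs):
    """AND the complete words with the OR of the last word's expansions.

    Assumption: SYNONYM is modelled as OR, so only the set of matching
    documents is reproduced, not their weights or percentages.
    """
    *complete, last = words
    result = set()
    for term in expand_prefix(last, docs):
        result |= docs_with(term, docs)
    for word in complete:
        result &= docs_with(word, docs)
    return result


if __name__ == "__main__":
    print(expand_prefix("v", DOCS))
    print(partial_search(["volvo", "v"], DOCS))
    print(partial_search(["volvo", "s"], DOCS))
```

In this model "v" expands to both "v90" and "volvo", so "volvo v"
reduces to "volvo AND (v90 OR volvo)" and matches both documents, while
"s" only expands to "s60" here (the real database also had "series") and
"volvo s" matches just the s60 document.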


