[Xapian-devel] FLAG_PARTIAL and subset words

Greg freediving at gmail.com
Mon May 16 11:53:47 BST 2011


On Mon, May 16, 2011 at 12:47 PM, Greg <freediving at gmail.com> wrote:
> On Mon, May 16, 2011 at 12:08 PM, Olly Betts <olly at survex.com> wrote:
>> On Mon, May 16, 2011 at 09:34:39AM +0200, Greg wrote:
>>> I'll try and create a new database containing only 2 items, "volvo v90"
>>> and "volvo s60". As far as I understand, searching for "volvo v" should
>>> return one result and "volvo" two.
>>
>> The "v" would expand to "v90 SYNONYM volvo", so "volvo v" would return 2
>> results.  SYNONYM is OR, but with a different weighting, so ignoring the
>> weights, "volvo v" is "volvo AND (v90 OR volvo)" which (again ignoring
>> weights) is the same as just "volvo".
>>
>> The only difference you might see between "volvo v" and "volvo" is in
>> the order in which the documents are returned.
>>
>> If the "v" expands to a lot of terms with a high combined frequency,
>> then the synonym operator will tend to result in "v" contributing rather
>> less weight than "volvo" does, so "volvo" will tend to dominate the
>> ordering.
>>
>> Cheers,
>>    Olly
> That, however, is not what's happening. In the database I provided,
> "volvo v" returns two results {volvo 1998 v90, volvo 2001 s60}, while
> "volvo s" returns one {volvo 2001 s60}. Searching for just "volvo"
> returns the same two results as "volvo v", only with different
> percentages: for "volvo v" it's [volvo 1998 v90] 66% and [volvo 2001
> s60] 60%, and for "volvo" it's [volvo 1998 v90] 100% and [volvo 2001
> s60] 100%.

My assumption is that the trailing "v" also matches the "volvo" in
[volvo 2001 s60], even though that "volvo" was already matched by the
query word "volvo".
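
To make the expansion concrete, here is a toy sketch in plain Python. It is
not Xapian code and does not use the Xapian API: the names DOCS,
expand_partial and search are made up for illustration, weights are ignored
entirely, and the last query word is simply treated as a prefix that is
OR-ed over every indexed term starting with it, following Olly's
"volvo AND (v90 OR volvo)" description.

```python
# Toy boolean model of partial-term expansion; NOT Xapian's implementation.
# The two documents mirror the test database described in this thread.
DOCS = {
    "volvo 1998 v90": {"volvo", "1998", "v90"},
    "volvo 2001 s60": {"volvo", "2001", "s60"},
}


def expand_partial(prefix):
    """Return every indexed term that starts with the given prefix."""
    all_terms = set().union(*DOCS.values())
    return sorted(t for t in all_terms if t.startswith(prefix))


def search(query):
    """AND the complete words with the OR-expansion of the last word.

    Weights are ignored, so this only models which documents match,
    not the order or percentages they come back with.
    """
    *complete, partial = query.split()
    expansion = expand_partial(partial)
    hits = []
    for name, terms in DOCS.items():
        has_complete = all(word in terms for word in complete)
        has_partial = any(term in terms for term in expansion)
        if has_complete and has_partial:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    for q in ("volvo v", "volvo s", "volvo"):
        print(q, "->", expand_partial(q.split()[-1]), search(q))
```

Under these assumptions "v" expands to both "v90" and "volvo", so "volvo v"
matches both documents, while "s" expands only to "s60", so "volvo s"
matches one. That agrees with the result counts above; the differing
percentages come from weighting, which this sketch does not model.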


