[Xapian-devel] FLAG_PARTIAL and subset words

Greg freediving at gmail.com
Mon May 16 11:47:03 BST 2011


On Mon, May 16, 2011 at 12:08 PM, Olly Betts <olly at survex.com> wrote:
> On Mon, May 16, 2011 at 09:34:39AM +0200, Greg wrote:
>> I'll try and create a new database containing only 2 items "volvo v90"
>> and "volvo s60" as far as I understand searching for "volvo v" should
>> return one result and "volvo" two.
>
> The "v" would expand to "v90 SYNONYM volvo", so "volvo v" would return 2
> results.  SYNONYM is OR, but with a different weighting, so ignoring the
> weights, "volvo v" is "volvo AND (v90 OR volvo)" which (again ignoring
> weights) is the same as just "volvo".
>
> The only difference you might see between "volvo v" and "volvo" is in
> the order in which the documents are returned.
>
> If the "v" expands to a lot of terms with a high combined frequency,
> then the synonym operator will tend to result in "v" contributing rather
> less weight than "volvo" does, so "volvo" will tend to dominate the
> ordering.
>
> Cheers,
>    Olly

That, however, is not what's happening. In the database I provided,
"volvo v" returns two results {volvo 1998 v90, volvo 2001 s60}, while
"volvo s" returns one {volvo 2001 s60}. Searching for "volvo" alone
returns the same two results as "volvo v", only with different
percentages: for "volvo v" it's [volvo 1998 v90] 66% and [volvo 2001
s60] 60%, and for "volvo" it's [volvo 1998 v90] 100% and [volvo 2001
s60] 100%.
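
To make the match sets concrete, here is a minimal sketch in Python of
the boolean side of the expansion described in the quoted text. It is a
toy model, not Xapian's implementation: the function names and document
ids are made up for illustration, words are combined with a plain AND,
the last word is expanded to an OR of every indexed term starting with
it, and all weighting is ignored.

```python
# Toy model of FLAG_PARTIAL-style matching; NOT Xapian's implementation.
# Assumptions: boolean AND between words, the last word is treated as a
# prefix and expanded to an OR of every indexed term starting with it,
# and all weighting (hence the percentages) is ignored.

DOCS = {
    1: "volvo 1998 v90",
    2: "volvo 2001 s60",
}


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


def partial_search(index, query):
    """Return the ids matching every word, with the last word as a prefix."""
    words = query.lower().split()
    if not words:
        return set()
    *complete, last = words
    # Expand the final word to all terms sharing that prefix (an OR).
    matches = set()
    for term, ids in index.items():
        if term.startswith(last):
            matches |= ids
    # AND in each of the complete words.
    for word in complete:
        matches &= index.get(word, set())
    return matches


INDEX = build_index(DOCS)
for q in ("volvo v", "volvo s", "volvo"):
    print(q, "->", sorted(partial_search(INDEX, q)))
```

This only models which documents match. It says nothing about the
percentages, since weights are left out entirely.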


