[Xapian-discuss] scoring question
olly at survex.com
Wed Mar 21 09:28:23 GMT 2007
On Tue, Mar 20, 2007 at 06:05:31PM -0700, Alexander Lind wrote:
> I have just realized that if I set a query like
> 'green jelly bean'
> xapian will turn that query into
> 'green OR jelly OR bean'
By default - you can set the default operator to "AND" with
> This causes documents containing just one of the words to be considered
> a 100% hit.
It shouldn't (unless you're using the BoolWeight weighting scheme or
a similar user-defined weighting scheme).
> The behavior I would like to see is that each word gives a 33.3% hit, so
> that a document containing all 3 words gets placed above a document with
> only 1 or 2 words in it.
This is roughly what should happen. The actual score from each word is
determined by considering the frequencies of the terms in the collection
and the document, and looking at document lengths, so it generally won't
be 33.3%. If you really want exactly that, it could be achieved with a
user-defined weighting scheme.
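
To illustrate the difference, here is a minimal sketch in plain Python. It does not use Xapian and is not Xapian's actual weighting formula: the toy collection, the constant K, and the functions term_weight, frequency_score, equal_share_score and rank are all made up for illustration. frequency_score stands in for a weight built from collection frequency, within-document frequency and document length; equal_share_score is the "each word counts for one third" behaviour a user-defined scheme could give you.

```python
import math

# Hypothetical toy collection: each document is just a list of terms.
DOCS = {
    "doc_all": ["green", "jelly", "bean"],
    "doc_two": ["green", "bean", "soup"],
    "doc_one": ["green", "grass", "field"],
    "doc_none": ["red", "apple", "pie"],
}
QUERY = ["green", "jelly", "bean"]
K = 1.0  # assumed saturation constant for this sketch, not a Xapian default


def term_weight(term, doc_id, docs=DOCS):
    """Simplified frequency-based weight of one term in one document."""
    words = docs[doc_id]
    tf = words.count(term)  # within-document frequency
    if tf == 0:
        return 0.0
    containing = sum(1 for w in docs.values() if term in w)
    idf = math.log(1 + len(docs) / containing)  # rarer terms weigh more
    avg_len = sum(len(w) for w in docs.values()) / len(docs)
    return idf * tf / (tf + K * len(words) / avg_len)


def frequency_score(query, doc_id, docs=DOCS):
    """Sum of per-term weights, as an OR query would accumulate them."""
    return sum(term_weight(t, doc_id, docs) for t in set(query))


def equal_share_score(query, doc_id, docs=DOCS):
    """Each distinct query term present contributes exactly 1/len(terms)."""
    terms = set(query)
    return sum(1 for t in terms if t in docs[doc_id]) / len(terms)


def rank(score, query, docs=DOCS):
    """Document ids ordered from best to worst under the given scorer."""
    return sorted(docs, key=lambda d: score(query, d, docs), reverse=True)


if __name__ == "__main__":
    for doc_id in rank(frequency_score, QUERY):
        print(doc_id,
              round(frequency_score(QUERY, doc_id), 3),
              round(equal_share_score(QUERY, doc_id), 3))
```

On this toy data both scorers put the document matching all three words first, but only equal_share_score gives each word the same share; in the frequency-based version the rarer word "jelly" contributes more than the common word "green".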
If you're really getting 100% for a document only matching one term of
a multi-term query, can you provide a small test case?