[Xapian-discuss] How term distance impacts the weight?
Olly Betts
olly at survex.com
Wed Aug 3 03:15:07 BST 2011
On Tue, Aug 02, 2011 at 11:02:36PM +0800, Bruce Zhang wrote:
> One example I can think of is:
> in Oriental languages, there is no separator between words in a sentence.
But the entire point of segmentation is to work out where the word
breaks are. Once we've segmented the text, I'm not sure why the
situation would be so different to languages where the word breaks are
explicitly indicated with spaces.
> after word segmentation, 2 words can be adjacent, but may be separated by
> some adjective in some other documents.
Just like in English, and many other languages.
> in this situation, we can still consider that the document matches the
> search criteria.
>
> so a short distance means more accuracy in some situations. but if the
> distance is far, it means the 2 words have no relation to each other
Giving extra weight when the query words appear closer together is going to
benefit any language - generally you want the query terms to be in the
same context within a document, regardless of what language the document
is written in.
It's just that nobody's implemented this in Xapian yet.
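As a rough illustration of the idea only (this is not Xapian code, and not
how Xapian computes weights), here is a minimal Python sketch of boosting a
document's weight when two query terms occur close together. The function
names, the `alpha` parameter and the boost formula are all hypothetical,
chosen just to show the principle; it assumes each term's positions in the
document are available as a sorted list of integers.

```python
from bisect import bisect_left


def min_distance(positions_a, positions_b):
    """Return the smallest gap between any position of term A and any
    position of term B, or None if either term has no positions.

    Both lists are assumed to be sorted in ascending order.
    """
    best = None
    for p in positions_a:
        # Find where p would be inserted into positions_b; the nearest
        # occurrences of term B are the neighbours of that slot.
        i = bisect_left(positions_b, p)
        for j in (i - 1, i):
            if 0 <= j < len(positions_b):
                gap = abs(positions_b[j] - p)
                if best is None or gap < best:
                    best = gap
    return best


def proximity_boost(base_weight, positions_a, positions_b, alpha=1.0):
    """Scale base_weight up when the two terms appear near each other.

    Hypothetical formula: weight * (1 + alpha / gap), so adjacent terms
    (gap 1) get the largest boost and distant terms get almost none.
    """
    gap = min_distance(positions_a, positions_b)
    if gap is None:
        # One of the terms has no positional data: leave the weight alone.
        return base_weight
    return base_weight * (1.0 + alpha / max(gap, 1))
```

With this sketch, a document where the two terms are adjacent gets twice its
base weight when `alpha` is 1.0, while one where they are 50 positions apart
is boosted by only 2%. The same logic applies whether the positions came from
splitting on spaces or from a word segmenter.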
Cheers,
Olly