[Xapian-devel] Xapian Indexer problem.
liminghit at 126.com
Tue Nov 18 13:51:56 GMT 2008
Thanks very much for your reply!
Each document has its own term list, containing some number of terms.
So how can I calculate the term weight for each of those terms?
For example: D1 -> term1, term2, term3
I want to get the term weights of term1, term2 and term3.
I noticed that there is a function "calc_termweight", but it's a private function.
On 2008-11-15, "Olly Betts" <olly at survex.com> wrote:
>On Fri, Nov 14, 2008 at 11:06:55AM +0800, liminghit wrote:
>> Only "0 001a heathrow taxis" can have 100% matching.
>> A shorter or longer query should score less than 100%, right?
>A longer query would, since (unless you repeat terms) it must have
>words which aren't in the document.
>But otherwise no, and this behaviour is as intended. It's not
>"percentage of document text matched" it's a measure of "how well your
>query matches this document".
>If all the query terms match the highest scoring document, we give it
>100%. If not all the terms match the highest scoring document, we give
>it a proportion of 100% based on the term weights.
>And then we calculate percentage scores for all other documents based on
>this assigned percentage value.
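To check my understanding of the percentage calculation described above, here is a rough sketch in plain Python. It does not use Xapian, and it is not Xapian's actual implementation. The function name percent_scores, the input layout, and the linear scaling of the other documents by raw score are all my own assumptions for illustration.

```python
def percent_scores(query_weights, docs):
    """Assign percentage scores as described in the quoted explanation.

    query_weights: {term: weight} for every term in the query (assumed > 0).
    docs: {doc_id: (set_of_matching_terms, raw_score)} with raw_score > 0.
    Returns {doc_id: percentage}.
    """
    if not docs:
        return {}

    total_weight = sum(query_weights.values())

    # The highest scoring document anchors the scale.
    top_id = max(docs, key=lambda d: docs[d][1])
    top_terms, top_score = docs[top_id]

    # 100% if every query term matches the top document, otherwise the
    # share of the total query term weight that does match.
    matched_weight = sum(w for t, w in query_weights.items() if t in top_terms)
    top_percent = 100.0 * matched_weight / total_weight

    # Assumption: other documents are scaled linearly by raw score
    # relative to the top document.
    return {
        doc_id: top_percent * score / top_score
        for doc_id, (_terms, score) in docs.items()
    }
```

With query weights {"heathrow": 2.0, "taxis": 1.0, "cheap": 1.0}, a top document matching only "heathrow" and "taxis" with raw score 6.0 would get 75%, and a second document with raw score 2.0 would get 25%.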
>Your definition seems unhelpful to me - in most uses the query is quite
>a lot shorter than the document, and a 3 word query would score at most
>0.3% for a 1000 word document.
>> If I want to achieve this, how should I do the indexing?
>You might be able to achieve something like what you describe at search
>time by writing your own weighting scheme and making get_sumpart()
>return 1/(unnormalised document length).
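As I understand the suggestion, each matching query term would contribute 1/(unnormalised document length) to the score. Below is a toy model of that idea in plain Python. It is only a sketch: InverseLengthWeight is a hypothetical stand-in class, not a real Xapian::Weight subclass (which would need more methods than get_sumpart()), and the score() helper and its token-list input are my own assumptions.

```python
class InverseLengthWeight:
    """Toy stand-in for a custom weighting scheme (not a Xapian class)."""

    def get_sumpart(self, wdf, doc_length):
        # Each matching term contributes 1/(unnormalised document length).
        # The within-document frequency (wdf) is deliberately ignored.
        return 1.0 / doc_length


def score(query_terms, doc_terms, weight=None):
    """Sum get_sumpart() over the distinct query terms found in the document.

    query_terms: iterable of query terms.
    doc_terms: list of the document's tokens; its length is the document length.
    """
    weight = weight or InverseLengthWeight()
    doc_length = len(doc_terms)
    total = 0.0
    for term in set(query_terms):
        wdf = doc_terms.count(term)
        if wdf:  # only terms that actually occur in the document contribute
            total += weight.get_sumpart(wdf, doc_length)
    return total
```

Under this model a two-word document made up of exactly the two query terms scores 1.0, while a 3 word query fully matching a 1000 word document scores about 0.003, which is the 0.3% mentioned above.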