Thanks very much for your reply!<br>
<p class="MsoNormal"><span lang="EN-US">Each document has its own list of terms.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">How can I calculate the weight of each of
these terms?<o:p><br></o:p>For example:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">D1->term1, term2, term3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I want to get the term weights of term1,
term2, and term3.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I noticed that there is a function “calc_termweight”,
but it’s a private function.<o:p><br> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Ming<o:p></o:p></span></p>
<br><pre>On 2008-11-15, "Olly Betts" <olly@survex.com> wrote:
>On Fri, Nov 14, 2008 at 11:06:55AM +0800, liminghit wrote:
>> Only "0 001a heathrow taxis" can have 100% matching.
>>
>> A shorter or longer query should match less than 100%, right?
>
>A longer query would since (unless you repeat terms) it must have
>words which aren't in the document.
>
>But otherwise no, and this behaviour is as intended. It's not
>"percentage of document text matched" it's a measure of "how well your
>query matches this document".
>
>If all the query terms match the highest scoring document, we give it
>100%. If not all the terms match the highest scoring document, we give
>it a proportion of 100% based on the term weights.
>
>And then we calculate percentage scores for all other documents based on
>this assigned percentage value.
>
>Your definition seems unhelpful to me - in most uses the query is quite
>a lot shorter than the document, and a 3 word query would score at most
>0.3% for a 1000 word document.
>
>> If I want to achieve this, how should I do the indexing?
>
>You might be able to achieve something like what you describe at search
>time by writing your own weighting scheme and making get_sumpart()
>return 1/(unnormalised document length).
>
>Cheers,
> Olly
</pre>
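<p>The percentage scheme described in the quoted reply can be sketched roughly as below. This is only a toy model under simple assumptions, not Xapian's actual implementation: the function name <code>percentages</code>, the input layout, and the linear scaling of the other documents against the top score are all hypothetical choices made for illustration.</p>

```python
def percentages(query_weights, doc_matches, doc_scores):
    """Toy model of percentage scoring (assumes positive weights and scores).

    query_weights: {term: weight} for every query term (hypothetical values).
    doc_matches:   {doc_id: set of query terms the document contains}.
    doc_scores:    {doc_id: raw relevance score}.
    """
    # The highest scoring document anchors the percentage scale.
    top = max(doc_scores, key=doc_scores.get)
    total = sum(query_weights.values())
    matched = sum(query_weights[t] for t in doc_matches[top])
    # 100% if every query term matches the top document; otherwise a
    # proportion of 100% based on the weights of the terms that do match.
    top_percent = 100.0 * matched / total
    # Assumption: other documents are scaled linearly by raw score
    # relative to the top document.
    return {d: top_percent * s / doc_scores[top] for d, s in doc_scores.items()}
```

<p>In this sketch a short query whose terms all appear in the best document still gives that document 100%, which is the behaviour the reply describes as intended.</p>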
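<p>The suggestion in the quoted reply, making <code>get_sumpart()</code> return 1/(unnormalised document length), can be illustrated with a small stand-alone model. The class <code>InverseLengthWeight</code> and the helper <code>score</code> below are hypothetical and do not use the Xapian API; the real <code>get_sumpart()</code> signature may differ, and a real scheme would subclass the library's weighting class instead.</p>

```python
class InverseLengthWeight:
    """Toy stand-in for a custom weighting scheme (not the Xapian API)."""

    def get_sumpart(self, wdf, doc_length):
        # Each matching query term contributes 1 / (unnormalised document
        # length); the within-document frequency (wdf) is ignored.
        return 1.0 / doc_length


def score(query_terms, doc_terms, scheme=None):
    """Sum the per-term contributions of query terms found in the document."""
    scheme = scheme or InverseLengthWeight()
    doc_length = len(doc_terms)
    total = 0.0
    for term in set(query_terms):
        wdf = doc_terms.count(term)
        if wdf:  # only terms that occur in the document contribute
            total += scheme.get_sumpart(wdf, doc_length)
    return total
```

<p>Under these assumptions, a three-word query that fully matches a 1000-word document scores 3/1000, i.e. 0.3%, in line with the figure given in the quoted reply.</p>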