Thanks very much for your reply!<br>

<p class="MsoNormal"><span lang="EN-US">Each document has its own term list.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">That list contains several terms.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">How can I calculate the weight of each of these terms?<o:p><br></o:p>For example:<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">D1-&gt;term1, term2, term3<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">I want to get the term weights of term1, term2, and term3.<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">I noticed that there is a function “calc_termweight”, but it is private.<o:p><br>&nbsp;</o:p></span></p>
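<p class="MsoNormal"><span lang="EN-US">To make the question concrete, here is a minimal sketch of the kind of per-term weight I mean. It is not Xapian's “calc_termweight”; it assumes a BM25-style formula, and the function names, the parameters k1 and b, and all the statistics for D1 are hypothetical values chosen only for illustration.</span></p>

```python
import math


def bm25_term_weight(wdf, doc_len, avg_doc_len, term_freq, doc_count,
                     k1=1.0, b=0.5):
    """BM25-style weight of one term in one document.

    Illustrative only: this is an assumed formula, not Xapian's own code.
    wdf       -- how often the term occurs in this document
    term_freq -- number of documents in the collection containing the term
    """
    # Rarer terms get a larger weight; the "1 +" keeps the log positive.
    idf = math.log(1.0 + (doc_count - term_freq + 0.5) / (term_freq + 0.5))
    # Dampen the within-document frequency and normalise by document length.
    norm_len = doc_len / avg_doc_len
    tf_part = wdf * (k1 + 1.0) / (wdf + k1 * ((1.0 - b) + b * norm_len))
    return idf * tf_part


def term_weights(doc_terms, doc_len, avg_doc_len, term_freqs, doc_count):
    """Return {term: weight} for every term in one document's term list."""
    return {
        term: bm25_term_weight(wdf, doc_len, avg_doc_len,
                               term_freqs[term], doc_count)
        for term, wdf in doc_terms.items()
    }


# Hypothetical statistics for D1 -> term1, term2, term3.
d1_terms = {"term1": 3, "term2": 1, "term3": 1}          # term -> wdf
term_freqs = {"term1": 50, "term2": 5, "term3": 500}     # term -> doc count
weights = term_weights(d1_terms, doc_len=5, avg_doc_len=5.0,
                       term_freqs=term_freqs, doc_count=1000)

for term, weight in sorted(weights.items()):
    print(term, round(weight, 3))
```

<p class="MsoNormal"><span lang="EN-US">With these made-up numbers, term2 and term3 occur equally often in D1, but term2 gets the larger weight because it appears in fewer documents overall.</span></p>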

<p class="MsoNormal"><span lang="EN-US">Thanks,<o:p></o:p></span></p>

<p class="MsoNormal"><span lang="EN-US">Ming<o:p></o:p></span></p>

<br><pre>On 2008-11-15, "Olly Betts" &lt;olly@survex.com&gt; wrote:

&gt;On Fri, Nov 14, 2008 at 11:06:55AM +0800, liminghit wrote:

&gt;&gt; Only "0 001a heathrow taxis" can have 100% matching.

&gt;&gt; 

&gt;&gt; A shorter or longer query should match less than 100%, right?

&gt;

&gt;A longer query would since (unless you repeat terms) it must have

&gt;words which aren't in the document.

&gt;

&gt;But otherwise no, and this behaviour is as intended.  It's not

&gt;"percentage of document text matched" it's a measure of "how well your

&gt;query matches this document".

&gt;

&gt;If all the query terms match the highest scoring document, we give it

&gt;100%.  If not all the terms match the highest scoring document, we give

&gt;it a proportion of 100% based on the term weights.

&gt;

&gt;And then we calculate percentage scores for all other documents based on

&gt;this assigned percentage value.

&gt;

&gt;Your definition seems unhelpful to me - in most uses the query is quite

&gt;a lot shorter than the document, and a 3 word query would score at most

&gt;0.3% for a 1000 word document.

&gt;

&gt;&gt; If I want to achieve this, how should I do the indexing?

&gt;

&gt;You might be able to achieve something like what you describe at search

&gt;time by writing your own weighting scheme and making get_sumpart()

&gt;return 1/(unnormalised document length)

&gt;

&gt;Cheers,

&gt;    Olly

</pre>
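<p class="MsoNormal"><span lang="EN-US">The percentage scheme described in the quoted reply can be sketched roughly as follows. This is a simplified reading of that description, not Xapian's actual implementation; the function name percent_scores, the query term weights, and the document scores are all hypothetical.</span></p>

```python
def percent_scores(query_term_weights, matches):
    """Turn raw match scores into percentages.

    query_term_weights -- {term: weight} for every term in the query
    matches            -- {doc_id: (raw_score, set_of_matched_query_terms)}

    Assumed scheme: the highest scoring document gets 100% if it matches
    every query term, otherwise a share of 100% proportional to the weights
    of the terms it does match. Every other document is scaled relative to
    that value by its raw score.
    """
    if not matches:
        return {}

    total_weight = sum(query_term_weights.values())
    top_doc = max(matches, key=lambda doc_id: matches[doc_id][0])
    top_score, top_terms = matches[top_doc]
    if total_weight <= 0 or top_score <= 0:
        return {doc_id: 0.0 for doc_id in matches}

    matched_weight = sum(query_term_weights[t] for t in top_terms)
    top_percent = 100.0 * matched_weight / total_weight

    return {
        doc_id: top_percent * score / top_score
        for doc_id, (score, _terms) in matches.items()
    }


# Hypothetical three-term query; the best document matches only two terms.
query = {"heathrow": 2.0, "taxis": 1.0, "cheap": 1.0}
matches = {
    "D1": (3.0, {"heathrow", "taxis"}),
    "D2": (1.5, {"heathrow"}),
}
pct = percent_scores(query, matches)
print(pct)
```

<p class="MsoNormal"><span lang="EN-US">Here the top document D1 matches terms carrying 3 of the 4 units of query weight, so it is assigned 75%, and D2, with half of D1's raw score, is assigned 37.5%.</span></p>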
