[Xapian-devel] Xapian Indexer problem.

liminghit liminghit at 126.com
Tue Nov 18 13:51:56 GMT 2008


Thanks very much for your reply!


Each document has its own term list, which contains some terms.

How can I calculate the term weight for each of these terms?
For example:

D1->term1, term2, term3

I want to get the term weights of term1, term2 and term3.

I noticed that there is a function "calc_termweight", but it's a private function.
 

Thanks,

Ming



 
 


On 2008-11-15, "Olly Betts" <olly at survex.com> wrote:
>On Fri, Nov 14, 2008 at 11:06:55AM +0800, liminghit wrote:
>> Only "0 001a heathrow taxis" can have 100% matching.
>> 
>> A shorter or longer query should get less than 100% matching, right?
>
>A longer query would, since (unless you repeat terms) it must have
>words which aren't in the document.
>
>But otherwise no, and this behaviour is as intended.  It's not
>"percentage of document text matched" it's a measure of "how well your
>query matches this document".
>
>If all the query terms match the highest scoring document, we give it
>100%.  If not all the terms match the highest scoring document, we give
>it a proportion of 100% based on the term weights.
>
>And then we calculate percentage scores for all other documents based on
>this assigned percentage value.
>
>Your definition seems unhelpful to me - in most uses the query is quite
>a lot shorter than the document, and a 3 word query would score at most
>0.3% for a 1000 word document.
>
>> If I want to achieve this, how should I do the indexing?
>
>You might be able to achieve something like what you describe at search
>time by writing your own weighting scheme and making get_sumpart()
>return 1/(unnormalised document length).
>
>Cheers,
>    Olly
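
The percentage scheme and the 1/(document length) idea from the quoted reply can be modelled in a few lines. This is a simplified plain-Python sketch, not Xapian's implementation and not the Xapian API: the function names, the term weights and the scores are hypothetical, and scaling the other documents linearly by their score relative to the top document is an assumption made for illustration.

```python
def percent_scores(query_weights, results):
    """Model of the percentage scheme described in the quoted reply.

    query_weights: {term: weight} for every term in the query (hypothetical values).
    results: {doc_id: (score, set of query terms the document matched)}.
    """
    total_weight = sum(query_weights.values())
    # The highest scoring document anchors the scale.
    top_doc = max(results, key=lambda d: results[d][0])
    top_score, top_terms = results[top_doc]
    if top_score == 0 or total_weight == 0:
        return {doc: 0.0 for doc in results}
    # 100% if the top document matched every query term, otherwise a
    # proportion of 100% based on the weights of the terms it did match.
    matched_weight = sum(query_weights[t] for t in top_terms)
    top_percent = 100.0 * matched_weight / total_weight
    # Assumption: other documents scale linearly with their score.
    return {doc: top_percent * score / top_score
            for doc, (score, _) in results.items()}


def length_normalised_score(matched_terms, doc_length):
    """Model of a scheme where each matching term contributes
    1/(unnormalised document length), as suggested for get_sumpart()."""
    return sum(1.0 / doc_length for _ in matched_terms)


if __name__ == "__main__":
    weights = {"heathrow": 2.0, "taxis": 2.0}
    results = {
        "D1": (4.0, {"heathrow", "taxis"}),  # matches every query term
        "D2": (2.0, {"heathrow"}),           # matches one query term
    }
    print(percent_scores(weights, results))
    # A 3 word query against a 1000 word document: roughly 0.003, i.e. 0.3%.
    print(length_normalised_score(["a", "b", "c"], 1000))
```

With these made-up numbers D1 gets 100% because it matches both query terms, and D2 gets half of that. The second function reproduces the 0.3% figure for a 3 word query against a 1000 word document.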