[Xapian-discuss] Finding Max Possible Weight of a Document

Kenneth Loafman kenneth at loafman.com
Wed Feb 7 15:41:58 GMT 2007


Kenneth Loafman wrote:
> Olly Betts wrote:
>> On Fri, Jan 26, 2007 at 06:57:37AM -0600, Kenneth Loafman wrote:
>>> Is there a way, without running a match, to find the max possible 
>>> weight of a document?  This could be with or without consideration of 
>>> the length of the document.  I have looked at all of the docs 
>>> available on the web and installed on the system and may just be 
>>> overlooking it.
>>
>> Are you trying to find the max possible weight of a particular document,
>> or of any document in the database?
> 
> Max weight of each document relative to the corpus.
> 
>> If it's any document in the database, you can call Enquire::get_mset()
>> with maxitems = 0 and get_max_possible() on the resulting MSet will give
>> you an upper bound (in this case, no actual matching happens).
> 
> I did not know that would be valid without a previous match.  Thanks!

If you call get_mset(0,0) without first setting a query, 
get_max_possible() on the resulting MSet returns 0.  With a query set, 
it returns a value that depends on the weight of the max document for 
that query.

Starting from samplesearch.py, I wrote getdocweights.py (attached) to 
show the point.  Running it against a database containing various 
articles gives the following results:

./getdocweights.py /home/xapian/articles reuters washington
Number documents: 233567
Getting MSet, no query
Max possible no-query: 0.000000
Performing query `Xapian::Query(reuter)'
Max possible with-query: 0.911843
Performing query `Xapian::Query(washington)'
Max possible with-query: 2.666136
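
Since the attachment is not reproduced inline, here is a rough sketch of 
how a script along these lines might be written.  It is not the attached 
getdocweights.py: it assumes the Xapian Python bindings (the `xapian` 
module) and an English stemmer, and the helper names 
max_possible_weight() and report() are made up for illustration.

```python
def max_possible_weight(enquire):
    """Return the upper bound on document weight for enquire's current query.

    Asking for an MSet with maxitems=0 means no documents are returned;
    the MSet still carries the upper bound via get_max_possible().
    """
    mset = enquire.get_mset(0, 0)  # first=0, maxitems=0
    return mset.get_max_possible()


def report(db_path, terms):
    """Print the max possible weight for each single-term query.

    Hypothetical helper; the import is local so that
    max_possible_weight() can be exercised without Xapian installed.
    """
    import xapian

    db = xapian.Database(db_path)
    enquire = xapian.Enquire(db)
    stemmer = xapian.Stem("english")  # assumption: English stemming

    print("Number documents: %d" % db.get_doccount())
    for term in terms:
        query = xapian.Query(stemmer(term))
        enquire.set_query(query)
        # str(query) gives the query description in the Python bindings.
        print("Performing query `%s'" % query)
        print("Max possible with-query: %f" % max_possible_weight(enquire))


# Example call (path and terms taken from the run shown above):
# report("/home/xapian/articles", ["reuters", "washington"])
```

Note that the value reported is an upper bound for the query as a whole, 
not the weight of any particular document, which is why it changes from 
one query term to the next.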

Any ideas how to proceed from here?  Do I need to roll my own, or is 
there a procedure I could make public that would do it?

...Thanks,
...Ken

-------------- next part --------------
A non-text attachment was scrubbed...
Name: getdocweights.py
Type: text/x-python
Size: 943 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20070207/c6ee6723/getdocweights.py