[Xapian-discuss] Finding Max Possible Weight of a Document
Kenneth Loafman
kenneth at loafman.com
Wed Feb 7 15:41:58 GMT 2007
Kenneth Loafman wrote:
> Olly Betts wrote:
>> On Fri, Jan 26, 2007 at 06:57:37AM -0600, Kenneth Loafman wrote:
>>> Is there a way, without running a match, to find the max possible
>>> weight of a document? This could be with or without consideration of
>>> the length of the document. I have looked at all of the docs
>>> available on the web and installed on the system and may just be
>>> overlooking it.
>>
>> Are you trying to find the max possible weight of a particular document,
>> or of any document in the database?
>
> Max weight of each document relative to the corpus.
>
>> If it's any document in the database, you can call Enquire::get_mset()
>> with maxitems = 0 and get_max_possible() on the resulting MSet will give
>> you an upper bound (in this case, no actual matching happens).
>
> I did not know that would be valid without a previous match. Thanks!
If you call get_mset(0, 0) without first setting a query, get_max_possible()
on the resulting MSet returns 0. With a query set, it returns a nonzero
value that depends on the query, apparently the weight of the best
possible document for that query.
Using samplesearch.py as a starting point, I created getdocweights.py to
illustrate this (attached). Run against a database containing various
articles, it gives the following results:
./getdocweights.py /home/xapian/articles reuters washington
Number documents: 233567
Getting MSet, no query
Max possible no-query: 0.000000
Performing query `Xapian::Query(reuter)'
Max possible with-query: 0.911843
Performing query `Xapian::Query(washington)'
Max possible with-query: 2.666136
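
For readers without the attachment, the approach above can be sketched roughly as follows. This is not the original getdocweights.py, only a minimal reconstruction of the idea: the function names max_possible_weight and report are made up for illustration, and the code assumes reasonably recent Xapian Python bindings (where a Stem object is callable). How get_mset() behaves with no query set may differ between Xapian versions.

```python
def max_possible_weight(enquire, query=None):
    """Return the upper bound on document weight without fetching matches.

    `enquire` is expected to behave like xapian.Enquire. If `query` is
    given it is set first; otherwise the Enquire object is used as-is.
    """
    if query is not None:
        enquire.set_query(query)
    # first=0, maxitems=0: request no results at all.
    mset = enquire.get_mset(0, 0)
    return mset.get_max_possible()


def report(db_path, terms, language="english"):
    """Print the bound with no query, then for each stemmed one-term query."""
    import xapian  # imported here so the helper above has no dependency

    db = xapian.Database(db_path)
    enquire = xapian.Enquire(db)
    stemmer = xapian.Stem(language)  # assumption: terms were indexed stemmed

    print("Number documents: %d" % db.get_doccount())
    print("Max possible no-query: %f" % max_possible_weight(enquire))
    for term in terms:
        query = xapian.Query(stemmer(term))
        print("Max possible with-query: %f"
              % max_possible_weight(enquire, query))
```

Calling report("/home/xapian/articles", ["reuters", "washington"]) would correspond to the run shown above. Note that the bound is per query, not per document, which is why it does not yet answer the original question.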
Any ideas how to proceed from here? Do I need to roll my own, or is
there a procedure I could make public that would do it?
...Thanks,
...Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getdocweights.py
Type: text/x-python
Size: 943 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20070207/c6ee6723/getdocweights.py