[Xapian-discuss] Finding Max Possible Weight of a Document

Olly Betts olly at survex.com
Fri Feb 9 05:02:18 GMT 2007


On Wed, Feb 07, 2007 at 09:41:58AM -0600, Kenneth Loafman wrote:
> Kenneth Loafman wrote:
> >Olly Betts wrote:
> >>If it's any document in the database, you can call Enquire::get_mset()
> >>with maxitems = 0 and get_max_possible() on the resulting MSet will give
> >>you an upper bound (in this case, no actual matching happens).
> >
> >I did not know that would be valid without a previous match.  Thanks!
> 
> If you call get_mset(0,0) without a previous query, it returns 0.

Well, the empty query matches no documents, so the maximum possible
weight of a document *is* 0!
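
For reference, the maxitems = 0 trick could be sketched with the Python
bindings roughly as follows.  This is only a minimal sketch, not library
code: max_possible_weight is a made-up helper name, and it assumes that
enquire is a xapian.Enquire opened on your database and that query is a
xapian.Query.

```python
def max_possible_weight(enquire, query):
    """Return an upper bound on the weight any document could get for query.

    Hypothetical helper: `enquire` is assumed to be a xapian.Enquire and
    `query` a xapian.Query.
    """
    enquire.set_query(query)
    # first=0, maxitems=0: no results are requested, so no actual matching
    # happens and only the bound is worked out.
    mset = enquire.get_mset(0, 0)
    return mset.get_max_possible()
```

Called as max_possible_weight(xapian.Enquire(db), some_query), this
should give the bound for that query; for the empty query it should
come back as 0, as described above.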

> >>Are you trying to find the max possible weight of a particular document,
> >>or of any document in the database?
> >
> >Max weight of each document relative to the corpus.

In Xapian, a document only has a weight in the context of a query, and
these weights are intended to allow relative comparisons within a single
set of results.  Comparing document weights from different queries isn't
intended to be meaningful.

I understand roughly what you're trying to achieve, but in order to find
the "max weight of each document relative to the corpus", it seems you'd
have to turn the corpus into a query - i.e. OR together all the terms
in it.  That's going to be a bit unmanageable for anything big, but you
can probably use OP_ELITE_SET instead of OP_OR to pick out a sane number
of the terms which "matter".
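
A rough sketch of that idea with the Python bindings, under stated
assumptions: corpus_elite_query and elite_size are names made up for
illustration, db is assumed to be an open xapian.Database, and the third
argument to xapian.Query is assumed to be taken as the elite set size.
The import sits inside the function only so the sketch loads where the
bindings aren't installed.

```python
def corpus_elite_query(db, elite_size=10):
    """Build an OP_ELITE_SET query from every term in the database.

    Hypothetical helper: `db` is assumed to be a xapian.Database.
    """
    import xapian  # Xapian Python bindings, assumed to be installed

    # Enumerate every term in the corpus.  This list can be large, but
    # OP_ELITE_SET only keeps a limited number of terms for matching.
    terms = [item.term for item in db.allterms()]

    # Assumption: the third argument is the elite set size.
    return xapian.Query(xapian.Query.OP_ELITE_SET, terms, elite_size)
```

The resulting query can then be passed to Enquire::set_query() and used
with get_mset() in the usual way.  Note that building the term list
still walks the whole term list of the database.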

> Any ideas how to proceed from here?  Do I need to roll my own, or is 
> there a procedure I could make public that would do it?

Other than the above approach, I think you'll have to roll your own.  I
don't think there's really any private code inside the library which
does what you want.

Cheers,
    Olly
