[Xapian-tickets] [Xapian] #216: Inconsistent return values for percentage weights
Xapian
nobody at xapian.org
Fri Jan 23 07:28:05 GMT 2009
#216: Inconsistent return values for percentage weights
---------------------+------------------------------------------------------
Reporter: richard | Owner: olly
Type: defect | Status: assigned
Priority: normal | Milestone: 1.0.11
Component: Matcher | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
---------------------+------------------------------------------------------
Comment(by richard):
Just to note that since revision [11822] (on trunk) we now throw an
UnimplementedError if we're asked both for a percentage cutoff and to sort
primarily by value.
I'm increasingly of the view that percentages, calculated in the way we
do, are more trouble (in terms of code complexity and developer time)
than they're worth. We could instead base the calculation of percentages
on the maxweight value (and, with improvements in the statistics held, we
should be able to start getting tighter bounds on maxweight), which would
let us remove a lot of special-case code in the matcher which handles
changes in percentage cutoff weights.
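
A minimal sketch of what a maxweight-based percentage might look like. This
is not code from the matcher: the function name, the clamping and the
truncation to an integer are assumptions made for illustration.

```python
def percent_from_maxweight(weight, maxweight):
    """Scale a document weight to 0..100 against an upper bound.

    Hypothetical helper: `maxweight` is assumed to be an upper bound on
    the weight any document could get for the query.
    """
    if maxweight <= 0:
        # No usable bound (e.g. an empty query), so report 0%.
        return 0
    # The weight should never exceed the bound; clamp in case of rounding.
    ratio = min(weight / maxweight, 1.0)
    return int(ratio * 100)
```

With a loose bound even the best document can land well below 100, which
is why tighter bounds on maxweight matter for this approach.
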
We could also provide an interface which returns the term weights for each
of the terms in a query (generally useful).
If users require a "precise" percentage calculated in the current way,
they could get hold of the weight of the top document (either by asking
for it to be included in the mset, if they're doing a relevance-sorted
search, or by performing a separate search specifically for it), calculate
the normalised percentage for it using the term weights (and
get_matching_terms()), and then perform the real search and scale the
weights it returns into percentages relative to that top document.
Percentage cutoffs could also be done by converting them to weights in a
similar manner.
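
The user-side scheme could be sketched roughly as below. It assumes the
top document's percentage is the share of the total query term weight
carried by the terms it matches, and that other documents are scaled
relative to it; that is my reading of the description above, not a
statement of what the matcher does. All function names are hypothetical,
and the term weights are assumed to come from the interface suggested
earlier.

```python
def top_percent(query_term_weights, top_doc_terms):
    """Percentage for the top document (assumed scheme, see lead-in).

    query_term_weights: dict mapping each query term to its weight.
    top_doc_terms: set of query terms which match the top document.
    """
    total = sum(query_term_weights.values())
    if total <= 0:
        return 0.0
    matched = sum(w for term, w in query_term_weights.items()
                  if term in top_doc_terms)
    return 100.0 * matched / total


def percent_for(weight, top_weight, top_pct):
    """Scale a document's weight relative to the top document."""
    if top_weight <= 0:
        return 0.0
    return top_pct * weight / top_weight


def weight_cutoff(percent_cutoff, top_weight, top_pct):
    """Convert a percentage cutoff into the equivalent weight cutoff."""
    if top_pct <= 0:
        return 0.0
    return top_weight * percent_cutoff / top_pct
```

For example, with term weights {"a": 2.0, "b": 1.0, "c": 1.0} and a top
document of weight 8.0 matching "a" and "b", the top document gets 75%, a
document of weight 4.0 gets 37.5%, and a 50% cutoff becomes a weight
cutoff of about 5.33, which can then be applied as a plain weight cutoff.
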
We could provide some helper classes/code to help users to implement this
sort of scheme, but my feeling is that pulling it out of the matcher would
be a big win.
I don't think it would be unreasonable to experiment with this approach in
the 1.1 release series.
--
Ticket URL: <http://trac.xapian.org/ticket/216#comment:17>
Xapian <http://xapian.org/>