[Xapian-tickets] [Xapian] #216: Inconsistent return values for percentage weights

Xapian nobody at xapian.org
Fri Jan 23 07:28:05 GMT 2009


#216: Inconsistent return values for percentage weights
---------------------+------------------------------------------------------
 Reporter:  richard  |        Owner:  olly     
     Type:  defect   |       Status:  assigned 
 Priority:  normal   |    Milestone:  1.0.11   
Component:  Matcher  |      Version:  SVN trunk
 Severity:  normal   |   Resolution:           
 Keywords:           |    Blockedby:           
 Platform:  All      |     Blocking:           
---------------------+------------------------------------------------------

Comment(by richard):

 Just to note that since revision [11822] (on trunk) we now throw an
 UnimplementedError if we're asked for both a percentage cutoff and a
 primary sort by value.

 I'm coming round to the view that percentages, calculated the way we
 currently do, are more trouble (in terms of code complexity and developer
 time) than they're worth.  We could instead base the percentage
 calculation on the maxweight value (and, with improvements in the
 statistics held, we should be able to start getting tighter bounds on
 maxweight).  That would let us remove a lot of special-case code in the
 matcher which handles changes in percentage cutoff weights.
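
 As a rough illustration of the maxweight-based calculation, here is a
 minimal sketch.  percent_from_maxweight() is a hypothetical helper, not
 an existing Xapian function, and it assumes maxweight is an upper bound
 on the weight any document can attain for the query.

```python
def percent_from_maxweight(weight, maxweight):
    """Scale a document weight against an upper bound on document weight.

    Hypothetical helper for illustration only; not part of Xapian.
    """
    if maxweight <= 0:
        # No usable bound (e.g. an empty query): report 0%.
        return 0
    # Truncate to a whole percentage, and clamp in case floating-point
    # rounding pushes the ratio marginally above 1.
    return min(100, int(100.0 * weight / maxweight))
```

 With this form the percentage for a document depends only on its own
 weight and the bound, so it does not need to be revised if a better
 document turns up later in the match.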

 We could also provide an interface which returns the term weights for each
 of the terms in a query (generally useful).

 If users require a "precise" percentage calculated in the current way,
 they could do it in three steps.  First, get hold of the weight of the top
 document, either by asking for it to be included in the mset (if they're
 doing a relevance-sorted search) or by performing a separate search
 specifically for it.  Second, calculate the normalised percentage for that
 document using the term weights (and get_matching_terms()).  Third,
 perform the search proper and calculate the percentages for its results.
 Percentage cutoffs could also be done using weights in a similar manner.
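
 The user-side scheme described here might look roughly like the sketch
 below.  All three function names are hypothetical, the term weights are
 passed in as a plain dict because the interface for fetching them does
 not exist yet, and the normalisation used (matched term weight as a share
 of total query term weight) is an assumption about what "calculated in
 the current way" means rather than a statement of what the matcher does.

```python
def top_document_percent(term_weights, matching_terms):
    """Percentage assigned to the top document.

    term_weights: dict mapping each query term to its weight (assumed to
        come from some future term-weight interface).
    matching_terms: the query terms which index the top document, e.g. as
        gathered from get_matching_terms().
    """
    total = sum(term_weights.values())
    if total <= 0:
        return 0.0
    matched = sum(term_weights[t] for t in matching_terms if t in term_weights)
    return 100.0 * matched / total


def percent_for(weight, top_weight, top_percent):
    """Scale a document's weight relative to the top document's percentage."""
    if top_weight <= 0:
        return 0.0
    return top_percent * weight / top_weight


def cutoff_weight(percent_cutoff, top_weight, top_percent):
    """Convert a percentage cutoff into the equivalent minimum weight."""
    if top_percent <= 0:
        return 0.0
    return top_weight * percent_cutoff / top_percent


# Made-up numbers: the top document matches only one of two query terms.
weights = {"apple": 3.0, "banana": 1.0}
top_pct = top_document_percent(weights, ["apple"])
doc_pct = percent_for(3.0, 6.0, top_pct)
min_wt = cutoff_weight(50, 6.0, top_pct)
```

 The value returned by cutoff_weight() could then be applied as a weight
 cutoff on the real search instead of a percentage cutoff, which is the
 part that would let the special-case handling come out of the matcher.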

 We could provide some helper classes/code to help users to implement this
 sort of scheme, but my feeling is that pulling it out of the matcher would
 be a big win.

 I don't think it would be unreasonable to experiment with this approach in
 the 1.1 release series.

-- 
Ticket URL: <http://trac.xapian.org/ticket/216#comment:17>
Xapian <http://xapian.org/>


