[Xapian-tickets] [Xapian] #216: Inconsistent return values for percentage weights

Xapian nobody at xapian.org
Wed Jul 16 02:45:53 BST 2008


#216: Inconsistent return values for percentage weights
---------------------+------------------------------------------------------
 Reporter:  richard  |        Owner:  olly    
     Type:  defect   |       Status:  assigned
 Priority:  normal   |    Milestone:  1.0.8   
Component:  Matcher  |      Version:  SVN HEAD
 Severity:  normal   |   Resolution:          
 Keywords:           |    Blockedby:          
 Platform:  All      |     Blocking:          
---------------------+------------------------------------------------------
Changes (by olly):

  * status:  new => assigned


Old description:

> When results are being sorted by a value, the percentage values for the
> results returned are normalised based on the document in the portion of
> the mset requested which has the highest weight, instead of the document
> matching the query which has the highest weight.  I have a testcase
> demonstrating this which I will attach shortly.
>
> This is because, in multimatch.cc, we calculate "best" by looking for the
> highest weighted document in the candidate mset, but when sorting by
> anything other than relevance, the highest weighted document may have been
> discarded already.
>
> It is hard to see how to fix this - one obvious approach would be to
> check every candidate document's weight before discarding it during the
> match process, and keep track of the docid of the document with the
> highest weight seen so far.  However, we currently don't calculate the
> weight for all the documents we see (because we first check the document
> against the lowest document in the mset using mcmp), so this would force
> us to calculate weights for documents we wouldn't otherwise need them
> for.  Since the percentages aren't necessarily even wanted, this seems a
> shame.
>
> Perhaps a reasonable approach would be to add a flag on enquire which
> governed whether percentages were wanted or not; it would then be more
> reasonable to go to extra effort to keep track of the highest weighted
> document if the percentages were actually desired.

New description:

 When results are being sorted primarily by an order other than relevance
 (e.g. {{{sort_by_value()}}}), the percentage values returned by the MSet
 object may be incorrect because they are calculated based on the document
 in the portion of the MSet requested which has the highest weight, instead
 of the document matching the query which has the highest weight.

 This issue has existed in all previous Xapian releases, as far as we can
 tell.

 There is currently no fix in progress, since it is probably not possible
 to fix without significant loss of efficiency, which would
 adversely affect users who aren't interested in the percentage scores.

 If you really need percentage scores in this situation, one workaround
 would be to first run the search using relevance order, asking for only
 the top document, and to remember the weight and percentage assigned to
 that document. Then, re-run the search in sorted order, and calculate the
 percentages yourself from the weights assigned to the results, using this
 information.
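
 The second step of the workaround above might look roughly like the
 following sketch. The helper name scale_percent is hypothetical and not part
 of the Xapian API, and the linear scaling with round-to-nearest is an
 assumption rather than a copy of what the matcher does internally. The
 reference weight and percentage are assumed to come from the top document of
 the relevance-ordered search (for example via MSetIterator::get_weight() and
 MSetIterator::get_percent()).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of the Xapian API.
//
// top_weight and top_percent are the weight and percentage reported for the
// single top document of the relevance-ordered search.  weight is the weight
// of a document from the second, sorted search.
int scale_percent(double weight, double top_weight, int top_percent) {
    assert(top_percent >= 0 && top_percent <= 100);
    if (top_weight <= 0.0) return 0;  // no usable reference weight

    // Assumption: scale linearly against the reference document and round
    // to the nearest integer.
    long pc = std::lround(weight * top_percent / top_weight);

    // Keep the result inside the valid percentage range.
    return static_cast<int>(std::max(0L, std::min(100L, pc)));
}
```

 Because the reference percentage is already an integer, values computed
 this way may differ slightly from those Xapian would report for a
 relevance-ordered search.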

 A testcase demonstrating this is attached to this ticket.

 The issue is that in multimatch.cc, we calculate "best" by looking for the
 highest weighted document in the candidate mset, but when sorting by
 anything other than relevance, the highest weighted document may have been
 discarded already.

 It is hard to see how to fix this - one obvious approach would be to check
 every candidate document's weight before discarding it during the match
 process, and keep track of the docid of the document with the highest
 weight seen so far.  However, we currently don't calculate the weight for
 all the documents we see (because we first check the document against the
 lowest document in the mset using mcmp), so this would force us to
 calculate weights for documents we wouldn't otherwise need them for.
 Since the percentages aren't necessarily even wanted, this seems a shame.

 Perhaps a reasonable approach would be to add a flag on enquire which
 governed whether percentages were wanted or not; it would then be more
 reasonable to go to extra effort to keep track of the highest weighted
 document if the percentages were actually desired.
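
 The trade-off described in the last three paragraphs can be illustrated
 with a toy model. This is only a sketch: Candidate, MatchResult, toy_match
 and the want_percentages flag are hypothetical names, not code from
 multimatch.cc, and the weight is stored up front with a counter standing in
 for the cost of calculating it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only; none of these names come from multimatch.cc.
struct Candidate {
    unsigned docid;
    int sort_key;    // stands in for the value being sorted on
    double weight;   // stands in for a weight that is costly to calculate
};

struct MatchResult {
    std::vector<Candidate> kept;        // top items by sort_key, descending
    double best_kept_weight = 0.0;      // "best" taken from the kept items only
    double best_seen_weight = 0.0;      // tracked only if want_percentages
    unsigned best_seen_docid = 0;
    std::size_t weights_calculated = 0; // how many weights we had to compute
};

MatchResult toy_match(const std::vector<Candidate>& docs,
                      std::size_t max_items, bool want_percentages) {
    MatchResult r;
    auto by_key = [](const Candidate& a, const Candidate& b) {
        return a.sort_key > b.sort_key;
    };
    for (const Candidate& c : docs) {
        // Cheap test first: can this document beat the lowest kept item?
        bool rejected = r.kept.size() >= max_items &&
                        (r.kept.empty() || !by_key(c, r.kept.back()));
        // Without the flag, a rejected document never has its weight computed.
        if (rejected && !want_percentages) continue;

        ++r.weights_calculated;
        if (want_percentages && c.weight > r.best_seen_weight) {
            r.best_seen_weight = c.weight;
            r.best_seen_docid = c.docid;
        }
        if (rejected) continue;

        // Insert in sorted position and drop the lowest item if over size.
        r.kept.insert(
            std::upper_bound(r.kept.begin(), r.kept.end(), c, by_key), c);
        if (r.kept.size() > max_items) r.kept.pop_back();
    }
    // Deriving "best" from the surviving items only reproduces the problem:
    // the highest weighted document may already have been discarded.
    for (const Candidate& c : r.kept)
        r.best_kept_weight = std::max(r.best_kept_weight, c.weight);
    return r;
}
```

 In this model, turning the flag on means a weight is computed for every
 candidate, which is the extra cost the description is concerned about;
 leaving it off keeps the cheap rejection path but can report a "best"
 weight lower than the true maximum.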

--

Comment:

 Merge the ReleaseNotes entry from 1.0.7 into the description to try to
 keep information about this issue in one place.

-- 
Ticket URL: <http://trac.xapian.org/ticket/216#comment:11>
Xapian <http://xapian.org/>