[Xapian-tickets] [Xapian] #804: Improve clustering API

Xapian nobody at xapian.org
Wed May 29 05:16:34 BST 2024


#804: Improve clustering API
--------------------------+-------------------------------
 Reporter:  James Aylett  |             Owner:  Olly Betts
     Type:  enhancement   |            Status:  assigned
 Priority:  highest       |         Milestone:  1.5.0
Component:  Library API   |           Version:  git master
 Severity:  normal        |        Resolution:
 Keywords:                |        Blocked By:
 Blocking:                |  Operating System:  All
--------------------------+-------------------------------
Comment (by Olly Betts):

 It looks to me like diversification has an ordering bug.

 `Diversify::Internal::compute_diff_dmset()` finds the documents which
 weren't promoted by diversification, and these are then added to the
 returned `DocumentSet` after the promoted documents.  However, it finds
 them by iterating over `points`, which is an `unordered_set`, so the
 iteration order is arbitrary, and that iteration order determines the
 order in which these documents appear in the `DocumentSet`.

 The sole testcase we seem to have for diversification only checks that
 the process completes and returns a non-empty `DocumentSet`, so it
 completely misses this problem.
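
 A stronger testcase could check the ordering as well.  The helper below
 is only a sketch of such a check: it works on plain vectors of document
 ids rather than on Xapian's `MSet` and `DocumentSet` classes, and the
 `docid` alias, the function name and the `promoted_count` parameter are
 all hypothetical, not part of the existing test harness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for Xapian::docid so the sketch builds without Xapian headers.
using docid = unsigned;

// Returns true if the entries of `result` from index `promoted_count`
// onwards (the non-promoted documents) appear in the same relative order
// as they do in `original` (the ranking before diversification).
bool tail_keeps_original_order(const std::vector<docid>& original,
                               const std::vector<docid>& result,
                               std::size_t promoted_count)
{
    assert(promoted_count <= result.size());
    std::size_t pos = 0;  // Next position in `original` to search from.
    for (std::size_t i = promoted_count; i < result.size(); ++i) {
        // Only scan forwards: needing to look backwards would mean the
        // tail has been reordered relative to the original ranking.
        while (pos < original.size() && original[pos] != result[i]) ++pos;
        if (pos == original.size()) return false;
        ++pos;
    }
    return true;
}
```

 With an original ranking of 10, 20, 30, 40, 50 and two promoted
 documents, a result of 40, 20, 10, 30, 50 passes this check while
 40, 20, 50, 10, 30 fails it.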

 I haven't checked the paper yet, but if there's no specified reordering of
 non-promoted documents I think we should preserve the relative order from
 the original ranking.
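
 One way to do that, as a minimal sketch: walk the original ranking in
 order and use the hash set only for membership tests, never for
 iteration.  This is not the current `compute_diff_dmset()` code; the
 `docid` alias and the function name are hypothetical, and plain vectors
 stand in for the real `MSet` and `DocumentSet`.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Stand-in for Xapian::docid so the sketch builds without Xapian headers.
using docid = unsigned;

// Returns the promoted documents first (in the order given), followed by
// every other document from `ranking` in its original relative order.
std::vector<docid> promoted_then_rest(const std::vector<docid>& ranking,
                                      const std::vector<docid>& promoted)
{
    // Assumed precondition: promoted documents are drawn from the ranking.
    assert(promoted.size() <= ranking.size());

    // The unordered_set is used for lookups only, so its arbitrary
    // iteration order can't leak into the result.
    std::unordered_set<docid> promoted_set(promoted.begin(), promoted.end());

    std::vector<docid> result(promoted);
    result.reserve(ranking.size());
    for (docid did : ranking) {
        if (promoted_set.count(did) == 0) result.push_back(did);
    }
    return result;
}
```

 For a ranking of 10, 20, 30, 40, 50 with 40 and 20 promoted, this gives
 40, 20, 10, 30, 50 on every run and every platform.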
-- 
Ticket URL: <https://trac.xapian.org/ticket/804#comment:4>
Xapian <https://xapian.org/>