[Xapian-tickets] [Xapian] #804: Improve clustering API
Xapian
nobody at xapian.org
Wed May 29 05:16:34 BST 2024
#804: Improve clustering API
--------------------------+-------------------------------
Reporter: James Aylett | Owner: Olly Betts
Type: enhancement | Status: assigned
Priority: highest | Milestone: 1.5.0
Component: Library API | Version: git master
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
--------------------------+-------------------------------
Comment (by Olly Betts):
It looks to me like diversification has an ordering bug.
`Diversify::Internal::compute_diff_dmset()` finds the documents which
weren't promoted by diversification and returns them, and they're added to
the returned `DocumentSet` after the promoted documents. However, it does
this by iterating over `points`, which is an `unordered_set`, so the
iteration order is unspecified. That iteration order determines the order
in which these documents are added to, and thus appear in, the
`DocumentSet`.
The sole testcase we seem to have for diversification only checks that the
process completes and returns a non-empty `DocumentSet`, so it completely
misses this problem.
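A stronger test could check that the non-promoted tail of the result keeps
the relative order it had in the original ranking. The sketch below shows
one way to do that check. It is a hypothetical helper, not part of Xapian.
It uses plain `std::vector`s of docids in place of Xapian's `DocumentSet`,
and a local `docid` alias in place of `Xapian::docid`.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for Xapian::docid so the sketch compiles without Xapian headers.
using docid = unsigned;

// Hypothetical helper: returns true if every docid in `subset` appears in
// `original` and in the same relative order. A diversification test could
// apply this to the non-promoted tail of the returned DocumentSet.
bool preserves_relative_order(const std::vector<docid>& original,
                              const std::vector<docid>& subset) {
    // Map each docid to its position in the original ranking.
    std::unordered_map<docid, std::size_t> rank;
    for (std::size_t i = 0; i < original.size(); ++i) rank[original[i]] = i;

    bool first = true;
    std::size_t prev = 0;
    for (docid d : subset) {
        auto it = rank.find(d);
        if (it == rank.end()) return false;               // not in the original ranking
        if (!first && it->second <= prev) return false;   // out of order
        prev = it->second;
        first = false;
    }
    return true;
}
```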
I haven't checked the paper yet, but if there's no specified reordering of
non-promoted documents I think we should preserve the relative order from
the original ranking.
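Here is a minimal sketch of that fix. It assumes the original ranking is
available as an ordered list of docids, which is a simplification of the
real code. The function names and the `docid` alias are hypothetical. The
idea is to iterate over the ordered ranking and skip promoted documents,
instead of iterating over an `unordered_set`. An `unordered_set` can still
be used for the membership test, because only its iteration order is
unspecified.

```cpp
#include <unordered_set>
#include <vector>

// Stand-in for Xapian::docid so the sketch compiles without Xapian headers.
using docid = unsigned;

// Hypothetical: collect the documents diversification did not promote,
// in their original ranking order (deterministic, unlike unordered_set
// iteration).
std::vector<docid> unpromoted_in_rank_order(
    const std::vector<docid>& original_ranking,
    const std::unordered_set<docid>& promoted) {
    std::vector<docid> rest;
    for (docid d : original_ranking) {
        if (promoted.count(d) == 0) rest.push_back(d);
    }
    return rest;
}

// Hypothetical: promoted documents first (in diversified order), followed
// by the remaining documents in their original relative order.
std::vector<docid> final_order(const std::vector<docid>& original_ranking,
                               const std::vector<docid>& promoted_in_order) {
    std::unordered_set<docid> promoted(promoted_in_order.begin(),
                                       promoted_in_order.end());
    std::vector<docid> out = promoted_in_order;
    for (docid d : unpromoted_in_rank_order(original_ranking, promoted)) {
        out.push_back(d);
    }
    return out;
}
```

For an original ranking of 5, 3, 9, 1 where 9 and 5 are promoted, this
gives 9, 5, 3, 1. Documents 3 and 1 keep their original relative order.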
--
Ticket URL: <https://trac.xapian.org/ticket/804#comment:4>
Xapian <https://xapian.org/>