[Xapian-tickets] [Xapian] #724: Handle overflow of collection frequency
Xapian
nobody at xapian.org
Fri Mar 24 03:21:53 GMT 2023
#724: Handle overflow of collection frequency
---------------------------+-------------------------------
Reporter: Olly Betts | Owner: Olly Betts
Type: defect | Status: new
Priority: normal | Milestone: 2.0.0
Component: Backend-Glass | Version: git master
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
---------------------------+-------------------------------
Changes (by Olly Betts):
* milestone: 1.4.x => 2.0.0
Comment:
This seems tricky to fix properly.
There are two problems here really:
One is that cf is the sum of wdf over all the documents, but it has the
same type as wdf, so it can simply overflow. That probably needs to be
reported to the application code as an exception, though there's not a lot
it can really do about it. The other option is to use a wider type for the
collection frequency (>= width(doccount)+width(termcount) would work). We
already have `Xapian::totallength`, which is always 64-bit and would be
suitable unless `--enable-64bit-docid` and/or `--enable-64bit-termcount`
is used.
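A minimal C++ sketch of the two options for cf, assuming Xapian's default
32-bit `termcount` and `doccount` (no `--enable-64bit-*` options). The
aliases `doccount_t`, `termcount_t` and `totallength_t` and both functions
are hypothetical stand-ins, not Xapian API. The width rule holds because at
most 2^32 - 1 documents each contribute a wdf of at most 2^32 - 1, so
cf <= (2^32 - 1)^2 < 2^64.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for Xapian's default type widths (assumption:
// 32-bit doccount/termcount, 64-bit totallength).
using doccount_t = std::uint32_t;
using termcount_t = std::uint32_t;
using totallength_t = std::uint64_t;

// width(cf type) >= width(doccount) + width(termcount) guarantees that
// summing one wdf per document can never overflow the cf type.
static_assert(std::numeric_limits<totallength_t>::digits >=
                  std::numeric_limits<doccount_t>::digits +
                      std::numeric_limits<termcount_t>::digits,
              "cf type must be at least width(doccount) + width(termcount)");

// Option 1: keep cf the same type as wdf and report overflow as an
// exception to the caller.
termcount_t collection_freq_narrow(const std::vector<termcount_t>& wdfs) {
    termcount_t cf = 0;
    for (termcount_t wdf : wdfs) {
        // Check before adding, since unsigned addition silently wraps.
        if (wdf > std::numeric_limits<termcount_t>::max() - cf)
            throw std::overflow_error("collection frequency overflowed");
        cf += wdf;
    }
    return cf;
}

// Option 2: accumulate cf in the wider type, which cannot overflow as
// long as there is at most one wdf per document.
totallength_t collection_freq_wide(const std::vector<termcount_t>& wdfs) {
    assert(wdfs.size() <= std::numeric_limits<doccount_t>::max());
    totallength_t cf = 0;
    for (termcount_t wdf : wdfs) cf += wdf;
    return cf;
}
```

For example, with two documents each having a wdf of 4000000000, the
narrow version throws `std::overflow_error` while the wide version returns
8000000000.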
The other is that the deltas for pending changes are the signed versions
of these types, so we can overflow them more easily (and we can
potentially overflow `tf_delta`, though we'd need to batch up >= 2^31
document changes, which isn't really feasible in practice). We can
probably use overflow-checking addition and subtraction and flush the
pending changes if we would overflow, though this is fiddly to get right,
and handling a wdf >= 2^31 is fiddlier still since we need to apply it
directly rather than via the inverter. We could use a wider type here, or
track the signs separately from the magnitudes, but both increase the size
of the data stored, which is a concern.
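A rough sketch of the flush-before-overflow idea, again assuming 32-bit
termcount so that the pending delta is an `int32_t`. `PendingCf` and its
members are hypothetical and are not Xapian's inverter; the overflow check
here widens to 64 bits rather than using any particular overflow-checking
helper, and a change that cannot fit in the delta at all (a wdf >= 2^31)
is applied directly after a flush.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of one term's batched cf change. The delta uses the
// signed counterpart of a 32-bit termcount, so it overflows at 2^31.
struct PendingCf {
    std::uint64_t committed_cf = 0;  // value already written (hypothetical)
    std::int32_t cf_delta = 0;       // batched change, not yet applied
    unsigned flushes = 0;            // number of flushes performed

    // Apply the batched delta to the committed value and reset it.
    void flush() {
        if (cf_delta >= 0)
            committed_cf += static_cast<std::uint64_t>(cf_delta);
        else
            committed_cf -= static_cast<std::uint64_t>(
                -static_cast<std::int64_t>(cf_delta));
        cf_delta = 0;
        ++flushes;
    }

    // Record a wdf change: positive when added, negative when removed.
    void add(std::int64_t change) {
        // Assumption: |change| fits in a 32-bit unsigned wdf.
        assert(change > -(std::int64_t{1} << 32) &&
               change < (std::int64_t{1} << 32));
        constexpr std::int64_t lo = std::numeric_limits<std::int32_t>::min();
        constexpr std::int64_t hi = std::numeric_limits<std::int32_t>::max();
        if (change < lo || change > hi) {
            // Too big for the signed delta type (e.g. wdf >= 2^31):
            // flush what is pending, then apply this change directly.
            flush();
            if (change >= 0)
                committed_cf += static_cast<std::uint64_t>(change);
            else
                committed_cf -= static_cast<std::uint64_t>(-change);
            return;
        }
        std::int64_t sum = static_cast<std::int64_t>(cf_delta) + change;
        if (sum < lo || sum > hi) {
            // Adding would overflow the batched delta: flush first.
            flush();
            sum = change;
        }
        cf_delta = static_cast<std::int32_t>(sum);
    }
};
```

Flushing early costs an extra write but keeps the stored delta at its
current width, which sidesteps the size concern at the price of the
fiddliness described above.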
--
Ticket URL: <https://trac.xapian.org/ticket/724#comment:2>
Xapian <https://xapian.org/>