[Xapian-tickets] [Xapian] #724: Handle overflow of collection frequency

Xapian nobody at xapian.org
Fri Mar 24 03:21:53 GMT 2023

#724: Handle overflow of collection frequency
 Reporter:  Olly Betts     |             Owner:  Olly Betts
     Type:  defect         |            Status:  new
 Priority:  normal         |         Milestone:  2.0.0
Component:  Backend-Glass  |           Version:  git master
 Severity:  normal         |        Resolution:
 Keywords:                 |        Blocked By:
 Blocking:                 |  Operating System:  All
Changes (by Olly Betts):

 * milestone:  1.4.x => 2.0.0


 This seems tricky to fix properly.

 There are two problems here really:

 One is that cf is the sum of wdf over all the documents, but has the same
 type as wdf, so it can simply overflow.  That probably needs to be thrown
 as an exception to the application code, though there's not a lot it can
 really do about it.  The other option is to use a wider type for
 collection frequency (>= width(doccount)+width(termcount) would work).
 We already have `Xapian::totallength`, which is always 64-bit and would
 be suitable unless `--enable-64bit-docid` and/or
 `--enable-64bit-termcount` is used.
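
 A minimal sketch of the two options for cf, assuming the default 32-bit
 counts.  The names `termcount`, `wide_cf`, `add_wdf_checked` and
 `add_wdf_wide` are illustrative stand-ins rather than Xapian's actual
 code, and `std::overflow_error` is only a placeholder for whatever
 exception the library would really throw:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Assumption: stand-ins for a 32-bit Xapian::termcount and for a wider
// collection frequency type.  These aliases are hypothetical.
using termcount = std::uint32_t;
using wide_cf = std::uint64_t;

// Option 1: keep cf the same width as wdf and report overflow to the caller.
inline termcount add_wdf_checked(termcount cf, termcount wdf) {
    // cf + wdf would exceed the maximum exactly when wdf > max - cf.
    if (wdf > std::numeric_limits<termcount>::max() - cf)
        throw std::overflow_error("collection frequency overflow");
    return cf + wdf;
}

// Option 2: accumulate into a wider type.  With fewer than 2^32 documents,
// each contributing a wdf below 2^32, the sum stays below 2^64.
inline wide_cf add_wdf_wide(wide_cf cf, termcount wdf) {
    return cf + wdf;
}
```

 The second function is why a 64-bit type is enough only while both
 doccount and termcount are 32-bit; with either widened to 64 bits the
 same argument would need more than 64 bits.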

 The other is that these deltas are the signed versions of the types, so we
 can overflow them more easily (and we can potentially overflow `tf_delta`,
 though we'd need to batch up >= 2^31^ document changes which isn't really
 feasible in practice).  We can probably use the overflow checking addition
 and subtraction and flush the pending changes if we would overflow, though
 this is fiddly to get right, and handling a wdf >= 2^31^ is fiddlier still
 since we need to apply it directly rather than via the inverter.  We could
 use a wider type here, or track the signs separately to the magnitudes,
 but both increase the size of the data stored, which is a concern.
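
 The "check, and flush if we would overflow" idea could look roughly like
 the sketch below.  `PendingCf`, `checked_add`, `stored` and `flushes` are
 hypothetical names made up for illustration, `stored` merely stands in
 for the value held by the backend, and the sketch does not cover a single
 wdf >= 2^31^, which cannot be represented as a signed delta at all:

```cpp
#include <cstdint>
#include <limits>

// Assumption: 32-bit counts, with the signed counterpart used for deltas.
using termcount = std::uint32_t;
using termcount_diff = std::int32_t;

// Hypothetical accumulator for one term's pending cf change.
struct PendingCf {
    termcount_diff delta = 0;  // batched change not yet applied
    termcount stored = 0;      // stand-in for the value kept by the backend
    unsigned flushes = 0;      // counts flushes, for illustration only

    // Apply the pending delta.  Unsigned arithmetic wraps modulo 2^32, so
    // adding a converted negative delta subtracts its magnitude.  Overflow
    // or underflow of the stored value itself is not checked here.
    void flush() {
        stored += static_cast<termcount>(delta);
        delta = 0;
        ++flushes;
    }

    // Portable overflow-checked signed addition: returns false and leaves
    // `out` untouched if a + b is not representable.
    static bool checked_add(termcount_diff a, termcount_diff b,
                            termcount_diff& out) {
        using lim = std::numeric_limits<termcount_diff>;
        if ((b > 0 && a > lim::max() - b) || (b < 0 && a < lim::min() - b))
            return false;
        out = a + b;
        return true;
    }

    // Batch a change, flushing first if the running delta would overflow.
    void add(termcount_diff change) {
        if (!checked_add(delta, change, delta)) {
            flush();
            delta = change;
        }
    }
};
```

 This keeps the stored delta at its current width, at the cost of an
 extra flush in the rare case where the running total would not fit.
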
Ticket URL: <https://trac.xapian.org/ticket/724#comment:2>
Xapian <https://xapian.org/>