[Xapian-discuss] Re: Xapian and research in IR: a few suggestions from experience

Deron Meranda deron.meranda at gmail.com
Wed Sep 12 19:57:13 BST 2007


On 9/12/07, Olly Betts <olly at survex.com> wrote:
> On Wed, Sep 05, 2007 at 06:45:01PM +0200, Emmanuel Eckard wrote:
> > All these models would call for doubles, or vectors of doubles, to be
> > associated with Documents, TermIterators and Databases.
>
> I wonder how best to store such doubles.  One option is the format we
> use for the remote protocol (serialise_double() in
> common/serialise-double.cc) which can require up to 11 bytes, but
> often needs less than 8.  It's possible that the encoding could be
> made more compact - an extra byte or two isn't a big concern for where
> it is currently used.
>
> Another approach would be to use the IEEE format, and carefully convert
> to/from that on platforms where it isn't the native format.
>
> If you have a sample of the doubles you'd want to store, it would be
> interesting to see how large it is after running it through
> serialise_double().

If the doubles only ever store probabilities, so that every
value is in the range 0.0 <= x <= 1.0, then you may be able to
store them as integers and interpret those as fixed-point
rather than floating-point values.

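Even a 32-bit integer gets you a resolution of about 2.3e-10
(1/2^32), i.e. somewhere around 9+ decimal digits, as long as
you stay in the 0 to 1 range.  Note that this is absolute
precision: very small probabilities keep fewer significant
digits than a double would.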
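
To make the idea concrete, here is a minimal sketch of such an
encoding.  The names prob_to_fixed(), fixed_to_prob() and kScale
are made up for illustration and are not part of Xapian; the
scale of 2^32 - 1 is just one possible choice, picked so that
1.0 itself is representable.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; not Xapian API.
// Scale factor 2^32 - 1 maps 0.0 -> 0 and 1.0 -> UINT32_MAX.
constexpr double kScale = 4294967295.0;

// Encode a probability as an unsigned 32-bit fixed-point value.
// Inputs outside [0, 1] (and NaN) are clamped rather than rejected.
inline std::uint32_t prob_to_fixed(double p) {
    if (!(p > 0.0)) return 0;          // also catches NaN
    if (p >= 1.0) return UINT32_MAX;
    return static_cast<std::uint32_t>(std::llround(p * kScale));
}

// Decode back to a double; the round trip is accurate to
// within half a step, i.e. roughly 1.2e-10.
inline double fixed_to_prob(std::uint32_t v) {
    return static_cast<double>(v) / kScale;
}
```

The value always occupies 4 bytes, which would have to be
compared against what serialise_double() produces on real data.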

That may be a space/time tradeoff though, since you pay for a
conversion on every read and write, unless you could implement
all the algorithms to use integer math rather than floating
point.
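
As a rough illustration of what "integer math" could look like,
here is a sketch of multiplying two probabilities that are
stored as 32-bit fixed-point values scaled by 2^32 - 1.  The
function name fixed_mul() and the scale are assumptions made for
this example, not anything Xapian provides.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not Xapian API.
// Both inputs and the result represent value / (2^32 - 1).
inline std::uint32_t fixed_mul(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t scale = 0xFFFFFFFFull;  // 2^32 - 1
    // The 64-bit product cannot overflow: (2^32 - 1)^2 < 2^64,
    // and adding scale / 2 for rounding still fits.
    const std::uint64_t wide = static_cast<std::uint64_t>(a) * b;
    return static_cast<std::uint32_t>((wide + scale / 2) / scale);
}
```

Each multiplication rounds to the nearest representable step,
so long chains of products accumulate error and small results
eventually collapse to 0.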
-- 
Deron Meranda


