[Xapian-devel] GSoC xapian node binding thoughts

Olly Betts olly at survex.com
Wed May 30 03:24:08 BST 2012


On Tue, May 29, 2012 at 11:45:34AM -0700, Liam wrote:
> On Tue, May 29, 2012 at 9:03 AM, James Aylett <james-xapian at tartarus.org> wrote:
> > I'd say it's better than to refuse to compile, although it's somewhat moot
> > right now. All numbers will overflow eventually, although I assume in Node
> > you'd just get IEEE rounding behaviour? Basically, there's no nice solution.
> 
> It would be a runtime check, not compile-time. We'd compile against a
> suitably configured Xapian :-)

If you change sizeof(Xapian::docid) (and/or the sizes of other types)
then that's an ABI change, so something built against xapian-core built
with one docid size simply won't work with xapian-core built with a
different docid size.

> In what context are int64 doc ids necessary? What % of installations use
> them?

They're obviously necessary if you have more than 4 billion documents.
You can also hit the limit sooner if you search several databases
together and the sizes are uneven (as the docids get interleaved).  They
are also handy if you have an external system with a numeric id which is
wider than 32 bits.
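
To make the interleaving point concrete, here is a minimal sketch.  It
assumes the merged docid for local docid `did` (1-based) in sub-database
`shard` (0-based) out of `n_shards` is `(did - 1) * n_shards + shard + 1`.
The names `interleaved_docid` and `fits_in_32_bits` are hypothetical helpers
for illustration, not part of the Xapian API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not Xapian API.
// Assumed interleaving scheme: local docids from each sub-database are
// spread round-robin across the merged docid space.
constexpr std::uint64_t interleaved_docid(std::uint64_t did,
                                          std::uint64_t shard,
                                          std::uint64_t n_shards) {
    return (did - 1) * n_shards + shard + 1;
}

// True if the id can be stored in a 32-bit unsigned docid.
constexpr bool fits_in_32_bits(std::uint64_t id) {
    return id <= std::numeric_limits<std::uint32_t>::max();
}

// Uneven example: one sub-database with 3 billion documents searched
// together with one holding only 10.  The total document count is below
// 2^32, but the highest merged docid is not.
constexpr std::uint64_t kBigShardDocs = 3000000000ULL;
constexpr std::uint64_t kSmallShardDocs = 10ULL;
constexpr std::uint64_t kHighestMerged =
    interleaved_docid(kBigShardDocs, 0, 2);
```

Under that assumption the highest merged docid is roughly twice the size of
the larger sub-database, so the 32-bit limit is reached well before the
combined document count gets to 4 billion.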

I doubt many people use them currently, and quite possibly nobody does.  But
that's likely to change in the foreseeable future.  We're probably near
the point where you could conceivably build an index with this many
documents on commodity hardware.

I was really just trying to check that the issue had been considered, as
unnecessarily hard-wiring in an assumption that these quantities are
32-bit would be short-sighted.
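
As a rough sketch of what avoiding that assumption could look like in
binding code: the helpers below are templated on the docid type rather than
fixed at 32 bits, and check whether a value survives conversion to a
JavaScript number (an IEEE 754 double).  Everything here is hypothetical
illustration; `docid_type_always_safe` and `docid_is_safe` are made-up names,
and the `DocId` parameter stands in for whatever width xapian-core was built
with.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Largest integer that JavaScript treats as "safe": 2^53 - 1.  Above this,
// distinct integers can round to the same double.
constexpr std::uint64_t kMaxSafeInteger = (1ULL << 53) - 1;

// Compile-time question: can every value of this docid type be held
// exactly in a double?  True for 32-bit ids, false for 64-bit ids.
template <typename DocId>
constexpr bool docid_type_always_safe() {
    static_assert(std::is_unsigned<DocId>::value,
                  "sketch assumes an unsigned docid type");
    return std::numeric_limits<DocId>::digits <=
           std::numeric_limits<double>::digits;
}

// Runtime question: is this particular id safe to hand to JavaScript
// as a number?  Only needs checking when the type-level answer is false.
template <typename DocId>
constexpr bool docid_is_safe(DocId id) {
    return static_cast<std::uint64_t>(id) <= kMaxSafeInteger;
}
```

With a 32-bit docid the runtime check can never fail, so it costs nothing
in practice; with a 64-bit docid the binding would have to decide what to do
(throw, or return a string, say) for the rare id that fails it.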

> Seriously, lazy-loading is oversold from what I've seen. If you have data
> from real-world Xapian sites that shows a material advantage for it, I'd
> love to read...

Any site searching a large Xapian database is relying heavily on lazy
loading.

Cheers,
    Olly


