[Xapian-tickets] [Xapian] #385: Expanding docids (etc) beyond 32 bit types

Xapian nobody at xapian.org
Tue Jun 23 23:56:21 BST 2015


#385: Expanding docids (etc) beyond 32 bit types
-------------------------+------------------------------
 Reporter:  james        |             Owner:  olly
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  1.3.x
Component:  Other        |           Version:  SVN trunk
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------

Comment (by dylang):

 Replying to [comment:8 olly]:
 > I think the Java bindings should just continue to use Java `long` for
 docid after this change.  That's a '''signed''' 64 bit type, but the
 changes to `SmokeTest.java` in the patch look like an incompatible change
 which requires people to write rather clumsy and verbose code.
 >
 > That may just be as simple as adding something like this to
 `java/java.i`:
 >
 > {{{
 > %apply long long { Xapian::docid };
 > }}}
 >
 > Or maybe `unsigned` instead of `long long` there (currently we end up
 using Java `long`). And similarly for other types like `Xapian::doccount`.

 Agreed. I'll give that a go and see if I can get the smoke tests to run
 without changes to `SmokeTest.java`.


 > The change to `decode_length()` is OK for a 64-bit platform, but I think
 it'll fail to work where `size_t` is 32-bit.  I guess we either want it to
 return `unsigned long long`, or a separate version of the function for
 cases where the length can legitimately be > 32 bit.  This is actually
 already buggy for replication (#678).

 I'm thinking this is no longer needed because of the other change you
 made?

 > Is `long long` ever > 64 bits currently?  C++11 says it must be at least
 that size, but if there's a platform relevant to us where `long` is 64 bit
 and `long long` is 128, we probably want to use `unsigned long` (though
 using an unnecessarily wide type is probably only a performance issue).

 I'm not sure how to answer this question. Do we have a list of supported
 hardware and operating systems to check against? Also, is there a way to
 detect this at compile time, so we can choose the type based on whether
 `unsigned long` is already 64 bits wide?
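
 On the compile-time part, a minimal C++11 sketch that assumes nothing
 about the Xapian sources: `std::numeric_limits<T>::digits` reports the
 number of value bits, so the choice could be made with
 `std::conditional`. The names `wide_docid` and `long_long_exceeds_64` are
 hypothetical and only for illustration.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical alias (not a Xapian type): use `unsigned long` when it
// already has at least 64 value bits, otherwise `unsigned long long`.
using wide_docid = std::conditional<
    (std::numeric_limits<unsigned long>::digits >= 64),
    unsigned long,
    unsigned long long>::type;

// Fails the build if the chosen type cannot hold a 64-bit docid.
static_assert(std::numeric_limits<wide_docid>::digits >= 64,
              "wide_docid must have at least 64 value bits");

// Compile-time answer to "is `long long` wider than 64 bits here?".
constexpr bool long_long_exceeds_64 =
    (std::numeric_limits<unsigned long long>::digits > 64);
```

 This only answers the question for the compiler actually being used; it
 doesn't tell us which platforms exist where the answer differs.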

 > It would also be good to have a simple test that this actually works -
 e.g. a multi database with subdatabases which each have a document with
 local docid `0xffffffff`, so the docid in the merged database should
 require 64 bits.  It can't run for inmemory (as that reserves
 O(last_docid) space), but for other backends it ought to work.

 Sounds good. I'll have a go at adding such a test.
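
 To check the arithmetic behind that test idea, here is a small
 standalone sketch. It assumes the multi database interleaves docids as
 `(local - 1) * n_shards + shard + 1` (with `shard` counted from 0); that
 formula is my assumption about the mapping, and `merged_docid` and
 `needs_more_than_32_bits` are hypothetical helpers, not Xapian API.

```cpp
#include <cstdint>

// Assumed interleaving scheme: subdatabase `shard` (0-based) out of
// `n_shards` maps its 1-based local docid into the merged docid space.
constexpr std::uint64_t merged_docid(std::uint64_t local,
                                     std::uint64_t n_shards,
                                     std::uint64_t shard) {
    return (local - 1) * n_shards + shard + 1;
}

// True if the docid cannot be represented in an unsigned 32-bit type.
constexpr bool needs_more_than_32_bits(std::uint64_t docid) {
    return docid > 0xffffffffULL;
}
```

 Under that assumption, two subdatabases each holding local docid
 `0xffffffff` give merged docids `0x1fffffffd` and `0x1fffffffe`, both of
 which need more than 32 bits.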

 > We also need to think about how to enable this.  I'm thinking probably a
 configure option, off by default for 1.4.x, unless benchmarking on 32 bit
 platforms shows that this doesn't incur an overhead.  It could perhaps be
 conditionally enabled for 64 bit platforms.

 Would the conditional enabling that seems half done through `#define
 USE_64BIT_DOCID` and `#define USE_64BIT_TERMCOUNT` be suitable? Could we
 have those enabled from `./configure`, perhaps with `./configure
 --with-64-bit-docids --with-64bit-termcount`? Is there an existing
 configure option in xapian that I can look at and copy?
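
 For reference, a rough sketch of how the C++ side might consume such a
 macro once configure defines it (for example via a generated header or a
 `-D` flag; that mechanism is an assumption). `USE_64BIT_DOCID` is the
 macro mentioned above, while `docid_sketch` is a hypothetical stand-in
 for the real typedef.

```cpp
#include <climits>

// USE_64BIT_DOCID is assumed to be defined by the build system when the
// wider type is requested; left undefined, the 32-bit default is kept.
#ifdef USE_64BIT_DOCID
typedef unsigned long long docid_sketch;  // at least 64 bits per C++11
static_assert(sizeof(docid_sketch) * CHAR_BIT >= 64,
              "64-bit docids requested but the type is too narrow");
#else
typedef unsigned docid_sketch;            // current default width
#endif
```

 The same pattern would presumably apply to `USE_64BIT_TERMCOUNT`.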

--
Ticket URL: <http://trac.xapian.org/ticket/385#comment:12>
Xapian <http://xapian.org/>