[Xapian-tickets] [Xapian] #385: Expanding docids (etc) beyond 32 bit types
Xapian
nobody at xapian.org
Tue Jun 23 23:56:21 BST 2015
#385: Expanding docids (etc) beyond 32 bit types
-------------------------+------------------------------
Reporter: james | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.3.x
Component: Other | Version: SVN trunk
Severity: minor | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+------------------------------
Comment (by dylang):
Replying to [comment:8 olly]:
> I think the Java bindings should just continue to use Java `long` for
docid after this change. That's a '''signed''' 64 bit type, but the
changes to `SmokeTest.java` in the patch look like an incompatible change
which requires people to write rather clumsy and verbose code.
>
> That may just be as simple as adding something like this to
`java/java.i`:
>
> {{{
> %apply long long { Xapian::docid };
> }}}
>
> Or maybe `unsigned` instead of `long long` there (currently we end up
using Java `long`). And similarly for other types like `Xapian::doccount`.
Agreed. I'll give that a go and see if I can get the smoke tests to run
without changes to `SmokeTest.java`.
> The change to `decode_length()` is OK for a 64-bit platform, but I think
it'll fail to work where `size_t` is 32-bit. I guess we either want it to
return `unsigned long long`, or a separate version of the function for
cases where the length can legitimately be > 32 bit. This is actually
already buggy for replication (#678).
I'm thinking this is no longer needed because of the other change you
made?
> Is `long long` ever > 64 bits currently? C++11 says it must be at least
that size, but if there's a platform relevant to us where `long` is 64 bit
and `long long` is 128, we probably want to use `unsigned long` (though
using an unnecessarily wide type is probably only a performance issue).
I'm not really sure how to find out the answer to this question. Do we
have a list of devices/hardware/OSs or something to check against? Also,
is there a way to detect this at compile time so we can pivot based on
whether or not `unsigned long` is already 64-bit?
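
One possible way to do the compile-time detection, as a rough sketch only: the names `wide_docid`, `long_long_bits` and `ULONG_IS_64BIT` are made up for illustration and are not in the patch or in Xapian. It uses `std::numeric_limits` to pick `unsigned long` when it already has 64 value bits, and shows a preprocessor check for code that needs to branch with `#if`.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// Hypothetical alias (not from the patch): use `unsigned long` when it
// already has at least 64 value bits, otherwise `unsigned long long`.
using wide_docid = std::conditional<
    std::numeric_limits<unsigned long>::digits >= 64,
    unsigned long,
    unsigned long long>::type;

// Number of value bits in `unsigned long long` on the current platform,
// which answers "is `long long` wider than 64 bits here?" at compile time.
constexpr int long_long_bits =
    std::numeric_limits<unsigned long long>::digits;

// Fails the build if the chosen type cannot hold a 64-bit docid.
static_assert(std::numeric_limits<wide_docid>::digits >= 64,
              "wide_docid must have at least 64 value bits");

// Preprocessor variant, for places where a type alias is not enough.
// ULONG_IS_64BIT is a hypothetical macro name.
#if ULONG_MAX >= 0xffffffffffffffffULL
# define ULONG_IS_64BIT 1
#else
# define ULONG_IS_64BIT 0
#endif
```

This only tells us about the platform being compiled for, so it doesn't replace a list of supported platforms, but it would let the choice of type be made automatically.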
> It would also be good to have a simple test that this actually works -
e.g. a multi database with subdatabases which each have a document with
local docid `0xffffffff`, so the docid in the merged database should
require 64 bits. It can't run for inmemory (as that reserves
O(last_docid) space), but for other backends it ought to work.
Sounds good. I'll have a go at adding such a test.
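
To check the arithmetic behind that test, here is a small standalone sketch. It assumes the multi database interleaves docids as `(local - 1) * n_shards + shard + 1`, with `shard` counted from 0; that formula is my understanding of the scheme rather than something stated in this ticket, and `merged_docid` and `needs_64_bits` are hypothetical helper names, not Xapian functions.

```cpp
#include <cstdint>

// Assumed interleaving of docids in a multi database: the document with
// 1-based local docid `local` in 0-based subdatabase `shard` (out of
// `n_shards`) gets merged docid (local - 1) * n_shards + shard + 1.
constexpr std::uint64_t merged_docid(std::uint64_t local,
                                     std::uint64_t shard,
                                     std::uint64_t n_shards) {
    return (local - 1) * n_shards + shard + 1;
}

// True if the docid cannot be represented in an unsigned 32-bit type.
constexpr bool needs_64_bits(std::uint64_t docid) {
    return docid > 0xffffffffu;
}

// Two subdatabases, each holding a document with local docid 0xffffffff.
static_assert(merged_docid(0xffffffffu, 0, 2) == 0x1fffffffdULL,
              "first subdatabase");
static_assert(merged_docid(0xffffffffu, 1, 2) == 0x1fffffffeULL,
              "second subdatabase");
static_assert(needs_64_bits(merged_docid(0xffffffffu, 0, 2)),
              "merged docid exceeds 32 bits");
```

Under that assumption both merged docids are above `0xffffffff`, so two subdatabases would be enough for the test.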
> We also need to think about how to enable this. I'm thinking probably a
configure option, off by default for 1.4.x, unless benchmarking on 32 bit
platforms shows that this doesn't incur an overhead. It could perhaps be
conditionally enabled for 64 bit platforms.
Would the conditional enabling that seems half done through `#define
USE_64BIT_DOCID` and `#define USE_64BIT_TERMCOUNT` be suitable? Could
those be enabled from `./configure`, perhaps with something like
`./configure --with-64-bit-docids --with-64bit-termcount`? Is there an
existing configure option in Xapian that I can look at and copy?
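
For reference, this is roughly how I picture the two macros selecting the type widths once configure defines them. It is a sketch only: the `sketch` namespace and the exact typedefs are illustrative and not copied from the patch.

```cpp
#include <climits>

// Illustrative namespace, so this doesn't clash with the real Xapian types.
namespace sketch {

// USE_64BIT_DOCID would be defined (or not) by the configure step.
#ifdef USE_64BIT_DOCID
typedef unsigned long long docid;
#else
typedef unsigned docid;
#endif

// Likewise for USE_64BIT_TERMCOUNT.
#ifdef USE_64BIT_TERMCOUNT
typedef unsigned long long termcount;
#else
typedef unsigned termcount;
#endif

}  // namespace sketch

// Sanity checks that hold for either setting on the platforms we care about.
static_assert(sizeof(sketch::docid) * CHAR_BIT >= 32,
              "docid must be at least 32 bits");
static_assert(sizeof(sketch::termcount) * CHAR_BIT >= 32,
              "termcount must be at least 32 bits");
```

With both macros left undefined this keeps the current 32-bit types, which matches the "off by default" suggestion.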
--
Ticket URL: <http://trac.xapian.org/ticket/385#comment:12>
Xapian <http://xapian.org/>
Xapian