[Xapian-tickets] [Xapian] #385: Expanding docids (etc) beyond 32 bit types

Xapian nobody at xapian.org
Wed Jun 24 12:29:23 BST 2015


#385: Expanding docids (etc) beyond 32 bit types
-------------------------+------------------------------
 Reporter:  james        |             Owner:  olly
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  1.3.x
Component:  Other        |           Version:  SVN trunk
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------

Comment (by olly):

 [14b7af012cd57bb5e6097584f36d8680ca3c8d7e] just changes the testutils
 `operator<<` to use `Xapian::docid` instead of `unsigned int`, which
 avoids using a template there.  Really the issue is just that the wrong
 type was being used.

 [ decode_length() ]
 > I'm thinking this is no longer needed because of the other change you
 > made?

 Indeed - that should all just work now.

 [ sizeof(long long) ]
 > I'm not really sure how to find out the answer to this question. Do we
 > have a list of devices/hardware/OSs or something to check against?

 It's hard to gather a complete list of platforms Xapian works on as people
 might have successfully built on something and never told us.  We're much
 more likely to hear if they fail, or if a few tweaks are needed.

 But we can look at the platforms which are generally in active use (and in
 this case, anything old isn't going to make `long long` wider than it has
 to, and 64-bit is the minimum requirement).

 The current norm is definitely 64-bit long long - e.g. all the
 architectures Debian supports have `sizeof(long long) == 8`.  The GCC
 manual seems to hint that GCC supports platforms where this isn't true,
 but I don't know an easy way to find out what they are:

 https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html#g_t_005f_005fint128
 says "There is no support in GCC for expressing an integer constant of
 type __int128 for targets with long long integer less than 128 bits wide"
 which suggests that there are targets with long long at least 128 bits
 wide.
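
 For checking a particular platform by hand, a tiny probe is enough.  This
 is only a sketch: `unsigned_long_long_bits()` is a made-up helper name, not
 anything in Xapian, and it reports the storage width (`sizeof` times
 `CHAR_BIT`), which would include padding bits if a platform had any.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper (not part of Xapian): storage width, in bits, of
// `unsigned long long` on the platform this is compiled for.
inline std::size_t unsigned_long_long_bits() {
    return sizeof(unsigned long long) * CHAR_BIT;
}
```

 The language standards require at least 64 here, so the only open question
 on any given target is whether the result is larger than that.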

 C++11 provides `uint64_t` in `<cstdint>` (at least if such a type exists),
 though so far we've tried to avoid introducing C++11 assumptions in the
 API headers (only in the library code) - most compilers currently need
 C++11 support enabling with a command-line option, and it seems unhelpful
 to force all C++ projects using Xapian to update their build systems to
 probe for such an option.

 I think we should probably just use `unsigned long long` - it will always
 work, and while it may be wider than necessary, that seems mostly a
 theoretical worry currently.

 > Would the conditional enabling that seems half done through #define
 > USE_64BIT_DOCID and #define USE_64BIT_TERMCOUNT be suitable?

 We don't want to be defining generically named macros in the API headers
 (we risk colliding with macros the application using Xapian is using) - so
 the macros should start `XAPIAN_`.

 But the basic idea seems OK.
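
 Roughly, the shape would be something like the sketch below.  The macro
 name `XAPIAN_SKETCH_64BIT_DOCID` and the `sketch` namespace are invented
 for illustration (the real names are still to be decided), and the macro is
 defined by hand here only so the snippet stands alone.

```cpp
// Hypothetical macro name, prefixed with XAPIAN_ so it can't collide with
// macros defined by the application.  Defined by hand here; in a real build
// it would come from a generated header instead.
#define XAPIAN_SKETCH_64BIT_DOCID 1

namespace sketch {  // stand-in for the real namespace

#ifdef XAPIAN_SKETCH_64BIT_DOCID
// Always at least 64 bits wide, possibly wider on unusual platforms.
typedef unsigned long long docid;
#else
// Assumed default for this sketch: a plain unsigned (32 bits on current
// mainstream platforms).
typedef unsigned docid;
#endif

}  // namespace sketch
```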

 > Could we have those somehow enabled in ./configure perhaps ./configure
 > --with-64-bit-docids --with-64bit-termcount? Is there an example of a
 > configuration step in xapian already that I can look at and try to copy
 > that?

 This is trickier for things like this which we want to use in the API
 headers as we can't just stick `#include <config.h>` in those.

 I'd look at `--enable-backend-chert` and `XAPIAN_HAS_CHERT_BACKEND` in
 `configure.ac` and `include/xapian/version_h.cc` (which is used to
 generate `include/xapian/version.h`).

 The options should probably be `--enable-X` (`--with-X` is conventionally
 meant to be used when `X` is some other software package and `--enable-X`
 when `X` is a feature of this package - e.g. `--with-java` vs
 `--enable-backend-chert` - the most obvious consequence is the sections
 they are listed under by `configure --help`).

 It seems confusing for `docids` to be plural in the option name when
 `termcount` isn't; similarly be consistent with `64-bit` vs `64bit` there.

 > Also is there a way to detect this at compile time so we can pivot based
 > on whether or not unsigned long is already 64bit?

 I can't think of one unless we force people to select C++11 and just use
 `uint64_t`, but it seems a bit soon for that.  At some point compilers
 will presumably default to C++11 and this won't be a consideration.
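
 For comparison, the C++11-only route would look roughly like this.  Again
 just a sketch: the `sketch_cxx11` namespace is invented for illustration,
 and it assumes the platform provides `std::uint64_t` at all (the type is
 optional in the standard).

```cpp
#include <cstdint>  // std::uint64_t (C++11)
#include <limits>   // std::numeric_limits

namespace sketch_cxx11 {  // stand-in for the real namespace

// Exactly 64 bits wherever the type exists, so there is no "wider than
// necessary" concern, but every user of the header must build as C++11.
typedef std::uint64_t docid;

// Compile-time check that the type has exactly 64 value bits.
static_assert(std::numeric_limits<docid>::digits == 64,
              "docid is expected to be exactly 64 bits wide");

}  // namespace sketch_cxx11
```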

--
Ticket URL: <http://trac.xapian.org/ticket/385#comment:13>
Xapian <http://xapian.org/>