[Xapian-devel] ICU

Olly Betts olly at survex.com
Tue Apr 11 02:55:08 BST 2006


On Mon, Apr 10, 2006 at 04:44:21AM +0100, Olly Betts wrote:
> By eschewing the standard idiom for wrapping multiline macro calls,
> they're forcing the risk of silent miscompilation on their users.

I've been thinking about this, and I think there's actually no silent
miscompilation risk here (I was thinking of the "dangling else" issue
but this is a rather different situation).

I think the extra semicolon can only stop code which looks like it
should work from compiling (and, on the flip side, let code which looks
like it shouldn't compile get through, e.g. if you accidentally omit the
semicolon).

But I still think it's a bloody-minded attitude.  A macro which looks
like a function should work as one.
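
To make the distinction concrete, here is a minimal sketch in C.  The
macro and function names are hypothetical and are not taken from ICU;
the only assumption is a multi-statement macro wrapped in bare braces
rather than in the usual do { ... } while (0) idiom.

```c
/* Hypothetical macros for illustration only; neither is taken from ICU. */

/* Brace-only wrapper: the caller's own ';' becomes an extra empty statement. */
#define SWAP_BRACES(a, b) { int swap_tmp = (a); (a) = (b); (b) = swap_tmp; }

/* Standard idiom: the expansion needs the caller's ';' to be complete. */
#define SWAP_DO_WHILE(a, b) \
    do { int swap_tmp = (a); (a) = (b); (b) = swap_tmp; } while (0)

/* Put *lo and *hi in ascending order; returns 1 if they were swapped. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_DO_WHILE(*lo, *hi);  /* SWAP_BRACES(...); here would not compile:
                                     the stray ';' ends the if before the else */
    else
        return 0;
    return 1;
}

/* The flip side: the brace-only form compiles with the ';' omitted. */
int order_pair_braces(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_BRACES(*lo, *hi)
    else
        return 0;
    return 1;
}
```

In both directions the compiler either rejects the code or does what
the source appears to say, which is why this is an annoyance rather
than a silent miscompilation.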

Michael Schlenker pointed me at the Unicode handling code in Tcl.  It
looks very well done - the source file which provides UTF-8 handling and
Unicode codepoint identification for the BMP (i.e. codepoints below
0x10000) compiles to a 28K object file (x86-64), which I think is all
the Unicode support which the QueryParser class needs.  I think here
the evils of cut-and-paste code reuse are less than the annoyance of
adding a large library dependency to the core library.
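
As a rough illustration of why BMP-only UTF-8 handling can be so
compact, here is a minimal sketch of a decoder in C.  It is not Tcl's
code; the function name utf8_decode_bmp and its strictness choices
(rejecting overlong forms and surrogates) are assumptions made for the
example.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence encoding a codepoint below 0x10000.
 * Stores the codepoint in *cp and returns the number of bytes consumed,
 * or returns 0 (leaving *cp untouched) for malformed, truncated or
 * non-BMP input.  Hypothetical helper, not part of Tcl or Xapian. */
size_t utf8_decode_bmp(const unsigned char *s, size_t len, unsigned *cp)
{
    unsigned c;

    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                      /* 1 byte: U+0000..U+007F */
        *cp = s[0];
        return 1;
    }

    if ((s[0] & 0xE0) == 0xC0) {            /* 2 bytes: U+0080..U+07FF */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        c = ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
        if (c < 0x80)                       /* overlong encoding */
            return 0;
        *cp = c;
        return 2;
    }

    if ((s[0] & 0xF0) == 0xE0) {            /* 3 bytes: U+0800..U+FFFF */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        c = ((unsigned)(s[0] & 0x0F) << 12)
          | ((unsigned)(s[1] & 0x3F) << 6)
          | (unsigned)(s[2] & 0x3F);
        if (c < 0x800)                      /* overlong encoding */
            return 0;
        if (c >= 0xD800 && c <= 0xDFFF)     /* surrogate range */
            return 0;
        *cp = c;
        return 3;
    }

    /* Stray continuation byte, or a lead byte for a sequence of four
     * or more bytes, which would encode a codepoint outside the BMP. */
    return 0;
}
```

Codepoint identification (is this a letter, a digit, whitespace?) is
the part which needs lookup tables, so it would account for most of
the size of a real implementation.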

For Omega we also need encoding conversion, which I think inevitably
needs a large amount of code or data.  Tcl's code for this is compact,
but has 1.3MB of data files.  I don't see as much of an issue with adding
a large library dependency to Omega, be it ICU, glib, using Tcl's code,
or using an installed version of Tcl.  Or something else.
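
For contrast, the one conversion which needs no data at all is
ISO-8859-1 to UTF-8, because each Latin-1 byte value is also its
Unicode codepoint.  The sketch below is in C and the function name
latin1_to_utf8 is hypothetical.  Most other legacy encodings need a
mapping table per charset, which is where the bulk of the data comes
from.

```c
#include <stddef.h>

/* Convert len bytes of ISO-8859-1 to UTF-8.
 * 'out' must have room for 2 * len bytes.  Returns the bytes written.
 * Hypothetical helper for illustration; not from Tcl, ICU or glib. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII is encoded as itself. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```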

What's a good option partly comes down to "what are people likely to
have installed anyway".  Looking at the Debian "popcon" results, the
answer seems to be glib, then Tcl, then ICU.  But the gap between them
isn't large and ICU is pretty common (OpenOffice uses it, I believe).
I'm not sure how representative the numbers are though, and they may be
rather different for non-Linux platforms.

Cheers,
    Olly


