xapian-core and Windows non-ASCII paths
olly at survex.com
Mon Jun 8 23:30:01 BST 2020
On Thu, Jun 04, 2020 at 12:49:58PM +0200, Jean-Francois Dockes wrote:
> I am attaching a patch against the xapian-core 1.4 branch.
Patches need to go to git master first (unless they're only relevant to
1.4.x, which this clearly isn't).
> The idea of the patch is that a conversion to a Windows Unicode wide char
> string is attempted prior to performing a relevant system call. If the
> conversion succeeds, the wide version of the call is used, else, the
> previous narrow call is used. This should ensure that existing
> applications are undisturbed, and provides a way to tunnel a Unicode path
> by using utf-8.
I think this needs input from people with deeper knowledge of this
The approach of patching every affected call site doesn't really seem
workable to me - the maintenance and development overhead just seems too
high. We do need platform-specific code for some things, but no other
platform needs platform-specific code for something as pervasive as
working on a filename. We'll just end up fighting an ongoing battle
against newly introduced places that also need this special handling,
and because things work fine without it for common uses, such issues can
too easily go undetected for a long time (yours is the first report of
this problem, but it's always been there).
If it's really necessary to use these wide-character variants of
everything which takes a filename, I think the only way to sensibly
deal with that is to have a set of wrappers which present them as
the non-wide variants to the rest of the code - that way this at
least only needs addressing once per such function (though even that
is a maintenance pain as a patch making use of a currently-unwrapped C
library function taking a filename would require a new wrapper).
In terms of workarounds, simply changing directory to where the database
lives and then using a relative non-wide path should work.
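
A sketch of that workaround from the application side, assuming C++17 is available: std::filesystem::path stores wide strings natively on Windows, so std::filesystem::current_path() can change to a directory whose name the narrow API cannot express. ScopedChdir and readable_via_relative_path() are hypothetical names used only for illustration, and the std::ifstream stands in for whatever narrow-path call is actually needed.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII guard: change the working directory and restore
// the previous one on scope exit.
class ScopedChdir {
    fs::path saved_;
  public:
    explicit ScopedChdir(const fs::path& dir) : saved_(fs::current_path()) {
        fs::current_path(dir);  // throws fs::filesystem_error on failure
    }
    ~ScopedChdir() {
        std::error_code ec;     // never throw from a destructor
        fs::current_path(saved_, ec);
    }
    ScopedChdir(const ScopedChdir&) = delete;
    ScopedChdir& operator=(const ScopedChdir&) = delete;
};

// Change into `dir` (which may contain non-ASCII characters), then use
// a plain narrow relative name for the actual file access.
bool readable_via_relative_path(const fs::path& dir, const std::string& name) {
    ScopedChdir guard(dir);
    std::ifstream in(name);
    return in.good();
}
```

The working directory is process-wide state, so this is only suitable where no other thread depends on it, and the relative name itself still has to be representable as a narrow string.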