Database left unlocked by Tcl bindings
Olly Betts
olly at survex.com
Wed Mar 2 03:55:23 GMT 2016
On Tue, Mar 01, 2016 at 07:02:03PM +0100, Eric J wrote:
> After some more experiments, and some help from the Tcl side, I can now
> say that database locks from the Tcl bindings will not function
> correctly in the following Tcl versions:
>
> 8.5.18 built with threads (not the default)
> 8.6.[1-4] built with threads (default)
>
> but will function correctly in the following Tcl versions:
>
> 8.5.18 built without threads (default)
> 8.5.19 built with or without threads
> 8.6.[1-4] built without threads (not the default)
> 8.6.5 built with or without threads
>
> Earlier 8.5.x are presumably the same as 8.5.18.
>
> This all seems (just my own theory, not proven) to be a collision of
> corner cases:
>
> * fork + exec being expected to need to preserve a file lock.
> * early creation of a notifier thread expected to be without undesirable
> side-effects
Looking at the code in Tcl 8.5, I notice the notifier thread calls
pipe() and then later sets FD_CLOEXEC on the two fds (where supported,
pipe2(fds, O_CLOEXEC) would achieve that atomically):
http://sources.debian.net/src/tcl8.5/8.5.18-3/unix/tclUnixNotfy.c/#L1090
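To make the race concrete, here is a minimal sketch contrasting the two approaches. It is not Tcl's code, and the function names are hypothetical. With the two-step version there is a window between pipe() returning and the fcntl() calls in which another thread can fork() and the child inherits both fds without FD_CLOEXEC. pipe2() closes that window but isn't available everywhere, hence "where supported".

```c
#define _GNU_SOURCE /* needed for pipe2() declaration on glibc */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: pipe() then FD_CLOEXEC, as described above.
 * A fork() in another thread between these calls sees fds without
 * FD_CLOEXEC. */
static int make_pipe_two_step(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;
    for (int i = 0; i < 2; ++i) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags < 0 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* Hypothetical helper: the fds are created with FD_CLOEXEC already set,
 * so there is no window for a concurrent fork() to observe. */
static int make_pipe_atomic(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Both end up in the same state in a single-threaded program; the difference only shows when another thread forks at the wrong moment.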
I'm not seeing exactly how, but I wonder if this interacts badly with
Xapian closing all unwanted fds in the child process, resulting in Tcl's
thread ending up setting FD_CLOEXEC on the lock file fd.
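One way such a mix-up could happen (a hypothesis, not something shown in the Tcl or Xapian sources) is fd number reuse. POSIX requires open() to return the lowest-numbered unused descriptor, so if code that remembers a pipe fd number has that fd closed underneath it, the lock file can be opened on the same number, and a later F_SETFD on the stale number marks the lock file fd close-on-exec. Since closing any fd for a file releases all of the process's fcntl() locks on it, exec() would then drop the lock. The single-threaded sketch below (function name hypothetical) reproduces just the fd-reuse step.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo of a stale fd number hitting a newly opened file.
 * Returns the lock file fd (which ends up with FD_CLOEXEC), or -1. */
static int stale_cloexec_demo(const char *lock_path)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    /* The number some "library" code remembers as its own. */
    int stale = p[0] < p[1] ? p[0] : p[1];

    /* "Application" code closes every fd it doesn't want. */
    close(p[0]);
    close(p[1]);

    /* Lowest free number is now `stale`, so the lock file gets it. */
    int lockfd = open(lock_path, O_RDWR | O_CREAT, 0666);
    if (lockfd < 0)
        return -1;

    /* The library, unaware, marks "its" fd close-on-exec. */
    fcntl(stale, F_SETFD, FD_CLOEXEC);
    return lockfd;
}
```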
There seem to have been a number of fixes, and fixes to those fixes, to
this file in the last year, which makes it hard to see quickly what
changed and why. So I'm not sure why 8.6.5 works better, or whether the
problem just doesn't manifest as reliably there:
http://core.tcl.tk/tcl/finfo?name=unix/tclUnixNotfy.c
> Anyway, the answer is to use Tcl versions as above, or to use
> Xapian/kernel combinations where OFD locks are available.
OFD locks are a good answer for Linux, but sadly POSIX doesn't seem to be
steaming ahead with standardising them, and I don't know of any other
platforms offering them as an extension.
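For comparison, a minimal sketch of taking a Linux OFD lock (function name hypothetical; F_OFD_SETLK needs Linux 3.15+ and glibc 2.20+ headers). Unlike a classic fcntl() lock, the lock belongs to the open file description rather than the process, so two separate open() calls in the same process do conflict, and closing some other fd for the same file doesn't release it.

```c
#define _GNU_SOURCE /* exposes F_OFD_SETLK on glibc */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to take an exclusive whole-file OFD lock on fd.
 * Returns 0 on success, -1 with errno set on failure (EAGAIN or EACCES
 * on conflict, EINVAL if the kernel lacks OFD lock support). */
static int ofd_lock(int fd)
{
#ifdef F_OFD_SETLK
    struct flock fl;
    memset(&fl, 0, sizeof fl); /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "to end of file" */
    return fcntl(fd, F_OFD_SETLK, &fl);
#else
    (void)fd;
    errno = ENOSYS;            /* headers predate OFD locks */
    return -1;
#endif
}
```

With classic F_SETLK the second lock below would succeed, since locks held by the same process never conflict; with OFD locks it fails.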
We can't fix existing releases of Xapian (or Tcl) but a way to stop
this happening going forwards would be good.
For Tcl the simple fix would be just to document "if Tcl is built with
threads, you need to use Tcl >= 8.6.5" (assuming that 8.6.5 actually
fixes this).
But being robust to arbitrary pthread_atfork() handlers doing unhelpful
stuff would be good too.
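The difficulty is that pthread_atfork() lets any library in the process register a handler that runs in the child as part of fork(), before fork() returns to the caller, so the caller can't opt out. The sketch below uses a hypothetical library that marks an fd it believes it owns as close-on-exec from its child handler. The change is visible only in the child, since each process has its own descriptor table.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state: an fd number it believes it owns. */
static int library_fd = -1;

/* Hypothetical child handler: runs in the child before fork() returns. */
static void library_child_handler(void)
{
    if (library_fd >= 0)
        fcntl(library_fd, F_SETFD, FD_CLOEXEC);
}

/* Forks; the child reports via its exit status whether fd has
 * FD_CLOEXEC set by the time the caller's child code runs.
 * Returns 1 or 0, or -1 on error. */
static int child_sees_cloexec(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int flags = fcntl(fd, F_GETFD);
        _exit((flags >= 0 && (flags & FD_CLOEXEC)) ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```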
I don't see a way to kill off any other threads in the child process.
Linux has pthread_kill_other_threads_np(), but it only does anything
for LinuxThreads and is a no-op for NPTL. And it's mostly other
platforms we're concerned about anyway.
All I can really think of is replacing /bin/cat with a custom helper
so that the lock is taken after exec(). That adds extra overhead to failed
locking attempts, but that's not such a big deal, especially if OFD
locks get standardised, since then it only affects older platforms.
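A rough sketch of what the core of such a helper might look like, assuming it is exec()ed with a pipe or socket to the parent on infd/outfd (the function name and the one-byte 'y'/'n' protocol are hypothetical, not Xapian's). Since it runs after exec(), no atfork handler from the parent's libraries can touch its fds. It opens and locks the file, reports the result, then holds the lock until the parent closes its end or exits, much as /bin/cat waits for EOF.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical lock helper core. Writes 'y' (locked) or 'n' to outfd.
 * If locked, blocks reading infd until EOF, then releases the lock.
 * Returns 0 if the lock was taken and held until EOF, 1 otherwise. */
static int run_lock_helper(const char *path, int infd, int outfd)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET; /* l_start = l_len = 0: whole file */

    int ok = fd >= 0 && fcntl(fd, F_SETLK, &fl) == 0;
    char status = ok ? 'y' : 'n';
    if (write(outfd, &status, 1) != 1)
        ok = 0;

    if (ok) {
        char buf[256];
        /* Hold the lock until EOF or error; the data is ignored. */
        while (read(infd, buf, sizeof buf) > 0)
            ;
    }
    if (fd >= 0)
        close(fd); /* closing the fd releases the lock */
    return ok ? 0 : 1;
}
```

A real helper's main() would just call run_lock_helper(argv[1], 0, 1). In this design the child only discovers that the lock is unavailable after exec(), so even a failed attempt pays for a full exec().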
Anyone got any better ideas?
Cheers,
Olly