[Xapian-discuss] minor problem (file locking)
Deron Meranda
deron.meranda at gmail.com
Thu Jan 10 16:24:48 GMT 2008
Continuing a slightly off-topic discussion of Xapian's locking mechanism...
On Jan 10, 2008 9:47 AM, Olly Betts <olly at survex.com> wrote:
> On Thu, Jan 10, 2008 at 01:09:14AM -0500, Deron Meranda wrote:
> > On Dec 23, 2007 10:58 PM, Olly Betts <olly at survex.com> wrote:
> > > The semantics of fcntl() locking within a process are rather unhelpful,
> > > so we fork a child process to take and hold the lock for us. To
> > > minimise VM use, we just exec /bin/cat once the lock is obtained.
> >
> > Olly, I've been curious about this; what kind of troublesome fcntl
> > semantics were you running into that necessitated this child
> > lock-holding process? This locking style is rather unusual to me.
>
> There are two problems. Quoting from the Linux man page:
[fcntl(2) F_SETLK ...]
> Attempting to lock the same file again from the process which already
> holds the lock will succeed, [...]
Oh, I hadn't considered that you'd be opening multiple file
descriptors for the same file; I had assumed they were
being reused by an intermediate "pooling" layer.
Yes, reopening the same file can cause you serious grief with fcntl.
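A minimal C sketch of the first problem, assuming Linux/POSIX `fcntl()` semantics: the same process opens the file twice, and the second lock request succeeds instead of reporting a conflict. The helper name `try_write_lock()` is hypothetical and not part of Xapian.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to take a non-blocking exclusive fcntl() lock
 * on the whole file behind fd. Returns 0 on success, -1 on failure. */
int try_write_lock(int fd) {
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file": the whole file */
    return fcntl(fd, F_SETLK, &fl);
}
```

Because both calls succeed, one thread cannot use `fcntl()` to discover that another thread in the same process, using its own descriptor, already holds the lock.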
> ... which is very unhelpful if threads are
> involved. This could potentially be solved using a process-global
> map to track locks, but then you need a mutex to protect this, and
> it introduces an O(n.log(n)) behaviour in the number of
> WritableDatabases open, which seems less than ideal.
This should easily be sub-linear. But I suspect it doesn't really
matter much, given how rarely the lookup happens and how
small 'n' typically is.
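A minimal sketch of the process-global lock table discussed here, assuming POSIX `tsearch()` (a balanced tree in glibc, so each lookup is O(log n)) guarded by a single pthread mutex. Keys are (device, inode) pairs so that two descriptors for the same file match. `lock_registry_add()` and `lock_registry_remove()` are hypothetical names; this is only the bookkeeping, and it does nothing about a stray `close()` elsewhere in the process releasing the real lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <search.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A locked file is identified by device and inode, not by path or fd. */
struct lock_key { dev_t dev; ino_t ino; };

static void *lock_root = NULL;
static pthread_mutex_t lock_mutex = PTHREAD_MUTEX_INITIALIZER;

static int key_cmp(const void *a, const void *b) {
    const struct lock_key *x = a, *y = b;
    if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
    return 0;
}

/* Returns 1 if the file behind fd was not yet registered (now it is),
 * 0 if this process already registered it, -1 on error. */
int lock_registry_add(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    struct lock_key *k = malloc(sizeof *k);
    if (k == NULL) return -1;
    k->dev = st.st_dev;
    k->ino = st.st_ino;
    pthread_mutex_lock(&lock_mutex);
    void *node = tsearch(k, &lock_root, key_cmp);
    int result;
    if (node == NULL) result = -1;                          /* out of memory */
    else if (*(struct lock_key **)node != k) result = 0;    /* already present */
    else result = 1;                                        /* inserted k */
    pthread_mutex_unlock(&lock_mutex);
    if (result != 1) free(k);                               /* k not stored */
    return result;
}

/* Removes the entry for the file behind fd. Returns 0, or -1 if absent. */
int lock_registry_remove(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    struct lock_key probe = { st.st_dev, st.st_ino };
    pthread_mutex_lock(&lock_mutex);
    void *node = tfind(&probe, &lock_root, key_cmp);
    struct lock_key *stored = node ? *(struct lock_key **)node : NULL;
    if (stored != NULL) tdelete(&probe, &lock_root, key_cmp);
    pthread_mutex_unlock(&lock_mutex);
    free(stored);
    return stored != NULL ? 0 : -1;
}
```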
> It also means
> we need to add thread-specific code to the library which caused a lot
> of pain in the early days. Perhaps pthreads was just too immature then
> though.
I'm not sure of the timeline of Xapian's history, but yes, at one
point POSIX compliance in pthreads implementations was really hit or miss.
Today, though, most Unixes and Linux systems seem to have
pretty solid pthreads (my biggest annoyance is the inconsistent support
for pselect()). However, I don't do any development on Macs or Windows,
so I don't know how well they behave.
> I previously found a post on lkml (which I can't seem to relocate now)
> where someone queried this behaviour and was told it was as specified by
> POSIX, so it seems fcntl() is just broken by design (or more kindly, it
> was probably designed before threads were an issue).
Yes, POSIX defines this stuff. It's not so much that it's broken;
you just have to understand the level at which it works, which is
the process, not the thread or the descriptor. And yes, that
can be very surprising, and arguably useless in some cases.
It's not the only thing that could be called broken by design
in POSIX, but then I remember the pre-POSIX days so I'm
not going to complain too much. Most things in POSIX are good.
> As well as being removed by an explicit F_UNLCK, record locks are auto-
> matically released when the process terminates or if it closes any file
> descriptor referring to a file on which locks are held.
Yes, that's the other, more challenging problem that fcntl can cause
when you reopen the same file in the same process. The first close
always wins.
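The close() rule can be demonstrated the same way. This is a sketch under the same Linux/POSIX assumptions; `lock_whole_file()` and `locked_by_another_process()` are hypothetical helpers. Since `F_GETLK` never reports the caller's own locks, the check runs in a forked child.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take a non-blocking exclusive fcntl() lock on the whole file behind fd. */
int lock_whole_file(int fd) {
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: the whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Ask from a separate (forked) process whether path is write-locked.
 * Returns 1 if locked by another process, 0 if not, -1 on error. */
int locked_by_another_process(const char *path) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) _exit(2);
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;      /* "could I write-lock the whole file?" */
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) != 0) _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

An indexer that merely opens and closes the lock file while walking a directory tree is enough to trigger this: lock via one descriptor, open and close a second descriptor, and the lock is gone even though the first descriptor is still open.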
> Argghhh! It also means
> that user code can smash locks just by opening and closing the lock file -
> that may seem something that would never happen, but consider an indexer
> which traverses the filesystem (like omindex) which was accidentally set
> to index a tree including its own database directory...
If somebody did that, it would be a big problem! As a library,
there's only so much you can do to protect against stupid
things the library user does. But I can see where doing that
in an indexing application could be a pretty easy-to-make
mistake.
> I don't know why these problems don't seem to be more widely known.
> It's less of an issue in an application than a library, since you have
> more control over the process. Other than that, all I can assume is
> that people either haven't noticed the problem and have flawed locking,
> or that they don't use fcntl() locking.
Today, fcntl locking is generally pretty foolproof and usually the best
method of locking available, as long as your process doesn't reopen
the same file multiple times. I suspect most applications never do
that, and so never have to deal with all of its semantics. But I can see
why Xapian is different now.
> Suggestions for a better locking approach are certainly welcome.
If something inspirational comes to me I'll share. But now that
you've explained Xapian, the child process locker seems to be
a relatively simple solution. The exec of /bin/cat was surprising
though, but I can see your reasoning for doing that too.
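Roughly, a child-process locker of this kind can be sketched in C as follows. It illustrates the approach described earlier in the thread (fork, take the lock in the child, exec /bin/cat to hold it) and is not Xapian's actual code: the `socketpair()` channel, the one-byte status reply, and the names `held_lock`, `take_lock_via_child()` and `release_lock_via_child()` are all assumptions. Because the lock belongs to the helper process, opening and closing the lock file in the main process, from any thread, cannot drop it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle: the helper holding the lock, and our socket end. */
struct held_lock { pid_t child; int sock; };

/* Fork a helper that takes an fcntl() lock on path and keeps it until we
 * close our end of the socket. Returns 0 if locked, -1 otherwise. */
int take_lock_via_child(const char *path, struct held_lock *out) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    /* Keep our end out of later helpers once they exec /bin/cat. */
    fcntl(sv[0], F_SETFD, FD_CLOEXEC);
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {
        close(sv[0]);
        char result = 'F';
        int fd = open(path, O_RDWR | O_CREAT, 0666);
        if (fd > 1) {  /* must not collide with the stdin/stdout slots below */
            struct flock fl;
            memset(&fl, 0, sizeof fl);
            fl.l_type = F_WRLCK;
            fl.l_whence = SEEK_SET;  /* l_start = l_len = 0: whole file */
            if (fcntl(fd, F_SETLK, &fl) == 0) result = 'S';
        }
        if (write(sv[1], &result, 1) != 1 || result != 'S') _exit(1);
        /* Socket becomes stdin and stdout; cat blocks reading it, keeping
         * this process (and so its lock) alive until the parent closes. */
        dup2(sv[1], 0);
        dup2(sv[1], 1);
        if (sv[1] > 1) close(sv[1]);
        execl("/bin/cat", "cat", (char *)NULL);
        /* exec failed (e.g. denied by SELinux): drop inherited descriptors
         * other than stdio and the lock file, then wait for EOF ourselves. */
        long maxfd = sysconf(_SC_OPEN_MAX);
        for (long i = 3; i < (maxfd > 0 ? maxfd : 1024); i++)
            if (i != fd) close((int)i);
        char buf[64];
        while (read(0, buf, sizeof buf) > 0) { }
        _exit(0);
    }
    close(sv[1]);
    char result = 'F';
    if (read(sv[0], &result, 1) != 1 || result != 'S') {
        close(sv[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    out->child = pid;
    out->sock = sv[0];
    return 0;
}

/* Closing our end gives cat EOF; when it exits, the kernel drops its lock. */
void release_lock_via_child(struct held_lock *lk) {
    close(lk->sock);
    waitpid(lk->child, NULL, 0);
}
```

The exec only swaps the forked copy of the (possibly large) parent image for a tiny one; if the exec is refused, the fallback loop holds the lock just the same.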
> I've already considered the obvious ones: lockf() isn't a solution as on
> Linux it's just a wrapper for fcntl(), and flock() locks between
> processes, according to the Linux man page.
lockf is sometimes just a user-space wrapper around fcntl, but
its semantics aren't nearly as well defined. And when it isn't a
wrapper, it's often worse. fcntl is superior, and lockf should
be considered deprecated.
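On Linux with glibc, the point that `lockf()` is just an `fcntl()` wrapper is easy to check: a lock taken with `fcntl()` is visible to `lockf(F_TEST)` in another process. A minimal sketch; `lockf_test_from_child()` is a hypothetical helper, and on systems where lockf is implemented separately the result may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if lockf(F_TEST) in a separate process reports the file as
 * locked, 0 if it reports it free, -1 on error. */
int lockf_test_from_child(const char *path) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) _exit(2);
        /* Offset 0, len 0: test from the start of the file to its end. */
        if (lockf(fd, F_TEST, 0) == 0) _exit(0);
        _exit(errno == EACCES || errno == EAGAIN ? 1 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```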
> Using the existence of a lockfile (created in an NFS-safe way as
> described in the O_EXCL section of Linux's "man 2 open") is what we did
> for quartz but leaves stale locks behind if the process is killed (you
> can store host and pid in the lock file but it's hard to recover without
> avoiding obscure race conditions, and that doesn't help if the database
> might be on NFS or similar as you can't tell if a process is still
> active on another host).
This type of locking is possible to do correctly, but I can tell
you it is deceptively hard, and I've rarely seen it done right.
So it is sufficient for things like boot scripts and such, but usually
not for real concurrency control.
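For comparison, here is a sketch of the lockfile scheme in C, following the `link(2)` technique described in the `O_EXCL` notes of Linux's `open(2)` man page. The names `acquire_lockfile()` and `release_lockfile()` and the `lockfile.host.pid` naming of the temporary file are assumptions. Note what it does not do: if the holder is killed, the lockfile stays behind, and the host and pid written into it are only a hint, not something that can safely drive automatic recovery.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if we now own lockfile, -1 otherwise (held, or an error).
 * The lock is nothing more than the existence of lockfile. */
int acquire_lockfile(const char *lockfile) {
    char host[256];
    if (gethostname(host, sizeof host) != 0) return -1;
    host[sizeof host - 1] = '\0';
    /* Unique name on the same filesystem: lockfile.host.pid */
    char unique[4096];
    int n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                     lockfile, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique) return -1;
    int fd = open(unique, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    /* Record the holder; useful to humans, hard to use safely for recovery. */
    dprintf(fd, "%s %ld\n", host, (long)getpid());
    close(fd);
    int ok = link(unique, lockfile) == 0;
    if (!ok) {
        /* Over NFS link() can report failure despite succeeding, so check
         * whether the unique file's link count went up to 2. */
        struct stat st;
        ok = stat(unique, &st) == 0 && st.st_nlink == 2;
    }
    unlink(unique);
    return ok ? 0 : -1;
}

void release_lockfile(const char *lockfile) { unlink(lockfile); }
```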
> > Also, I first stumbled into this accidentally when trying to run
> > under Linux with some rather tight SELinux security policies
> > in place...the exec of /bin/cat was failing because of denied
> > permissions that I had no idea that the library required.
> > I assume most people won't try using it in such an environment though.
>
> Would this have been less of an issue if we had our own helper binary in
> place of /bin/cat? Or would exec() of anything have been denied?
It's hard to say; it all depends on the SELinux rules one has. And
if I remember correctly it still worked, since the Xapian code anticipates
a failure of the exec call and doesn't just abort. But seeing a lot
of denied /bin/cat execs in your security logs (or unexplained cat
processes in ps listings) is quite suspicious until you've figured out why.
> Can you write a paragraph describing what's needed that can go in a
> suitable place in the documentation?
I'll help where I can, but I want to be confident first so I
don't spread misinformation.
--
Deron Meranda