[Xapian-discuss] minor problem (file locking)
Olly Betts
olly at survex.com
Fri Jan 11 00:26:06 GMT 2008
On Thu, Jan 10, 2008 at 11:24:48AM -0500, Deron Meranda wrote:
> Continuing a slightly off-topic discussion of Xapian's locking
> mechanism...
We could move it to xapian-devel.
> Oh, I hadn't considered that you'd be opening multiple file
> descriptors for the same file, but instead assumed they were
> being reused by an intermediate "pooling" layer.
No, we don't have such a layer. If we had a "server daemon" design,
such a layer would make a lot of sense. I'm not sure it really does
when each process is working by itself - most people won't want to open
a database more than once within a process, so it's generally not going
to save much on open file handles. And if pread() isn't available, we'd
have to put thread locks around every call to lseek()+read() if the
filehandles are shared within a process (or else there's a race
condition).
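To illustrate the race, here's a minimal sketch (not Xapian code - the
HAVE_PREAD macro is invented for the example):

    #include <pthread.h>
    #include <unistd.h>

    // With a shared fd, lseek()+read() is two steps, so another thread
    // can move the shared file offset in between and we end up reading
    // from the wrong position.  pread() takes the offset explicitly
    // and never touches the shared offset, so it needs no lock.
    ssize_t read_at(int fd, void *buf, size_t n, off_t off) {
    #ifdef HAVE_PREAD
        return pread(fd, buf, n, off);
    #else
        static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
        pthread_mutex_lock(&mtx);
        ssize_t r = -1;
        if (lseek(fd, off, SEEK_SET) == off) r = read(fd, buf, n);
        pthread_mutex_unlock(&mtx);
        return r;
    #endif
    }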
> > ... which is very unhelpful if threads are
> > involved. This could potentially be solved using a process-global
> > map to track locks, but then you need a mutex to protect this, and
> > it introduces an O(n.log(n)) behaviour in the number of
> > WritableDatabases open, which seems less than ideal.
>
> This should be easily sub-linear.
I guess you must mean sub-linear per open, since opening n databases
in less than O(n) would be a clever trick. I'm talking about the work
to open n databases.
The per-database work should be sub-linear, but that's still worse than
the O(1) we have at the moment (assuming that the libc and OS don't
introduce overheads dependent on the number of handles open). Using
hash_map (non-standard, but provided by many STL implementations, and
probably standardised in the next C++ standard revision) is O(1) on
average, but the worst case is O(n).
There's another issue here too - we can't just build a mapping based
on pathnames, because symlinks (and other things like Linux's "mount
--bind") mean that multiple paths can map to the same database, so
perhaps you need to use something like the device number and inode.
Except that some Unix-like systems don't have inodes, and I suspect some
Unix filesystems for foreign formats don't either, so even this is more complex than
it seems.
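To make that concrete, the registry would have to look something like
this sketch (names invented for illustration; not proposed code):

    #include <sys/stat.h>
    #include <pthread.h>
    #include <map>
    #include <utility>

    // Hypothetical process-global registry of held locks, keyed by
    // (st_dev, st_ino) so that two paths to the same file collide.
    static std::map<std::pair<dev_t, ino_t>, int> lock_registry;
    static pthread_mutex_t registry_mutex = PTHREAD_MUTEX_INITIALIZER;

    // Returns false if this process already holds a lock on the file.
    bool register_lock(int fd) {
        struct stat sb;
        if (fstat(fd, &sb) < 0) return false;
        std::pair<dev_t, ino_t> key(sb.st_dev, sb.st_ino);
        pthread_mutex_lock(&registry_mutex);
        // std::map insertion is O(log n) in the number of open
        // WritableDatabases - hence the O(n.log(n)) total cost.
        bool fresh = lock_registry.insert(std::make_pair(key, fd)).second;
        pthread_mutex_unlock(&registry_mutex);
        return fresh;
    }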
> But I suspect it doesn't really matter that much because of the
> frequency and the typical small sizes of 'n'.
Yes, though that's less true if we use locking to allow read-only
databases to mark the revision they're using as "to be kept" (which is
the current plan for eliminating the annoying DatabaseModifiedError).
Having to fork a locking process isn't ideal there either though.
> > It also means
> > we need to add thread-specific code to the library which caused a lot
> > of pain in the early days. Perhaps pthreads was just too immature then
> > though.
>
> I'm not sure of the timeline of the Xapian history. But yes at one
> point POSIX compliance with pthreads used to be a real hit or miss.
> Today though it seems that most Unixes and Linuxes actually have
> pretty solid pthreads (my biggest annoyance is the inconsistent support
> of pselect()). However I don't do any development on Macs or Windows,
> so I don't know how well they behave.
I assume Windows must have good thread support, since the overhead of
starting a process there is so high that you're forced to use threads
in many cases, and in fact we already use threads on Windows for
xapian-tcpsrv. It's actually irrelevant for flint locking though, since
Windows has a thread-friendly API for file locking, so we just use that.
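(For reference, a sketch of the Windows approach - a lock taken with
LockFileEx() belongs to the HANDLE rather than to some per-process
structure, so threads and multiple opens don't trample each other.
This is illustrative, not the actual flint code:)

    #include <windows.h>

    // Try an exclusive, non-blocking lock on the first byte of the
    // lock file.  The lock is owned by this HANDLE, so other opens of
    // the same file (even in the same process) are unaffected.
    bool lock_file(HANDLE h) {
        OVERLAPPED ov;
        ZeroMemory(&ov, sizeof(ov));  // lock region starts at offset 0
        return LockFileEx(h,
                          LOCKFILE_EXCLUSIVE_LOCK |
                          LOCKFILE_FAIL_IMMEDIATELY,
                          0 /* reserved */,
                          1, 0 /* lock one byte */,
                          &ov) != 0;
    }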
> > I previously found a post on lkml (which I can't seem to relocate now)
> > where someone queried this behaviour and was told it was as specified by
> > POSIX, so it seems fcntl() is just broken by design (or more kindly, it
> > was probably designed before threads were an issue).
>
> Yes, POSIX defines this stuff. It's not so much that it's broken,
> you just have to understand at what level it works; and that's
> not at the thread level nor at the descriptor level. And yes, that
> can be very surprising, and arguably useless for some cases.
I understand that the lock is (at least conceptually) held on some
structure shared between file descriptors on the same file within
the process. But I think that's simply the wrong level to define
it at for a general-purpose locking API.
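For anyone who hasn't hit this before, here's a self-contained
demonstration of the POSIX-mandated surprise (assuming some file
"db.lock" exists; error handling omitted):

    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        int fd1 = open("db.lock", O_RDWR);
        int fd2 = open("db.lock", O_RDWR);  // second fd, same file

        struct flock fl;
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 1;
        fcntl(fd1, F_SETLK, &fl);  // lock taken via fd1

        close(fd2);  // POSIX: closing *any* fd on this file in this
                     // process silently drops the lock held via fd1!
        return 0;
    }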
However, I realise that this is unlikely to get fixed any time soon
(and it will take a while for any fix to be available widely too).
> It's not the only thing that could be called broken by design
> in POSIX, but then I remember the pre-POSIX days so I'm
> not going to complain too much. Most things in POSIX are good.
I'd agree with that summary. I'm not meaning to run down POSIX. I just
wish they'd defined a locking API which worked for our "use case".
[SELinux]
> > Would this have been less of an issue if we had our own helper binary in
> > place of /bin/cat? Or would exec() of anything have been denied?
>
> It's hard to say, it all depends on the SELinux rules one has. And
> if I remember correctly it still worked, since the Xapian code anticipates
> a failure of the exec call and doesn't just abort.
Ah yes - if we can't exec /bin/cat, we just emulate it. That works, but
it means the child keeps carrying the VM overhead of copy-on-write pages
inherited from the fork() which the parent process has since modified.
Avoiding that overhead is the only reason we exec /bin/cat.
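Roughly, the scheme looks like this (a simplified sketch - the real
code also reports back to the parent whether the lock was obtained,
handles errors properly, and so on):

    #include <fcntl.h>
    #include <unistd.h>

    // Child takes the fcntl lock, wires a pipe to stdin, and execs
    // /bin/cat.  fcntl locks survive exec (the fd stays open), and
    // exec replaces the forked address space, shedding the
    // copy-on-write pages.  When the parent closes its end of the
    // pipe, cat exits, its fds close, and the lock is released.
    int spawn_lock_holder(int lock_fd) {  // lock_fd open for writing
        int pipefd[2];
        if (pipe(pipefd) < 0) return -1;
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            struct flock fl;
            fl.l_type = F_WRLCK;
            fl.l_whence = SEEK_SET;
            fl.l_start = 0;
            fl.l_len = 1;
            if (fcntl(lock_fd, F_SETLK, &fl) < 0) _exit(1);
            dup2(pipefd[0], 0);
            close(pipefd[0]);
            close(pipefd[1]);
            execl("/bin/cat", "cat", (char *)0);
            // exec failed (e.g. denied by SELinux): emulate cat.
            char c;
            while (read(0, &c, 1) > 0) { }
            _exit(0);
        }
        close(pipefd[0]);
        return pipefd[1];  // parent closes this to release the lock
    }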
> But seeing a lot of prevented /bin/cat execs in your security logs or
> ps listings is quite suspicious until you've figured out why.
And seeing exec failures for something like this would presumably be a
lot less suspicious (and more self-explanatory):
/usr/lib/xapian/bin/flint-lock-holder
> > Can you write a paragraph describing what's needed that can go in a
> > suitable place in the documentation?
>
> I'll help where I can, but I want to be confident first so I
> don't spread misinformation.
Just something which explained that the exec of /bin/cat is OK and how
to tell SELinux that would be great. I've never looked at SELinux so
I only have a rough idea what it does.
Cheers,
Olly