[Xapian-devel] Problems with /bin/cat and flintlock?

Olly Betts olly at survex.com
Fri Apr 8 03:41:56 BST 2011


On Fri, Apr 08, 2011 at 02:56:22AM +1200, Samuel Williams wrote:
> I've been having intermittent issues with the flintlock code - it
> seems that the function FlintLock::lock is never returning and this is
> locking up the Ruby process.

What OS is this on?  That's likely to be highly relevant.

> At this point, using strace, I found that the application process
> seemed to be stuck on
>     ssize_t n = read(fds[0], &ch, 1);
> 
> Obviously the child process was cat; nothing really interesting about that.

The child process should send a single character before it execs
/bin/cat, which is what the parent is waiting to read there.

If the write() call in the child fails, then the child exits, so
unless the OS fails to transfer the byte across the pipe, I struggle
to see how we can end up in this situation.
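To make the handshake concrete, here is a minimal C sketch of the pattern described above. It is not the FlintLock code: the helper name spawn_and_handshake() and the convention that a status byte of 0 means success are hypothetical, and the step where the real child takes the fcntl() lock is only a comment. The parent blocks in read() until the child either writes its byte or exits, so it can only hang there if the byte never arrives while some process still holds the other end of the socket open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn a child that reports one status byte over a
 * socketpair and then execs /bin/cat, which keeps the socket open until
 * the parent closes its end. Returns the status byte (0 = success in this
 * sketch) or -1 if the child exited without writing (EOF) or read() failed. */
static int spawn_and_handshake(pid_t *child_out, int *fd_out)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    pid_t child = fork();
    if (child == 0) {
        /* Child: talk to the parent through stdin and stdout. */
        close(fds[0]);
        dup2(fds[1], 0);
        dup2(fds[1], 1);
        if (fds[1] > 1)
            close(fds[1]);
        /* (The real code would try to take the fcntl() lock here.) */
        char ch = 0;
        while (write(1, &ch, 1) < 0) {
            if (errno != EINTR)
                _exit(1); /* parent will see EOF and treat it as failure */
        }
        execl("/bin/cat", "/bin/cat", (char *)NULL);
        _exit(1); /* exec failed */
    }

    close(fds[1]);
    if (child < 0) {
        close(fds[0]);
        return -1;
    }

    for (;;) {
        char ch;
        ssize_t n = read(fds[0], &ch, 1); /* where the parent waits */
        if (n == 1) {
            *child_out = child;
            *fd_out = fds[0];
            return (unsigned char)ch;
        }
        if (n == 0 || errno != EINTR)
            break; /* EOF or hard error: no status byte arrived */
    }
    close(fds[0]);
    waitpid(child, NULL, 0);
    return -1;
}
```

Once the parent has the status byte it keeps its end of the socket; closing it later makes cat see EOF on stdin and exit.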

>     // Connect pipe to stdin and stdout.
>     dup2(fds[1], 0);
>     dup2(fds[1], 1);
> 
> Isn't this setting stdin and stdout to the same end of an existing
> pipe? Does this make sense?

It's a bidirectional socket, not a pipe, so using the same end for both
stdin and stdout is fine.
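A quick way to convince yourself of this: each descriptor returned by socketpair() is full-duplex, so data written on either end can be read from the other. This small C check (the function name is made up for illustration) sends one byte each way across the same pair of descriptors, which is what the child relies on when it reads stdin and writes stdout through a single end.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical check: returns 0 if a byte can travel in both directions
 * over one socketpair, -1 otherwise. */
static int socketpair_is_bidirectional(void)
{
    int fds[2];
    char ch;
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    /* Parent -> child direction: write on fds[0], read on fds[1]. */
    if (write(fds[0], "p", 1) != 1 || read(fds[1], &ch, 1) != 1 || ch != 'p')
        goto out;
    /* Child -> parent direction, reusing the same fds[1]. */
    if (write(fds[1], "c", 1) != 1 || read(fds[0], &ch, 1) != 1 || ch != 'c')
        goto out;
    ok = 0;
out:
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```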

> Anyway, I thought I'd mention this because it is a consistent problem.
> If there is anything you think I should do with strace, gdb, etc on
> the processes next time it hangs, let me know.

It would be useful to attach gdb to the parent and child and do a
backtrace in each (bt) to see exactly where we are.

> One option to fix the bug without really understanding the real issue
> would be to use select in the parent thread, rather than read. Then,
> use a timeout of a few seconds so that if the child doesn't acquire
> the lock within x seconds, it is as good as failed.

I'd prefer to understand the issue rather than paper over it.  Locking
is rather a critical operation to get right!

Also, it's rather unclear what a suitable threshold would be: you can use
fcntl locking over NFS if you run the lock daemon, so acquiring a lock could
plausibly take a few seconds with a busy NFS server.
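For reference, the select()-with-timeout workaround proposed above would look roughly like this minimal C sketch; read_byte_with_timeout() and its return codes are hypothetical, not anything in Xapian. It shows the difficulty: a timeout return only says the byte has not arrived yet, which cannot distinguish a slow NFS lock daemon from a genuine hang, and the caller still has to pick timeout_sec arbitrarily.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical sketch: wait at most timeout_sec seconds for one byte.
 * Returns 1 if a byte was read, 0 on EOF, -2 on timeout, -1 on error. */
static int read_byte_with_timeout(int fd, char *out, long timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r < 0) {
            if (errno == EINTR)
                continue; /* note: this restarts the full timeout */
            return -1;
        }
        if (r == 0)
            return -2; /* timed out: lock state is still unknown */

        ssize_t n = read(fd, out, 1);
        if (n == 1)
            return 1;
        if (n == 0)
            return 0;
        if (errno != EINTR)
            return -1;
    }
}
```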

Cheers,
    Olly


