[Xapian-devel] Problems with /bin/cat and flintlock?
space.ship.traveller at gmail.com
Thu Apr 7 15:56:22 BST 2011
I'm working on an integration project with Ruby, Rack, Apache, Phusion Passenger and Xapian.
I've been having intermittent issues with the flintlock code - it seems that the function FlintLock::lock is never returning and this is locking up the Ruby process.
My guess is that Xapian is locking up in a system call and Ruby can't schedule its green threads.
I've done some basic debugging with strace and noticed the following:
29944 30022 29942 29939 ? -1 Sl 33 0:09 | | \_ Passenger ApplicationSpawner: /srv/www/www.oriontransfer.co.nz
30022 30041 29942 29939 ? -1 S 33 0:00 | | | \_ /bin/cat
[Using the following source code as a reference http://xapian.org/docs/sourcedoc/html/flint__lock_8cc_source.html]
At this point, using strace, I found that the application process seemed to be stuck on:
00219 ssize_t n = read(fds, &ch, 1);
Obviously the child process was cat, so there was nothing really interesting about that.
After I killed cat, then the process was freed up and the web application started responding again.
I don't know why this is unreliable. I've briefly looked at the code and noticed a few things:
00172 // Connect pipe to stdin and stdout.
00173 dup2(fds, 0);
00174 dup2(fds, 1);
Isn't this setting stdin and stdout to the same end of an existing pipe? Does this make sense?
Anyway, I thought I'd mention this because it is a consistent problem. If there is anything you think I should do with strace, gdb, etc on the processes next time it hangs, let me know.
One option to fix the bug without really understanding the real issue would be to use select in the parent process, rather than a blocking read. Then, use a timeout of a few seconds so that if the child doesn't acquire the lock within x seconds, the attempt is treated as failed.
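A rough sketch of the select idea, not tested against Xapian itself. The helper name read_byte_with_timeout and the -2 return code for a timeout are made up for this example and are not part of the Xapian source; fd stands for whichever descriptor the parent currently reads from.

```cpp
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper (not from flint_lock.cc): wait up to timeout_secs for
// one byte to arrive on fd instead of blocking in read() forever.
// Returns 1 if a byte was read, 0 on EOF, -1 on error, -2 on timeout.
static int read_byte_with_timeout(int fd, char *ch, int timeout_secs)
{
    for (;;) {
        // Reset the timeout on every pass: select() may modify it, so an
        // interrupted wait restarts with the full timeout.
        struct timeval tv;
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        int r = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (r == 0)
            return -2;              // nothing arrived in time
        if (r < 0) {
            if (errno == EINTR)
                continue;           // interrupted by a signal, try again
            return -1;
        }

        // select() reported fd as readable, so this read should not block.
        ssize_t n = read(fd, ch, 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        return n == 0 ? 0 : 1;
    }
}
```

On a timeout the caller would presumably also need to kill and reap the child before reporting that the lock could not be obtained, otherwise the stuck /bin/cat would be left behind.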