Xapian InMemoryDatabase Concurrent Control Problem
olly at survex.com
Mon May 21 04:28:03 BST 2018
On Mon, May 21, 2018 at 02:41:31AM +0000, Miao LIU wrote:
> Sorry for troubling you this time. I am currently facing a challenge
> that I can not search and update Xapian "InMemoryDatabase"
> concurrently via 2 different threads although I have added the
> critical area mutex which allows only reading/writing at one single
> time. More specifically, along with the core dump, the error message
> was "double free or corruption (!prev): 0x000000000121f3b0" indicated
> a memory free problem was detected.
> Out of our expectations, the core dump happened when searching thread
> and writing thread accessed different docs respectively.
Touching different documents doesn't help - you are accessing the same underlying
database object, and you need to lock around any accesses to that
object, be they explicit or implicit:
| Be aware that some Xapian objects will keep internal references to
| others - for example, if you call xapian.Database.get_document(), the
| resulting xapian.Document object will keep a reference to the
| xapian.Database object, and so you can’t safely use the xapian.Database
| object in one thread at the same time as using the xapian.Document
| object in another.
By extrapolation, that's also true for two different Document objects
obtained from the same Database object.
Such concurrent access is outside the concurrency guarantees which
Xapian makes, so what you're seeing isn't a surprise. The documentation
even says you may get crashes or data corruption:
| If you really want to access the same Xapian object from multiple
| threads, then you need to ensure that it won’t ever be accessed
| concurrently (if you don’t ensure this bad things are likely to happen
| - for example crashes or even data corruption).
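To make the "explicit or implicit" point concrete, here is a minimal sketch of one way to arrange the locking. It is only an illustration under assumptions, not something from the Xapian documentation: LockedDatabase, read_data and add_doc are hypothetical names. The idea is that every touch of the shared database goes through one lock, and that any Document obtained from it is used up inside the lock so only plain values (bytes, ints) escape to the calling thread.

```python
import threading


class LockedDatabase:
    """Hypothetical wrapper: serialise every access to one shared database."""

    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def run(self, fn):
        # fn receives the database while the lock is held.  It should
        # return plain Python values only, never a Document or similar
        # object, since those keep a reference back to the database.
        with self._lock:
            return fn(self._db)


def read_data(shared, docid):
    # The Document is created and consumed inside the lock; only the
    # data it held leaves the critical section.
    return shared.run(lambda db: db.get_document(docid).get_data())


def add_doc(shared, doc):
    # Writes go through the same lock as reads.
    return shared.run(lambda db: db.add_document(doc))
```

A Document returned from run() and then used in another thread would bring back exactly the problem described above, because using it is an implicit access to the database it came from.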
You should also be aware that the current inmemory backend isn't built
for performance or scalability. The next release series will hopefully
see a replacement which is, but for current versions using a disk-based
backend and a RAM disk is likely a better option anyway. And then you
can have one database with a separately opened Database object per
reader thread, plus a WritableDatabase object for the writer thread.
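A rough sketch of that layout, assuming the Python bindings and a disk-based database at some path (for example on a RAM disk). PerThreadReaders, open_reader and open_writer are hypothetical helper names of mine; the Xapian calls used are xapian.Database(path), xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN) and Database.reopen().

```python
import threading


def open_reader(path):
    # Imported lazily so the module loads even without the bindings.
    import xapian
    return xapian.Database(path)


def open_writer(path):
    # Intended to be called once, from the single writer thread.
    import xapian
    return xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)


class PerThreadReaders:
    """Hypothetical helper: one separately opened Database per thread."""

    def __init__(self, path, opener=open_reader):
        self._path = path
        self._opener = opener
        self._local = threading.local()

    def get(self):
        db = getattr(self._local, "db", None)
        if db is None:
            # First use in this thread: open a private Database object.
            db = self._opener(self._path)
            self._local.db = db
        else:
            # Later uses: refresh to the latest committed revision.
            db.reopen()
        return db
```

Each reader thread calls get() before searching and never hands the returned object (or anything derived from it) to another thread. Readers only see changes once the writer thread has called commit() on its WritableDatabase.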