[Xapian-discuss] Xapian::Document and threads

Olly Betts olly at survex.com
Mon May 5 12:15:35 BST 2014


On Mon, May 05, 2014 at 08:12:15AM +0200, Jean-Francois Dockes wrote:
> Olly Betts writes:
>  > On Sun, May 04, 2014 at 08:16:50PM +0200, Jean-Francois Dockes wrote:
>  > > While investigating very infrequent crashes in the Recoll indexer, I have
>  > > come to a very basic question: is it safe to pass a copy of a
>  > > Xapian::Document from thread to thread (multiple threads queue documents,
>  > > other thread updates the index) ?
>  > > 
>  > > I don't seem to get directly into trouble while doing this, but I don't see
>  > > anything either in the RefCntr implementation which would explicitely make
>  > > it thread-safe, so I am wondering. Maybe I'm just missing the obvious.
>  > 
>  > http://getting-started-with-xapian.readthedocs.org/en/latest/concepts/concurrency.html
> 
> This only warns about documents read from the index and sharing Database
> references. Mine are just created from external data.

That part says "for example".

These parts apply to your situation though:

    "Xapian doesn’t maintain any global state, so you can safely use
    Xapian in a multi-threaded program provided you don’t share objects
    between threads"

And:

    "If you really want to access the same Xapian object from multiple
    threads, then you need to ensure that it won’t ever be accessed
    concurrently (if you don’t ensure this bad things are likely to
    happen - for example crashes or even data corruption). One way to
    prevent concurrent access is to require that a thread gets an
    exclusive lock on a mutex while the access is made."

> Only mentionning the Xapian::Database reference problem in the document
> almost makes things worse as one is led to believe that it is the main or
> only issue.

These implicit references are an important thing for the user to be
aware of, so really need to be mentioned.

> Not my case anyway, as I think that the code is older than the
> document.

Xapian has behaved this way since 0.5.0, which was released more than a
decade ago.  The document I linked to was indeed written more recently,
but it just attempted to document the existing situation with threads in
a place where new API users would see it.

> I am not requesting thread-safe counters, and I understand that faking copy
> semantics may be useful, but I think that unprotected reference counters
> should be warned about. 

I'm not really sure what you're hoping the text should say here.  Are
you thinking that calling the copy constructor, assignment operator, etc
don't count as accessing the object?  We could certainly easy clarify
that.

> This is specially true because there is nothing that the library user can
> do to mitigate the problem. Data accesses can be protected if you want to
> share these objects, but it would be difficult with the counters, and the
> only possibility is to use pointers, even in seemingly trivial cases.

You just need a mutex which has to be held before a thread can affect
the reference count.  The simplest would be a "Xapian" mutex which a
thread has to hold before it can do anything with any Xapian object.

> This is a documentation issue, please, add a word in the header comments:
> only pointers to Xapian::Document (and others) can hop threads.

I don't think that really expresses the restrictions accurately.

For example, if you have a std::list<Xapian::Document> structure and
protect access to it with a mutex, you should be able to safely append
new objects to it from the creating threads and process objects already
in the list from your indexing threads.  No explicit pointers to
Xapian::Document are involved, though some may be behind the scenes.

And if you only have a single static global Xapian::Document object,
then you can add terms to it in one thread and then add it to a database
in another thread - no pointers to Xapian::Document are involved at all.

> I'm not sure if the base.h header comments speculating about the atomicity
> of pointer copying are a fine example of English humour of if they were
> placed to further confuse the investigator :)

Those comments date back to the pre-0.5.0 days - they're just very out
of date, and I've just removed them.

Cheers,
    Olly



More information about the Xapian-discuss mailing list