[Xapian-discuss] Node.js binding

Liam xapian at networkimprov.net
Thu Oct 20 21:17:45 BST 2011

On Thu, Oct 20, 2011 at 12:06 PM, Richard Boulton <richard at tartarus.org> wrote:

> On 20 October 2011 18:11, Liam <xapian at networkimprov.net> wrote:
> >> >  MSetIterator:: operator *(), get_percent(), get_document()
> >> get_document() is not safe here - the documents can be lazily loaded
> >> into the MSet object, so this can hit disk or network.
> >
> > Document::get_data() does I/O, so what does MSetIterator::get_document()
> > do?
> Depends (on the backend, and possibly on other things); there's lazy
> loading going on here.  Also, note that Document::get_data() only
> returns the "user data" part of the document; it may or may not
> (again, depending on the backend) result in the rest of the data
> associated with a document (ie, terms and values) being read.  It's
> all quite complicated ;)
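
To make the lazy-loading point concrete, here is a minimal sketch of why an
accessor that looks cheap can end up doing I/O. LazyDocument and its load
function are hypothetical names used only for illustration; this is not how
Xapian implements its Document class.

```typescript
// Hypothetical illustration of lazy loading; not Xapian's actual code.
class LazyDocument {
  private data: string | undefined = undefined;
  private readonly load: () => string;

  // Construction is cheap: nothing is read yet.
  constructor(load: () => string) {
    this.load = load;
  }

  // The first call pays for the (possibly slow) load; later calls reuse it.
  getData(): string {
    if (this.data === undefined) {
      this.data = this.load(); // stands in for a disk or network read
    }
    return this.data;
  }
}
```

Under this model, whether a given call blocks depends on what has already been
loaded, which is why it is hard to label a single method as "safe" to call
synchronously.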

A mature Node binding would have both sync and async variants of the API,
and let the user choose based on his knowledge of the deployment context.
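
A rough sketch of what a sync/async pair could look like, following the usual
Node naming convention. DocumentHandle, getDataSync and getData are made-up
names for illustration, not an existing binding, and setTimeout merely stands
in for work that a real binding would run off the main thread.

```typescript
// Hypothetical stand-in for a bound Xapian document; not a real binding API.
type Callback<T> = (err: Error | null, result?: T) => void;

class DocumentHandle {
  private readonly stored: string;

  constructor(stored: string) {
    this.stored = stored;
  }

  // Sync variant: the caller blocks until the data is available.
  getDataSync(): string {
    return this.stored; // a real binding might hit disk or network here
  }

  // Async variant: same result, delivered later via a Node-style callback.
  getData(callback: Callback<string>): void {
    setTimeout(() => {
      let data: string;
      try {
        data = this.getDataSync();
      } catch (e) {
        callback(e instanceof Error ? e : new Error(String(e)));
        return;
      }
      callback(null, data);
    }, 0);
  }
}
```

The user would pick getDataSync() when the database is known to be local and
fast, and getData() when a call might stall the event loop.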

> >> One concern about putting some Xapian accesses into a subthread; it is
> >> not safe to call methods on Xapian API objects concurrently, so you'll
> >> need to protect calls with some locking scheme, or some convention to
> >> avoid this.  Seems very tricky to do right, to me, and might therefore
> >> be safer to just do everything in a subthread.
> >
> > While it is possible to "parallelize" I/O functions as below, typically you
> > sequence them in nested callbacks as in my prior example code. All
> > Javascript code is confined to the main thread -- which makes it possible to
> > hang everything with while(true) {} :-P
> What I was concerned about was concurrent calls to methods of Xapian
> objects, which this doesn't avoid.  For example, if the main thread
> has a "db" variable pointing to a Xapian database, and starts a
> get_mset() operation, the get_mset() operation will be performed using
> the Xapian database in a subthread.  From what you describe, there's
> nothing stopping the main thread kicking off another get_mset()
> operation (or any other operation which would access the database)
> before the subthread finishes, which would cause problems.

Concurrent calls to the same method of *different* objects cause problems??

> Of course, the programmer could just be warned not to do that, but in
> such a setup it seems very likely that accidental violation of that
> would happen (after all, most node programmers won't be expecting to
> have to worry about avoiding threading issues; that's kind-of the point
> of node as far as I understand it).

Node developers already have to deal with the implications of async I/O.
Simultaneous writes to a file can cause problems. Node avoids
multi-threading gotchas only in JS code. So yes, a Node developer would know
to use separate Database instances, or serialize access to a single one.
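
For the "serialize access to a single one" option, something along these lines
would do. This is only a sketch: SerialQueue and Task are hypothetical names,
not part of Node or of any Xapian binding, and each task is assumed to call
done() exactly once when its Database operation has finished.

```typescript
// A task receives a `done` function and calls it when its operation completes.
type Task = (done: () => void) => void;

// Runs queued tasks strictly one at a time, in the order they were pushed.
class SerialQueue {
  private pending: Task[] = [];
  private busy = false;

  push(task: Task): void {
    this.pending.push(task);
    this.drain();
  }

  private drain(): void {
    if (this.busy) return; // an operation is still in flight
    const task = this.pending.shift();
    if (task === undefined) return; // nothing left to run
    this.busy = true;
    let finished = false;
    task(() => {
      if (finished) return; // ignore a second done() from the same task
      finished = true;
      this.busy = false;
      this.drain(); // start the next queued operation, if any
    });
  }
}
```

With one such queue per Database object, a second get_mset() request issued
from the main thread would simply wait until the first one has called done().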
