[Xapian-devel] GSoC 2012: Erlang Bindings

Olly Betts olly at survex.com
Mon Mar 19 00:09:19 GMT 2012


Hi Michael,

On Sun, Mar 18, 2012 at 08:50:32PM +0300, Michael Uvarov wrote:
> I have a few questions and ideas about Xapian:
> 
> First of all, how does Xapian handle concurrent access to shared resources?
> How many connections can be established to one database for reading
> and writing?

One writer.  The number of readers is only limited by OS resource
limits.

The way Btree updates work is that readers can detect if a writer
has overwritten the snapshot they are working on.
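
To illustrate what that means for a reader, here is a toy model (not
Xapian's actual Btree code).  The classes ToyStore and ToyReader and the
"keep the two newest revisions" rule are assumptions made up for the
sketch.  In the real library the corresponding error is
DatabaseModifiedError, and the usual response is to call reopen() on the
Database and retry.

```python
class SnapshotOverwritten(Exception):
    """Raised when a reader's revision is no longer available."""


class ToyStore:
    """Toy store retaining only the newest `keep` revisions (an assumption)."""

    def __init__(self, keep=2):
        self.keep = keep
        self.revisions = {0: {}}
        self.latest = 0

    def commit(self, changes):
        # Build the new revision from the latest one plus the changes.
        data = dict(self.revisions[self.latest])
        data.update(changes)
        self.latest += 1
        self.revisions[self.latest] = data
        # Discard revisions that fall outside the retained window.
        for rev in [r for r in self.revisions if r <= self.latest - self.keep]:
            del self.revisions[rev]


class ToyReader:
    """Reader pinned to the revision that was current when it (re)opened."""

    def __init__(self, store):
        self.store = store
        self.reopen()

    def reopen(self):
        self.revision = self.store.latest

    def get(self, key):
        try:
            snapshot = self.store.revisions[self.revision]
        except KeyError:
            raise SnapshotOverwritten(self.revision) from None
        return snapshot.get(key)


def read_with_retry(reader, key, attempts=3):
    """Retry a read, reopening the reader whenever its snapshot is gone."""
    for _ in range(attempts):
        try:
            return reader.get(key)
        except SnapshotOverwritten:
            reader.reopen()
    raise SnapshotOverwritten("gave up after %d attempts" % attempts)
```

In this model a reader keeps working across one commit and only has to
reopen once a second commit has discarded its revision.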

> Who coordinates parallel access to the same database?

There is no daemon or other central authority.

Writer locking is done using fcntl, with the lock held by a child
process to protect us from some unhelpful semantics of fcntl.
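
One fcntl behaviour that is commonly considered unhelpful (my assumption
as to which semantics are relevant here) is that a process loses all its
fcntl locks on a file as soon as it closes any descriptor for that file,
even one it opened separately.  The sketch below demonstrates this on a
POSIX system using only the Python standard library; the helper names
are made up for illustration and this is not how Xapian's locking code
is written.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child that attempts a non-blocking exclusive fcntl lock.

    Returns True if the child obtained the lock, False otherwise.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0
        except OSError:
            status = 1
        finally:
            # Leave the child immediately without running parent cleanup.
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def demo():
    """Return (lock held before, lock held after) closing a second descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        # Take the "writer" lock in this process.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not child_can_lock(path)
        # Open and close an unrelated descriptor for the same file.
        other = os.open(path, os.O_RDONLY)
        os.close(other)
        held_after = not child_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)
```

On Linux I would expect demo() to report that the lock is held before
the second descriptor is closed and gone afterwards.  Holding the lock
in a dedicated child process avoids this, since nothing else in that
process opens or closes the file.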

> Are opening and closing a database expensive operations or not?

Opening to read is pretty cheap.  Opening to write is a bit more
expensive.  Closing is cheap, except that for a writer it will implicitly
flush any pending changes (unless you're in a transaction).  It's the
flush which is the expensive operation though, not the actual closing.

> Is there a minimal example of using Xapian with a real data set?

There's an example of indexing museum exhibits in the getting started
guide:

http://getting-started-with-xapian.readthedocs.org/

> Are there datasets for testing efficiency?

There's a "perftest" which was intended to test performance, but it uses
randomly synthesised data, so the results may not translate well to real
situations.

We don't have anything set up with real data.  It would certainly be
good to, and could certainly be part of the project if you wanted.

> I think it is a good idea to implement this binding as a linked-in driver.
> A driver provides a queue and Xapian classes are not thread-safe.

I think "not thread-safe" is not a totally helpful way to think about it
as it tends to suggest more issues than there are.  In particular,
everything should be re-entrant (there's no global state).

It's concurrent accesses to the same C++ object which aren't guaranteed
to be supported.  But, for example, you can open the same database in
several different threads at the same time (as those are different C++
Database objects).
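
One common way to arrange that is to give each thread its own handle.
The sketch below is generic Python rather than Xapian-specific code:
open_handle is a hypothetical factory standing in for whatever opens the
database (for example, constructing a Database object for a path), and
the function names are invented for the example.

```python
import threading

# Per-thread storage: each thread sees its own independent attributes.
_local = threading.local()


def handle_for_this_thread(open_handle):
    """Return the calling thread's own handle, opening it on first use."""
    if not hasattr(_local, "handle"):
        _local.handle = open_handle()
    return _local.handle


def run_in_threads(open_handle, n_threads=4):
    """Start n_threads workers and collect the handle each one used."""
    seen = []
    seen_lock = threading.Lock()

    def worker():
        handle = handle_for_this_thread(open_handle)
        # A second call in the same thread reuses the same handle.
        assert handle_for_this_thread(open_handle) is handle
        with seen_lock:
            seen.append(handle)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Since no handle is ever shared between threads, there are no concurrent
accesses to the same object and no locking is needed around its use.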

> So, it will
> be a mapping: one driver port -- one open connection to a database.
> NIFs would require implementing a thread pool in C, which is not
> a trivial
> task.
> Calls to NIFs can block a scheduler of the virtual machine. That is also
> not good, because other green processes will have to wait.

I don't know enough about Erlang to comment on this, but Lenz may have
some thoughts.

Cheers,
    Olly


