[Xapian-discuss] Question on "single writer, multiple reader"

Gang Chen pkuchengang at gmail.com
Thu Jan 22 03:45:37 GMT 2015


Hi, J, Olly,

    Thanks for the replies!

    I've been calling 'reopen()' in my search process and, as expected, the
new documents can now be retrieved!
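Below is a minimal sketch of the reopen-before-each-query pattern, with the retry on DatabaseModifiedError that James describes in the quoted reply. It is written as a generic template so it compiles without Xapian. Db is assumed to provide reopen(), and ModifiedError is the exception to retry on (Xapian::DatabaseModifiedError in Xapian's C++ API). The helper name run_with_reopen and the default of 3 attempts are illustrative assumptions, not part of Xapian.

```cpp
// Hypothetical helper (not Xapian API): reopen the database, run the query,
// and if the reader's revision was overwritten mid-query, reopen and retry.
// Db must provide reopen(); ModifiedError is the exception type to retry on.
template <typename ModifiedError, typename Db, typename Fn>
auto run_with_reopen(Db& db, Fn&& run_query, int max_attempts = 3)
    -> decltype(run_query(db)) {
    for (int attempt = 1;; ++attempt) {
        // Picks up the latest committed revision; per Olly's note this is a
        // cheap no-op when there is no newer revision.
        db.reopen();
        try {
            return run_query(db);
        } catch (const ModifiedError&) {
            // Out of attempts: let the caller decide what to do.
            if (attempt >= max_attempts) throw;
        }
    }
}
```

With Xapian, this might be called as `run_with_reopen<Xapian::DatabaseModifiedError>(db, ...)`, where the callback builds a Xapian::Enquire on the reopened database and returns its MSet.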

    As I dug deeper into Xapian, I ran into another problem when using the
*slot* feature with "single writer and multiple readers". After several
days of trial and error, I think it might be a bug in the Chert backend.

    So, here is my observation:

    I used Xapian 1.2.19 with the default Chert backend.

    I wanted to index some movie metadata, e.g. the title and the
premiere time. I stored them in each document with the 'doc.add_value()'
method.

        newdocument.add_posting("title_0", 1, 1);
        newdocument.add_posting("time_0", 1, 1);
        newdocument.add_value(1, "title_0");    // slot 1: title_0
        newdocument.add_value(2, "time_0");     // slot 2: time_0

    In the search process, I used 'doc.get_value()' to get the value in
slots.

        for (Xapian::MSetIterator i = matches.begin(); i != matches.end(); ++i) {
                Xapian::Document doc = i.get_document();
                cout << "Document ID " << *i << "\t" << i.get_percent() << "%" << endl;
                cout << "[" << doc.get_value(1) << "]" << endl;
                cout << "[" << doc.get_value(2) << "]" << endl;
        }

    While the search process was running, I added more movie data to the
database. The first few new documents were fine, but once 1,000 or more
documents had been added (with a commit every 10,000 documents), the search
process crashed with a segfault. However, after I restarted the search
process, the new documents could be retrieved. By the way, the search
process called 'reopen()' before each query.
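For reproducing this, the indexing side (adding documents and committing every 10,000) can be sketched as a small batching helper. It is a generic template, assuming Db offers add_document() and commit() as Xapian::WritableDatabase does. The name index_in_batches is hypothetical, and the sketch also commits any leftover documents at the end.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: add every document, committing after each full batch
// of `batch_size` additions and once more for any remainder.
// Returns the number of commits performed.
template <typename Db, typename Doc>
std::size_t index_in_batches(Db& db, const std::vector<Doc>& docs,
                             std::size_t batch_size) {
    std::size_t commits = 0;
    std::size_t pending = 0;  // documents added since the last commit
    for (const Doc& doc : docs) {
        db.add_document(doc);
        if (++pending == batch_size) {
            db.commit();  // readers can see these documents after reopen()
            ++commits;
            pending = 0;
        }
    }
    if (pending > 0) {
        db.commit();  // flush the final partial batch
        ++commits;
    }
    return commits;
}
```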

    When I switched from 'add_value()' and 'get_value()' to 'set_data()'
and 'get_data()', both searching and incremental indexing worked. The
'data' value and the 'slot' values are both attached to a document, yet
they behaved differently. So I suspect something may be wrong with the
slot values.
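Because set_data() stores a single opaque string per document, keeping both the title and the premiere time there requires some encoding. The following is a hypothetical newline-separated format, assuming titles never contain a newline. The function names are illustrative, not Xapian API.

```cpp
#include <string>
#include <utility>

// Hypothetical encoding: join title and time with '\n' so both fit in the
// single string passed to Document::set_data().
std::string pack_movie_data(const std::string& title, const std::string& time) {
    return title + "\n" + time;
}

// Inverse of pack_movie_data(), applied to the string from get_data().
// If no separator is found, the whole string is treated as the title.
std::pair<std::string, std::string> unpack_movie_data(const std::string& data) {
    const std::string::size_type nl = data.find('\n');
    if (nl == std::string::npos) return {data, std::string()};
    return {data.substr(0, nl), data.substr(nl + 1)};
}
```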

    I also tried explicitly using *Flint* as the backend. Surprisingly,
there was no segfault, and everything worked.

    Could it be something wrong with the slot value processing in the Chert
backend?


Best wishes,
Gang


2015-01-20 9:43 GMT+08:00 Olly Betts <olly at survex.com>:

> On Sun, Jan 18, 2015 at 04:25:29PM +0000, James Aylett wrote:
> > That’s exactly how it’s supposed to work. “Eventually” (once the
> > writer gets sufficiently far ahead of the reader), the reader will get
> > a DatabaseModifiedError and will have to re-open the database, but
> > until then it’s up to it when it does so. You may wish to do it every
> > N requests, or every K seconds, or only when you have to handle
> > DatabaseModifiedError; it’s up to you.
> >
> > We have a note that some more detailed documentation around this would
> > be helpful. For now, the following should be useful:
> > <https://getting-started-with-xapian.readthedocs.org/en/latest/concepts/indexing/databases.html?highlight=databasemodifiederror#concurrent-access>.
>
> I've just improved this with a note that reopen() is a cheap no-op when
> there isn't a newer revision:
>
>
> https://github.com/jaylett/xapian-docsprint/commit/41bb7a1da61d22e0047a83176386da4db1ee9f15
>
> Cheers,
>     Olly
>
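James mentions reopening every N requests or every K seconds as alternatives to reopening before each query. Below is a sketch of such a policy. It assumes a reader that consults the policy once per request and calls reopen() when it returns true. ReopenPolicy and its parameters are hypothetical names, not Xapian API. Given Olly's note that reopen() is a cheap no-op when nothing has changed, simply reopening before every query may well be the simpler choice.

```cpp
#include <chrono>

// Hypothetical policy: signal a reopen after every `every_n` requests, or
// once `max_age` has passed since the last reopen, whichever comes first.
class ReopenPolicy {
public:
    using Clock = std::chrono::steady_clock;

    ReopenPolicy(unsigned every_n, Clock::duration max_age)
        : every_n_(every_n), max_age_(max_age), last_(Clock::now()) {}

    // Call once per request; returns true if the caller should reopen now.
    bool should_reopen(Clock::time_point now = Clock::now()) {
        ++count_;
        if (count_ >= every_n_ || now - last_ >= max_age_) {
            count_ = 0;    // restart the request counter
            last_ = now;   // and the age timer
            return true;
        }
        return false;
    }

private:
    unsigned every_n_;
    Clock::duration max_age_;
    Clock::time_point last_;
    unsigned count_ = 0;
};
```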

