[Xapian-discuss] minor problem

Michael A. Lewis MAL at ICGINC.COM
Mon Dec 24 04:31:17 GMT 2007


Thanks for the response, Olly. My indexing code appears below. A note about the speed: it was this slow (at least to the naked eye) even when there were only a couple of hundred documents. After this code runs, the child process that contains it just exits.
 
try {
        // Opening the database for writing takes the write lock.
        Xapian::WritableDatabase database(dbname, DB_CREATE_OR_OPEN);
        Xapian::TermGenerator indexer;
        Xapian::Stem stemmer("english");
        indexer.set_stemmer(stemmer);
        Xapian::Document doc;
        doc.set_data(line);
        indexer.set_document(doc);
        // Index the text of the line into the document's terms.
        indexer.index_text(line);
        if (meta) {
                // Replaces the document data set from line above.
                doc.set_data(metatext);
        }
        docid=database.add_document(doc);
        // Report the new document id back over the socket.
        sprintf( tmp1, "%lu", docid );
        x = write( c_id, tmp1, strlen(tmp1) );
        if ( x != strlen(tmp1) ) {
            log_it( "ERROR: insert could not write to socket" );
        }
}

 
________________________________

From: Olly Betts [mailto:olly at survex.com]
Sent: Sun 12/23/2007 10:58 PM
To: Michael A. Lewis
Cc: xapian-discuss at lists.xapian.org
Subject: Re: [Xapian-discuss] minor problem



On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
> When I do a "ps -ef" command from the command line I see a task
> belonging to my daemon that shows the command being run as "/bin/cat".
> Looking in the xapian source code I have found that to be in the flint
> backend locking code.

The semantics of fcntl() locking within a process are rather unhelpful,
so we fork a child process to take and hold the lock for us.  To
minimise VM use, we just exec /bin/cat once the lock is obtained.

> Since I am serializing my updates (one after another) and only from a
> single process, why am I seeing what appears to be long-term locks?

The lock is held (and so the /bin/cat child process exists) for as long
as you have the WritableDatabase open.  So unless you're closing and
reopening the database for each addition (which generally is probably
not a good idea) then this sounds like what I'd expect.

> This index code ran very fast in pre-1.0 versions of the indexer. I
> upgraded to 1.0.0, then 1.0.1, etc. But I didn't need to index until
> recently.

It's hard to know what's going on from the information given.  You said
you're using TermGenerator, which is new in 1.0.0, so that may be
indexing significantly differently to whatever you were using before.
Though several seconds per document for a 10,000 document database
really is excessively slow anyway.

Could you show us what the indexing code looks like?

Cheers,
    Olly



