[Xapian-discuss] flush problem

Michael A. Lewis MAL at ICGINC.COM
Sun Jan 20 20:30:04 GMT 2008


From: xapian-discuss-bounces at lists.xapian.org on behalf of James Aylett
Sent: Sun 1/20/2008 3:12 PM
To: xapian-discuss at lists.xapian.org
Subject: Re: [Xapian-discuss] flush problem

The main thing the system seems to be doing is IOWAIT (17-68% is the approximate range I'm seeing over a few minutes). The only application running on this system is the Xapian code. At the time of the insert, no other process is searching or inserting. Basically, only the flush code is running. The getDatabase code is as follows:

static vector<string> dbNames;
static vector<string> dbErrors;
// NOTE: a char* key is compared by pointer value, not by string contents.
static map<char*, Xapian::WritableDatabase*> dbHash;

// Gets a database handle, creating it if necessary.
// dbname is presumably defined elsewhere (not shown in this excerpt).
Xapian::WritableDatabase* getDatabase () {
        map<char*, Xapian::WritableDatabase*>::iterator iter;
        iter = dbHash.find(dbname);
        if (iter != dbHash.end()) {
                return iter->second;
        } else {
                dbHash[dbname] = new Xapian::WritableDatabase (dbname, DB_CREATE_OR_OPEN);
                return dbHash[dbname];
        }
}

It's pretty simple. The average document length is about 300k of standard English text. Nothing remarkable or esoteric, with the exception of a number of email addresses. In my previous posting I sent the output from a top command taken while it was flushing (which it currently is doing). It appears to be using 2.8 GB of memory with 1.1 GB free.
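
One detail of a cache like the one above may be worth checking: a std::map keyed on char* orders its keys by pointer value, so two different buffers holding the same path are treated as different keys. Whether that matters here depends on how dbname is set, which the excerpt does not show. The following is only a minimal sketch of a cache keyed by string contents instead. HandleCache and FakeDb are hypothetical names used for illustration and are not part of Xapian; with a real Xapian::WritableDatabase the constructor call would also need the open mode.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical helper (not part of Xapian): caches one handle per path,
// keyed by string contents rather than by pointer value.
template <typename Db>
class HandleCache {
public:
    // Returns the cached handle for `path`, constructing it on first use.
    Db* get(const std::string& path) {
        auto it = handles_.find(path);
        if (it == handles_.end()) {
            it = handles_.emplace(path, std::make_unique<Db>(path)).first;
        }
        return it->second.get();
    }

    // Number of distinct paths opened so far.
    std::size_t size() const { return handles_.size(); }

private:
    std::map<std::string, std::unique_ptr<Db>> handles_;
};

// Stand-in for a database handle, used only for illustration.
struct FakeDb {
    explicit FakeDb(const std::string& p) : path(p) {}
    std::string path;
};
```

With this shape, two separate char buffers that both contain the same path resolve to the same handle, because the lookup compares the characters and not the addresses.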

On Sun, Jan 20, 2008 at 01:32:19PM -0500, Michael A. Lewis wrote:

> I am having a problem with flushing a database. I am adding N
> records to the DB (which amounts to 1 - 2000). At the end of the
> run, I issue a flush() call. The problem is that the flush call
> never seems to do anything. Every 10000 additions to the database,
> the library performs a flush (which can take up to 3 hours on a
> 560,000 document database) as if my flush call was never performed.
> 1) This seems entirely too long, is it?

Sounds high to me, but it depends on so many factors: number of terms,
size of document data, available memory, how much memory is used by
Xapian to hold the 10k documents before flushing, logical to physical
volume layout, file systems involved...
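
Since the amount of data held before a flush is one of the factors above, one way to experiment is to flush explicitly in smaller batches and time each one. This is only a sketch under assumptions: BatchFlusher and CountingDb are hypothetical names, not Xapian classes, and the wrapper assumes nothing about the database beyond a flush() method like the one called in this thread.

```cpp
#include <cstddef>

// Hypothetical wrapper (not part of Xapian): calls flush() on the wrapped
// database every `batch_size` additions, and once more at the end if needed.
template <typename Db>
class BatchFlusher {
public:
    BatchFlusher(Db& db, std::size_t batch_size)
        : db_(db), batch_size_(batch_size) {}

    // Call once after each document has been added to the database.
    void document_added() {
        if (++pending_ >= batch_size_) flush_now();
    }

    // Call at the end of a run to flush any remaining additions.
    void finish() {
        if (pending_ > 0) flush_now();
    }

    // Additions made since the last flush.
    std::size_t pending() const { return pending_; }

private:
    void flush_now() {
        db_.flush();
        pending_ = 0;
    }

    Db& db_;
    std::size_t batch_size_;
    std::size_t pending_ = 0;
};

// Stand-in database that only counts flush() calls, for illustration.
struct CountingDb {
    int flushes = 0;
    void flush() { ++flushes; }
};
```

Whether smaller batches actually help depends on the same factors listed above; the sketch only makes the batch size something that can be varied and measured.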

What are you seeing as the main activity during flush? If you're on a
Unix machine it'll probably be one of system, user or iowait.
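
For a rough number rather than an impression from top, the split can be computed from two samples of the aggregate "cpu" line in /proc/stat. This is a Linux-specific sketch; CpuTimes, parse_cpu_line and iowait_percent are names made up for illustration, and the caller is assumed to read the first line of /proc/stat twice, some seconds apart.

```cpp
#include <sstream>
#include <string>

// Counters from the aggregate "cpu" line of /proc/stat, in clock ticks.
struct CpuTimes {
    unsigned long long user = 0, nice = 0, system = 0, idle = 0,
                       iowait = 0, irq = 0, softirq = 0, steal = 0;

    unsigned long long total() const {
        return user + nice + system + idle + iowait + irq + softirq + steal;
    }
};

// Parses a line such as "cpu  100 0 50 800 50 0 0 0".
// Returns false if the label is not "cpu" or the first five fields are missing.
bool parse_cpu_line(const std::string& line, CpuTimes& out) {
    out = CpuTimes{};
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") return false;
    if (!(in >> out.user >> out.nice >> out.system >> out.idle >> out.iowait))
        return false;
    // Later fields may be absent on older kernels; they then stay at zero.
    in >> out.irq >> out.softirq >> out.steal;
    return true;
}

// Percentage of the ticks elapsed between two samples that were spent in iowait.
double iowait_percent(const CpuTimes& before, const CpuTimes& after) {
    const unsigned long long t0 = before.total();
    const unsigned long long t1 = after.total();
    if (t1 <= t0) return 0.0;
    return 100.0 * static_cast<double>(after.iowait - before.iowait) /
           static_cast<double>(t1 - t0);
}
```

The same subtraction on the user and system fields gives the other two figures mentioned above.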

> 2) Why would my flush be ignored (no transactions being used, just
> straight add using the term generator).
> This is my flush code:
> try {
>       Xapian::WritableDatabase* database = getDatabase();
>       database->flush();
> } catch (const Xapian::Error & err) {
>       s="ERROR:"+err.get_msg();
>       log_it(s.c_str());
>       write(c_id,"ERROR:-3",8);
> }
> return;

Assuming that getDatabase() implements the Singleton pattern
correctly, that you aren't clearing its instance, and that you aren't
using threading (or if you are you know what you're doing with
Singleton), this is odd.

I've had a quick look over the flint code, and I can't see how it
could not be working for you. If you compile with --enable-log and
then run with XAPIAN_DEBUG_LOG set to a file, and XAPIAN_DEBUG_FLAGS
set to -1, you'll get lots (!) of messages. In particular, you should get
an apply call from flint after your flush; if you don't, it's not
working for some reason.

Everything will be slower with debugging on, of course.


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
