[Xapian-devel] Searching without flush?

Olly Betts olly at survex.com
Tue Jun 1 12:51:42 BST 2004


On Tue, Jun 01, 2004 at 12:06:47PM +0200, Robert Pollak wrote:
> I am using the Xapian-0.8.0 snapshot from 15-Apr-2004 02:14, and I am 
> using the same Xapian::WritableDatabase instance for indexing and searching.
> 
> Currently each search causes a database flush, which is slow.
> How can I avoid this flush?

I think the first question is: what are you searching for?

A search does two things which can cause a flush.  The first is
opening posting lists for the terms in the search.  If any of
the search terms was in a document added, removed, or modified since
the last flush, quartz will flush.

The other is calculating the average document length, which
probabilistic weighting needs.

It might be possible to avoid the search entirely - for example, if you
just want to see if there's a document with a certain UID term, you can
look at the postlist for that term, rather than running a full-blown
search.  Then you'll only cause a flush if you try to update a document
added since the last flush.  This is how omindex and scriptindex work.
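
A minimal sketch of such a postlist check, assuming the
postlist_begin() and postlist_end() methods of Xapian::Database.  The
function name uid_exists is made up for illustration, and the database
type is a template parameter only so that the snippet compiles without
the Xapian headers; in real use you would include <xapian.h> and pass a
Xapian::Database or Xapian::WritableDatabase.

```cpp
#include <string>

// Returns true if at least one document is indexed by `uid_term`.
// `Db` is expected to be Xapian::Database or Xapian::WritableDatabase
// (assumption: both provide postlist_begin() and postlist_end()).
template <class Db>
bool uid_exists(const Db& db, const std::string& uid_term) {
    // An empty posting list has begin == end, so no search is needed.
    return db.postlist_begin(uid_term) != db.postlist_end(uid_term);
}
```

A call might look like uid_exists(db, "Qdoc42"), where "Qdoc42" is a
hypothetical UID term.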

If you really need to do a search, a boolean search would avoid the need
to calculate the average document length, so will avoid flushing except
when you search for a term used in a recent change.

If you need a probabilistic search, it shouldn't be hard to adjust the
average length to account for buffered changes without forcing a flush.
But you'd still force a flush when you search for a term used in a
recent change.
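
The arithmetic for such an adjustment could look roughly like this.  It
is a toy model only: the struct and function names are invented for
illustration and are not quartz's actual bookkeeping.  It assumes the
backend tracks the document count and total document length as of the
last flush, plus the net change to each from buffered additions,
removals, and modifications.

```cpp
#include <cstdint>

// Hypothetical statistics as of the last flush (not Xapian's real internals).
struct FlushedStats {
    std::uint64_t doc_count;     // documents on disk
    std::uint64_t total_length;  // sum of their document lengths
};

// Hypothetical net effect of changes still buffered in memory.
struct BufferedDelta {
    std::int64_t doc_count_change;     // documents added minus removed
    std::int64_t total_length_change;  // net change in total length
};

// Average document length including buffered changes, without flushing.
double adjusted_average_length(const FlushedStats& flushed,
                               const BufferedDelta& delta) {
    const std::int64_t docs =
        static_cast<std::int64_t>(flushed.doc_count) + delta.doc_count_change;
    const std::int64_t length =
        static_cast<std::int64_t>(flushed.total_length) + delta.total_length_change;
    if (docs <= 0) return 0.0;  // empty database: report an average of zero
    return static_cast<double>(length) / static_cast<double>(docs);
}
```

For example, 4 flushed documents with total length 400 plus one
buffered document of length 100 gives (400 + 100) / (4 + 1) = 100.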

> It seems that I have to modify Xapian to either
> - search only the already flushed data (eventually missing some hits)

This is easy to do - just open the database read-only (i.e. as a
Xapian::Database).  Whenever you explicitly flush or get a
Xapian::DatabaseModifiedError, call reopen() on the read-only database.
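
A rough sketch of the reopen-and-retry part.  The helper name
search_with_reopen and the max_attempts parameter are invented for
illustration; the exception and database types are template parameters
only so that the snippet compiles without the Xapian headers.  With
Xapian you would pass Xapian::DatabaseModifiedError as the exception
type and a Xapian::Database as db.

```cpp
// Runs `search(db)`.  If it throws `ModifiedError`, calls db.reopen() and
// tries again, giving up (and rethrowing) after `max_attempts` attempts.
// Assumption: `Db` provides reopen(), as Xapian::Database does.
template <class ModifiedError, class Db, class Search>
auto search_with_reopen(Db& db, Search search, int max_attempts = 3)
    -> decltype(search(db)) {
    for (int attempt = 1;; ++attempt) {
        try {
            return search(db);
        } catch (const ModifiedError&) {
            if (attempt >= max_attempts) throw;  // still stale: let caller decide
            db.reopen();                         // pick up the latest flushed revision
        }
    }
}
```

The search callable would typically build a Xapian::Enquire on db, set
the query, and return the result of get_mset().  The indexing side
should also call reopen() on the read-only database after each explicit
flush, as described above.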

> or
> - search the un-flushed data, too.

If you need searching of unflushed data without forcing a flush when you
hit a term used in a recent change, you need to generate modified
posting lists on the fly.  This is certainly possible, but it's rather
fiddly.
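
To give a flavour of what that involves, here is a toy model of merging
one term's flushed posting list with buffered changes.  Everything in it
(DocId, Change, merged_postlist) is invented for illustration and is not
how quartz stores things; a real posting list also carries per-document
data such as the wdf, which is where much of the fiddliness comes in.

```cpp
#include <map>
#include <vector>

using DocId = unsigned;

// Buffered state of one term for one document (hypothetical model).
enum class Change {
    Added,   // the document now contains the term
    Removed  // the document was deleted or no longer contains the term
};

// Merges a sorted flushed posting list with buffered per-document changes,
// yielding the sorted list of documents that currently contain the term.
std::vector<DocId> merged_postlist(const std::vector<DocId>& flushed,
                                   const std::map<DocId, Change>& buffered) {
    std::vector<DocId> out;
    auto f = flushed.begin();
    auto b = buffered.begin();
    while (f != flushed.end() || b != buffered.end()) {
        if (b == buffered.end() || (f != flushed.end() && *f < b->first)) {
            out.push_back(*f++);  // flushed entry with no buffered change
        } else {
            if (b->second == Change::Added) out.push_back(b->first);
            // A buffered change overrides the flushed entry for the same document.
            if (f != flushed.end() && *f == b->first) ++f;
            ++b;
        }
    }
    return out;
}
```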

Cheers,
    Olly



