[Xapian-discuss] Moving to 1.0.x

Olly Betts olly at survex.com
Mon Oct 8 20:52:48 BST 2007


On Mon, Oct 08, 2007 at 02:47:26PM -0400, Mike Boone wrote:
> I have had a site happily running Xapian 0.8.x through the PHP/SWIG
> bindings for a few years.

Which 0.8.x?  It would be useful to know...

> I am moving the site to new hardware and
> figured it would be a good time to upgrade Xapian to 1.0.x. There seem
> to be plenty of changes. I believe I have correctly updated my PHP
> code to the more OO-like syntax. But I have some questions:
> 
> * My Xapian job that updated the database always looked for db_lock to
> determine if another write job was using the database. That file seems
> to be gone now, and "flintlock", a possible replacement, is always
> present, writing or not. What's the proper check now?

The documented way would be to try to create a WritableDatabase and see
if it fails or not.  This has the advantage of avoiding a possible race
condition (if the lock file doesn't exist, but another process opens the
database between you checking and trying to open it).

With PHP4 you can use an error handler to catch the exception, which is
ugly, but does the job.  PHP5 supports exceptions, so there you can
simply catch the exception that is thrown.
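
As a rough sketch of the "try to open it and see" approach, here it is in
Python rather than PHP, purely for illustration.  The helper name
open_for_writing and its parameters are hypothetical; with the Python
bindings the opener would presumably be xapian.WritableDatabase and the
exception xapian.DatabaseLockError, but treat both as assumptions and
check them against the version you have installed.

```python
def open_for_writing(open_writable, lock_error, path):
    """Try to open the database at `path` for writing.

    `open_writable` is whatever callable opens a writable database and
    `lock_error` is the exception type it raises when another writer
    holds the lock (both are passed in, so nothing here is tied to a
    particular binding or version).

    Returns the open handle, or None if the database is locked.
    """
    try:
        return open_writable(path)
    except lock_error:
        return None
```

Returning the open handle rather than a yes/no answer is deliberate: if
you only report "free" and open the database afterwards, the race
between checking and opening comes straight back.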

Under Unix, you could try to get an fcntl() exclusive lock on the lock
file, but I can't guarantee that will work in future versions.
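
For what it's worth, the fcntl() probe might look something like the
following Python sketch.  The function name lock_file_is_free and the
lock_path argument are hypothetical (you would pass the path to the
"flintlock" file inside your database directory), and it relies on the
current locking behaviour, which may not hold in future versions.

```python
import fcntl
import os

def lock_file_is_free(lock_path):
    """Return True if an exclusive fcntl() lock could be taken on lock_path.

    The lock is released again before returning, so this is only a probe.
    """
    # An exclusive fcntl() lock needs a descriptor opened for writing.
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails at once if another process holds it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Two caveats.  This is still a check followed by a later open, so it has
the same race as looking for the lock file.  Also, fcntl() locks belong
to the process, and closing any descriptor for the file drops all of
that process's locks on it, so only run a probe like this from a process
which doesn't itself have the database open for writing.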

> * Indexing consumes a lot of memory, which constantly grows as I add
> documents. I do two indexes, one of 215,000 small documents and
> another of 130,000 larger documents. Indexing the entire set of one or
> the other results in a process that eats hundreds of megabytes.
> Perhaps these are all being cached and flushed at the very end? What
> is the default flush threshold and how can I adjust it via PHP? Is
> there anything else I can do to keep usage reasonable? The box has
> 2GB.

By default changes are flushed after 10000 documents, which you can
change by setting XAPIAN_FLUSH_THRESHOLD in the environment (in 1.0.2
and earlier, the value was read and cached when the first
WritableDatabase was opened, but as of 1.0.3 we now read it each time).
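
Since the setting is read from the environment, one way to apply it
without touching the indexing code is to set it in whatever launches the
indexer.  A minimal Python sketch, in which the helper names indexer_env
and run_indexer and the example command are hypothetical:

```python
import os
import subprocess

def indexer_env(threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env

def run_indexer(command, threshold):
    """Run the indexing command with the given flush threshold."""
    return subprocess.run(command, env=indexer_env(threshold), check=True)
```

For example, run_indexer(["php", "index.php"], 1000) would run a
(hypothetical) index.php so that it flushes every 1000 documents instead
of every 10000.  Because the child process gets the variable before it
starts, this should work the same way whether the value is read once or
each time a WritableDatabase is opened.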

The threshold has been 10000 since 0.8.2, but prior to 0.9.7
replace_document() was double counted, so if you only did
replace_document(), it would flush every 5000 documents.  In 0.8.1
and earlier, the threshold was 1000.

It's hard to say why you see what you're seeing - I wouldn't expect
it.  The large number of versions you've jumped over makes it harder
to speculate - perhaps it would be useful to try 0.9.6 (the last
version to support the non-OO PHP4 bindings).

> * This question might be hard to answer without seeing code, but so
> far in testing, the searches on 1.0.3 are running somewhat slower than
> the old 0.8.x searches. And the 1.0.3 searches are running on faster
> hardware with no user load yet.

It is hard to say, but in general it should be faster (and the feedback
I've had backs that up).  Testing versions between the one you were
using and 1.0.3 might help to narrow things down.  It would also be
interesting to compare with 0.8.x on the same hardware (perhaps the
faster hardware isn't actually faster for some reason!)

> Are there any gotchas that I might
> have overlooked when converting my PHP code from the 0.8.x to the
> current version? I basically converted each Xapian-related statement
> from the 0.8.x syntax to a 1.0.3 equivalent. Perhaps there are
> features I overlooked in doing that.

Nothing comes to mind.  Did you use the conversion script?  If not, it
might be interesting to compare what it gives with your conversion:

http://www.oligarchy.co.uk/xapian/patches/xapian-php-update

Cheers,
    Olly


