[Xapian-discuss] Moving to 1.0.x

Mike Boone boonedocks at gmail.com
Mon Oct 8 21:29:52 BST 2007


On 10/8/07, Olly Betts <olly at survex.com> wrote:
> On Mon, Oct 08, 2007 at 02:47:26PM -0400, Mike Boone wrote:
> > I have had a site happily running Xapian 0.8.x through the PHP/SWIG
> > bindings for a few years.
>
> Which 0.8.x?  It would be useful to know...

I had to look it up. The old machine is on 0.8.5 using Red Hat
Enterprise 2.1 and the infamous GCC 2.96.

> > * My Xapian job that updated the database always looked for db_lock to
> > determine if another write job was using the database. That file seems
> > to be gone now, and "flintlock", a possible replacement, is always
> > present, writing or not. What's the proper check now?
>
> The documented way would be to try to create a WritableDatabase and see
> if it fails or not.  This has the advantage of avoiding a possible race
> condition (if the lock file doesn't exist, but another process opens the
> database between you checking and trying to open it).
>
> With PHP4 you can use an error handler to catch the exception, which is
> ugly, but does the job.  PHP5 supports exceptions so there you get an
> exception.
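
The "try it and see if it fails" approach can be sketched as follows. This
is Python rather than PHP and does not call Xapian at all: open_writer,
close_writer and LockUnavailable are hypothetical names, and the use of an
advisory flock() on a permanent lock file is an assumption made only to
illustrate why a lock file's mere existence says nothing about whether a
writer is active. It is Unix-only because it relies on the fcntl module.

```python
import fcntl
import os
import tempfile


class LockUnavailable(Exception):
    """Hypothetical error: another writer currently holds the lock."""


def open_writer(lock_path):
    """Try to take an exclusive advisory lock; return the fd on success."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking attempt: fails at once if someone else holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise LockUnavailable(lock_path)
    return fd


def close_writer(fd):
    """Release the lock; the lock file itself is left on disk."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "lockfile")
        first = open_writer(lock_path)
        try:
            open_writer(lock_path)
            second_failed = False
        except LockUnavailable:
            second_failed = True
        close_writer(first)
        # The file is still there even though nobody is writing.
        exists_after = os.path.exists(lock_path)
        close_writer(open_writer(lock_path))  # succeeds again
        return second_failed, exists_after
```

The point of the design is that acquiring and testing are one step, so
there is no window between "checked" and "opened" for another process to
slip into.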

The new box is running Red Hat Enterprise 4, for which the default
install of PHP is a Red-Hat-patched version 4.3.9. I'm hoping to leave
it alone so I can let Red Hat worry about the security updates. If I
find something that would really work better under PHP5, I might just
switch.

> By default changes are flushed after 10000 documents, which you can
> change by setting XAPIAN_FLUSH_THRESHOLD in the environment (in 1.0.2
> and earlier, the value was read and cached when the first
> WritableDatabase was opened, but as of 1.0.3 we now read it each time).
>
> The threshold has been 10000 since 0.8.2, but prior to 0.9.7
> replace_document() was double counted, so if you only did
> replace_document(), it would flush every 5000 documents.  In 0.8.1
> and earlier, it was 1000.
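
To make the thresholds quoted above concrete, here is a rough model in
Python (not Xapian code). It assumes a flush fires each time the count of
pending changes reaches the threshold, and it models the pre-0.9.7 double
counting as each replace_document() adding 2 to that count. The function
names are made up for illustration, and how Xapian itself parses the
environment variable is not modelled.

```python
import os

DEFAULT_FLUSH_THRESHOLD = 10000  # default quoted above for 0.8.2 and later


def flush_threshold(environ=None):
    """Return the threshold from XAPIAN_FLUSH_THRESHOLD, or the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get("XAPIAN_FLUSH_THRESHOLD", "")
    # Assumption: anything that is not a positive integer falls back
    # to the default; the real library may behave differently.
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return DEFAULT_FLUSH_THRESHOLD


def automatic_flushes(changes, threshold, count_per_change=1):
    """Number of automatic flushes triggered by `changes` modifications."""
    return (changes * count_per_change) // threshold
```

Under this model, automatic_flushes(230000, 10000) gives 23, and with
count_per_change=2 a flush happens after every 5000 replacements.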

So it looks like flushing isn't doing anything to give back memory,
since my indexer would be running flush around 23 times. Memory usage
keeps climbing, at least according to my output, which reports it
every 1,000 documents. I leave the WritableDatabase object open the
whole time; I have not tried closing and reopening it after X
documents, or limiting the process to X documents and then restarting
it for the next set of X.
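
The close-and-reopen experiment could look roughly like this. It is a
sketch in Python, not PHP, and FakeWriter is a hypothetical stand-in for
whatever opens the WritableDatabase; nothing here calls Xapian, and
whether reopening actually releases memory is exactly what the
experiment would have to show.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class FakeWriter:
    """Hypothetical stand-in for a writable database handle."""

    def __init__(self):
        self.docs = []
        self.closed = False

    def add(self, doc):
        self.docs.append(doc)

    def close(self):
        self.closed = True


def index_in_batches(docs, size, open_writer=FakeWriter):
    """Index docs with a fresh writer per batch; return per-batch counts."""
    counts = []
    for chunk in batches(docs, size):
        writer = open_writer()
        try:
            for doc in chunk:
                writer.add(doc)
        finally:
            writer.close()  # drop the handle before the next batch starts
        counts.append(len(writer.docs))
    return counts
```

For example, index_in_batches(range(25), 10) processes three batches of
10, 10 and 5 documents, each with its own writer.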

> It's hard to say why you see what you're seeing - I wouldn't expect
> it.  The large number of versions you've jumped over makes it harder
> to speculate - perhaps it would be useful to try 0.9.6 (the last
> version to support the non-OO PHP4 bindings).

I might just try the original code with 0.9.6 and see what happens.

> > * This question might be hard to answer without seeing code, but so
> > far in testing, the searches on 1.0.3 are running somewhat slower than
> > the old 0.8.x searches. And the 1.0.3 searches are running on faster
> > hardware with no user load yet.
>
> It is hard to say, but in general it should be faster (and the feedback
> I've had backs that up).  Testing versions between the one you were
> using and 1.0.3 might help to narrow things down.  It would also be
> interesting to compare with 0.8.x on the same hardware (perhaps the
> faster hardware isn't actually faster for some reason!)

Are there any benchmarking tools that would give me an idea of the
speed of just Xapian on a system? My standard search runs via PHP and
once Xapian returns results, the content of the matching documents is
pulled from MySQL. So there are a lot of variables involved in my
estimate of the search speed. It might be MySQL or PHP or Apache that
need tuning, so it would be nice to eliminate Xapian from the
troubleshooting.
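
Short of a dedicated benchmarking tool, one way to separate the
variables is to time each stage on its own. A minimal sketch in Python:
search and fetch are hypothetical callables standing in for the Xapian
query and the MySQL lookup, and best-of-N timing is just one reasonable
choice for reducing noise.

```python
import time


def timed(fn, *args):
    """Call fn(*args); return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def profile_search(query, search, fetch, repeats=5):
    """Return the best-of-N time for each stage of a search request."""
    best = {"search": float("inf"), "fetch": float("inf")}
    for _ in range(repeats):
        ids, t_search = timed(search, query)   # index lookup only
        _, t_fetch = timed(fetch, ids)         # content retrieval only
        best["search"] = min(best["search"], t_search)
        best["fetch"] = min(best["fetch"], t_fetch)
    return best
```

If the "search" figure is small compared with "fetch", the index lookup
is unlikely to be the bottleneck.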

> > Are there any gotchas that I might
> > have overlooked when converting my PHP code from the 0.8.x to the
> > current version? I basically converted each Xapian-related statement
> > from the 0.8.x syntax to a 1.0.3 equivalent. Perhaps there are
> > features I overlooked in doing that.
>
> Nothing comes to mind.  Did you use the conversion script?  If not, it
> might be interesting to compare what it gives with your conversion:
>
> http://www.oligarchy.co.uk/xapian/patches/xapian-php-update

Didn't know about that file; I went through my scripts by hand. I will
run that script and do a diff and see what happens.
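
The comparison itself is just a text diff of the two conversions. For
illustration only, a small Python sketch using the standard difflib
module; the labels "hand-converted" and "script-converted" are arbitrary
names chosen here, and the plain diff command would do the same job.

```python
import difflib


def conversion_diff(hand_text, script_text):
    """Return a unified diff between two conversions of the same script."""
    return list(difflib.unified_diff(
        hand_text.splitlines(),
        script_text.splitlines(),
        fromfile="hand-converted",
        tofile="script-converted",
        lineterm="",
    ))
```

An empty result means the two conversions are textually identical.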

Thanks, Olly, for being so responsive and for your work on Xapian in general.


