[Xapian-discuss] Moving to 1.0.x

Mike Boone boonedocks at gmail.com
Tue Oct 9 02:04:59 BST 2007


On 10/8/07, Olly Betts <olly at survex.com> wrote:
> On Mon, Oct 08, 2007 at 04:29:52PM -0400, Mike Boone wrote:
> > The new box is running Red Hat Enterprise 4, for which the default
> > install of PHP is a Red-Hat-patched version 4.3.9. I'm hoping to leave
> > it alone so I can let Red Hat worry about the security updates. If I
> > find something that would really work better under PHP5, I might just
> > switch.
>
> You may find they stop supporting it fairly soon -
> http://www.php.net/archive/2007.php says...
> PHP doesn't have the best security track record, so I'd be surprised if
> Linux distros took on the burden of security support after the PHP team
> give up.

I knew about this but figured Red Hat would still be good for updates.
I still could go either way on this.

> Note that at present, flushing doesn't release memory to the OS - it
> only gets returned to the C++ memory allocation system.  So you won't
> see memory usage drop, but it shouldn't climb without limit.
>
> How are you measuring memory usage BTW?

Over my set of documents the memory usage always went up, fairly
consistently: about 3MB per 1000 documents for the small documents and
12MB per 1000 for the larger ones. I'm measuring the memory with the
PHP function memory_get_usage(). It seems to be in the same ballpark
as what I see in "top -c".
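
A minimal sketch of this kind of per-batch measurement, for anyone
wanting to reproduce it. It samples memory_get_usage() after every
batch and reports the change. The index_one() function, the
measure_growth() helper and the batch sizes are hypothetical
placeholders, not the actual indexing script. On PHP 4,
memory_get_usage() is only available when PHP was built with
--enable-memory-limit.

```php
<?php
// Hypothetical stand-in for the real per-document work (building a
// XapianDocument, stemming terms, adding postings, and so on).
function index_one($doc_id) {
    return strlen(str_repeat('x', 100));
}

// Run $total documents through $indexer and sample memory_get_usage()
// after every $batch documents. Returns the byte delta for each batch.
function measure_growth($indexer, $total, $batch) {
    $deltas = array();
    $last = memory_get_usage();
    for ($i = 1; $i <= $total; $i++) {
        call_user_func($indexer, $i);
        if ($i % $batch == 0) {
            $now = memory_get_usage();
            $deltas[] = $now - $last;
            $last = $now;
        }
    }
    return $deltas;
}

$deltas = measure_growth('index_one', 3000, 1000);
foreach ($deltas as $n => $bytes) {
    printf("batch %d: %+d bytes\n", $n + 1, $bytes);
}
```

Swapping index_one() for the real indexing step, and then removing
pieces of it one at a time, shows which piece accounts for the growth
per batch.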

> > I leave the WritableDatabase object open the whole
> > time; I have not tried to close it and reopen it after X documents, or
> > limit the process to X documents and then restart it for the next set
> > of X.
>
> Hmm, perhaps there's a memory leak in the PHP wrappers.  If you remove
> the calls to add_document() and/or replace_document(), do you also see
> increasing memory usage?

I commented out add_document() and replace_document() and it made no
noticeable difference. Next I tried commenting out the word stemming,
and that knocked the 12MB/1000 docs down to about 6MB/1000 docs. So
part of the problem looks to be there. The bulk of the rest of the
indexing is creating new XapianDocument objects and adding postings.
I tried explicitly setting the XapianDocument object to NULL after the
add/replace but it made no difference.

Mike.


