[Xapian-discuss] Re: performance issue

Olly Betts olly at survex.com
Sat Dec 23 19:43:23 GMT 2006


On Sun, Dec 24, 2006 at 12:51:56AM +0800, Andrey Kong wrote:
> I found out that the performance gets back in shape (query time 1-2 
> secs) around 10-20 mins after the Xapian DB is updated.
> So I guess this is the file system cache beginning to build up.

That's rather longer than I'd expect, especially as some useful
blocks should be cached after updating.  What's the typical query rate?

I suspect you might benefit from more RAM and/or moving the web serving
to another server.  Sadly I'm not aware of any tools which can analyse
what the Linux VM system has cached.  "vmstat 5" will show you a
summary every 5 seconds (ignore the first line of output - it reports
averages since boot and is generally atypical in my experience) but
it's system-wide, not a per-process or per-file breakdown.
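
For watching the cache warm up after an update, the system-wide figures
can also be read directly.  A minimal sketch, assuming a Linux host
where /proc/meminfo is available; the function names are made up for
illustration, and like vmstat this gives no per-file breakdown:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; lines that don't look like
    "Name:   12345 kB" are skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def page_cache_summary(path="/proc/meminfo"):
    """Return the handful of fields relevant to file system caching."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    wanted = ("MemTotal", "MemFree", "Buffers", "Cached")
    return {key: info.get(key) for key in wanted}
```

Calling page_cache_summary() every few seconds after a database update
should show "Cached" growing as queries pull blocks back in.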

> Will start to clean up unnecessary unique terms and
> continue to stack up more docs and see how it goes....

Eliminating unused terms won't hurt, but their effect is limited
because they tend to bunch together, and if a whole block consists
only of such terms we never try to read it.

> > First question, is this with quartz or flint?  I'd suggest using flint
> > rather than quartz.
> 
> I was using quartz; now I will switch to flint. I guess it's stable
> enough since you suggested it =)

Flint will be the default in 1.0.

> > Second question - does compacting the database help?  (Using quartzcompact
> > for quartz, or xapian-compact for flint).
> 
> I haven't tried the compact utility yet. How do I trigger this function?
> I cannot find it in the PHP wrapper... =/

They're separate programs, not part of the API.  They're installed by
xapian-core.
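
Since the tools are command-line programs, a script can still drive
them.  A rough sketch, assuming the usual "source directory, then
destination directory" invocation (check each tool's --help for your
version); the helper names and any paths passed in are hypothetical:

```python
import subprocess

# Which compaction program goes with which backend, as described above.
COMPACT_TOOLS = {"quartz": "quartzcompact", "flint": "xapian-compact"}


def build_compact_command(backend, source_dir, dest_dir):
    """Build the argument list for compacting source_dir into dest_dir."""
    tool = COMPACT_TOOLS[backend]  # KeyError for an unknown backend
    return [tool, source_dir, dest_dir]


def compact(backend, source_dir, dest_dir):
    """Run the compaction tool; raises CalledProcessError if it fails."""
    subprocess.run(build_compact_command(backend, source_dir, dest_dir),
                   check=True)
```

The compacted copy is written to the destination directory, so the
original database is left untouched until you swap the new one in.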

> I was thinking of something like MOSIX http://www.mosix.cs.huji.ac.il/
> but MOSIX seems a bit old and kinda discontinued....(not sure)

I've not experimented with it, but the FAQ says:

    http://www.mosix.org/faq/output/faq_q0006.html

    MOSIX is suitable to run compute intensive and applications with
    small to moderate amounts of I/O over fast, secure networks in a
    trusted environment (all remote nodes are trusted), e.g., as in
    private clusters and organizational Grids. 

Indexing and search are generally pretty I/O intensive (I've found
that I/O, not CPU, is the limiting factor for larger systems) so
this doesn't sound like a good fit.

It sounds like segmentation of Chinese text is CPU intensive, but it's
also trivially parallelisable anyway - just split the workload between
hosts at the granularity of a document (or greater).  If MOSIX allows
that load to be split and balanced neatly without significant overhead, 
it might be suitable.
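
Splitting at document granularity needs no cluster software at all.  A
minimal sketch of a round-robin split; the function name is made up for
illustration, and how each shard actually reaches its host is left out:

```python
def partition_documents(doc_ids, n_hosts):
    """Deal document ids out to n_hosts shards, round-robin.

    Each document goes to exactly one shard, so hosts can segment
    their shards independently with no coordination.
    """
    if n_hosts < 1:
        raise ValueError("n_hosts must be at least 1")
    shards = [[] for _ in range(n_hosts)]
    for position, doc_id in enumerate(doc_ids):
        shards[position % n_hosts].append(doc_id)
    return shards
```

Round-robin keeps shard sizes within one document of each other; if
document lengths vary a lot, balancing by total text size instead of
document count may spread the CPU load more evenly.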

Cheers,
    Olly


