[Xapian-discuss] Optimization and Load balancing with Xapian

David Levy dvid.levy at gmail.com
Thu Feb 16 15:38:38 GMT 2006

Hi Olly,

Here are my replies:

On 2/16/06, Olly Betts <olly at survex.com> wrote:
> On Wed, Feb 15, 2006 at 01:03:09AM +0200, David Levy wrote:
> > I have been experiencing bad response times with Xapian/Omega over the
> > last few days.
> > My database has more than 700k records, using ~3 GB of disk space.
> > Maybe my requests or my templates are not optimized, or maybe it's a
> > hardware (disk speed) issue. The weird thing is that the search time
> > reported in the response is often sub-second, while Omega actually takes
> > over one second (even several seconds ...) to deliver the response.
>
> The time reported by "$time" includes the match, but because of how
> Omega works it doesn't include the time to calculate top terms (if
> you're using $topterms), and also doesn't include the time to display
> the matches.  If you're actually displaying a lot of matches that can be
> quite considerable.
> So one thing to check is that $topterms isn't being used.

I do not use it in this template, as I had already read how time-consuming
it is. Also, I only ask for the first 5 hits in the Omega request (using the
HITSPERPAGE parameter; is that the best way?).
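
To illustrate the gap described above between the reported search time and
the time the user actually waits, here is a minimal sketch in Python (not
OmegaScript). The functions run_match and render_hits are hypothetical
stand-ins, not Omega or Xapian APIs; the only point is that a timer stopped
after the match phase does not see the cost of displaying the hits.

```python
import time

def run_match(n_docs):
    """Hypothetical stand-in for the match phase: returns fake hit ids."""
    return list(range(n_docs))

def render_hits(hits):
    """Hypothetical stand-in for displaying the matches via a template."""
    return "\n".join(f"<li>document {h}</li>" for h in hits)

def timed_request(n_docs):
    start = time.perf_counter()
    hits = run_match(n_docs)
    # Roughly what a "$time"-like figure would cover: the match only.
    match_time = time.perf_counter() - start
    page = render_hits(hits)
    # What the user waits for: match plus display.
    total_time = time.perf_counter() - start
    return match_time, total_time, page
```

With only 5 hits per page the display step should be cheap, so if the
difference between the two figures is still large, the extra time is
probably being spent somewhere else.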

> > To solve this issue, I have been thinking about load balancing Xapian. I
> > could not find any information about that on the Internet. Has any of
> > you done it already? How?
>
> I've not done it myself.  The simple approach is just to put several
> boxes in the DNS and they'll be used in a round-robin fashion.

Right, I had forgotten that approach.
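
As a rough model of the round-robin idea, here is a small Python sketch. The
hostnames are hypothetical, and in a real setup the rotation is done by the
DNS server and the clients' resolvers rather than by application code, so
the distribution is only approximately even. Each box is assumed to serve
its own copy of the database.

```python
from itertools import cycle

# Hypothetical hostnames; each box is assumed to hold a copy of the database.
BACKENDS = ["search1.example.org", "search2.example.org", "search3.example.org"]

def round_robin(backends):
    """Return an endless iterator that hands out backends in rotation."""
    return cycle(backends)

picker = round_robin(BACKENDS)
# Six consecutive requests go through the list of boxes twice.
first_six = [next(picker) for _ in range(6)]
```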

> > I've done some tests this morning and it seems that some of this
> > slowness is due to sorting.
> >
> > Indeed, Omega requests with and without sorting do not produce the same
> > calculation time at all: < 1 s without sorting and sometimes > 30 s with
> > sorting. These 30 seconds happen with results having around 500+ matches.
> > How is that possible? Sorting should not be so time-consuming, I guess.
>
> It's not the actual sorting which takes the extra time - the issue is
> that for a multi-term query, relevance ranking can terminate early in
> many cases (often when we reach the end of the matches for any of the
> terms).  But if results are sorted on a value, we need to consider every
> result which matches the query.

So you are telling me I won't be able to improve my calculation time if I
still use sorting? Is there any other way to get results sorted by a
criterion other than relevance?
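
To make the explanation quoted above concrete, here is a toy model in
Python. It is not Xapian's actual matcher: it assumes an OR query where each
term contributes a fixed weight (real weights vary per document), and all
names are made up for the illustration. It shows why a top-k relevance
search can stop once one term's list is exhausted, while sorting on a value
has to look at every matching document.

```python
import heapq

def top_k_by_relevance(postings, weights, k):
    """Toy OR-query matcher with early termination.

    postings: term -> sorted list of doc ids containing the term
    weights:  term -> fixed weight the term adds (a simplification)
    Returns (top k as (weight, docid), best first; documents examined).
    """
    pos = {t: 0 for t in postings}
    heap = []        # min-heap of (weight, docid), at most k entries
    examined = 0
    while True:
        live = [t for t in postings if pos[t] < len(postings[t])]
        if not live:
            break
        # Upper bound on the weight of any document not yet examined.
        bound = sum(weights[t] for t in live)
        if len(heap) == k and bound <= heap[0][0]:
            break    # nothing left can enter the top k: stop early
        doc = min(postings[t][pos[t]] for t in live)
        w = 0.0
        for t in live:
            if postings[t][pos[t]] == doc:
                w += weights[t]
                pos[t] += 1
        examined += 1
        if len(heap) < k:
            heapq.heappush(heap, (w, doc))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, doc))
    return sorted(heap, reverse=True), examined

def top_k_by_value(postings, values, k):
    """Sorting on a stored value: any match could hold the best value,
    so every matching document has to be examined."""
    matches = set()
    for docs in postings.values():
        matches.update(docs)
    best = heapq.nlargest(k, matches, key=lambda d: values[d])
    return best, len(matches)

# A rare term (3 documents) combined with a common one (500 documents).
postings = {"rare": [1, 2, 3], "common": list(range(1, 501))}
weights = {"rare": 1.0, "common": 2.0}
values = {d: d % 97 for d in range(1, 501)}   # made-up sort values

_, seen_by_relevance = top_k_by_relevance(postings, weights, 3)
_, seen_by_value = top_k_by_value(postings, values, 3)
```

In this toy data the relevance search stops as soon as the rare term runs
out, whereas the value sort has to visit all 500 matching documents.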

>     Olly


David LEVY {selenium}
Website ~ http://www.davidlevy.org
Wishlist Zlio ~ http://david.zlio.com/wishlist
Blog ~ http://selenium.blogspot.com