[Xapian-discuss] set_cutoff <percent_cutoff> [<weight_cutoff>]

James Aylett james-xapian at tartarus.org
Fri May 11 11:55:00 BST 2007


On Fri, May 11, 2007 at 05:23:43AM +0100, Olly Betts wrote:

> > I want the top speed during indexing and searches, and I do not care about
> > smallest database. I think most of users feel the same. If "gzip -9" makes
> > the indexing slightly slower, remove it. *smile* :-)
> 
> The thing is that smaller is often faster.  Once I/O becomes the
> limiting factor, compression will speed things up.  CPU speeds have
> increased faster than storage speeds over time, so this is likely to
> be more true than it ever was!

This is hugely important, and is something that a lot of people
miss. It doesn't make a huge amount of difference when you're dealing
with small data sets (say, less than half the size of core), but there
the cost difference either way should be fairly minimal. Once you get into moderately
large data sets (say two to four times core), you're going to start
hurting very badly if you're wasting time transferring data
suboptimally (*). Even if you can stack enough disks to get maximum
fibre speed, you're still only managing a few gig per second; given
your core will be a minimum of 8G these days, cutting down your
storage size becomes really important. (And that's assuming that only
one machine has access to the fabric, when it's more likely to be
shared...)
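
For what it's worth, here's a back-of-envelope sketch of that
trade-off. The bandwidths, compression ratio and sizes below are
made-up illustrative numbers, not measurements of Xapian or of any
particular hardware:

    # Back-of-envelope sketch: wall-clock time to read a data set sized
    # at 4x core, compressed vs uncompressed. All figures are assumptions
    # for illustration only.

    CORE_GB = 8                  # assumed RAM
    DATA_GB = 4 * CORE_GB        # data set at 4x core, so it can't all be cached
    DISK_GBPS = 2.0              # assumed sustained I/O bandwidth (GB/s)
    DECOMP_GBPS = 10.0           # assumed decompression throughput (GB/s of output)
    RATIO = 0.5                  # assumed compressed/uncompressed size ratio

    uncompressed_time = DATA_GB / DISK_GBPS
    compressed_time = (DATA_GB * RATIO) / DISK_GBPS + DATA_GB / DECOMP_GBPS

    print("uncompressed: %.1fs" % uncompressed_time)   # 16.0s
    print("compressed:   %.1fs" % compressed_time)     # 11.2s

With those assumed numbers compression still wins comfortably even
after paying the decompression cost, and the gap only grows as CPU
throughput pulls further ahead of storage bandwidth, which is exactly
Olly's point above.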

David Braben has an interesting graph that backs this up (admittedly
from the point of view of consoles). It's *more* important to get
decent compression on your data than it was in the days of Elite and
Exile!

(*) I have a tiresome anecdote about inefficient data transfer over
NFSv3 versus NFSv4 bringing our data centre to a standstill.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
