[Xapian-discuss] Re: Re: get_docid over multi-database search

James Aylett james-xapian at tartarus.org
Thu Dec 20 14:12:30 GMT 2007


On Wed, Dec 19, 2007 at 02:32:52PM -0800, Kevin Duraj wrote:

> In my case having 100-500GB data on hard disk, the data cannot fit
> into memory and using two databases is two times slower than using
> single database.

Are you spindle-restricted here? Just a thought.

I don't actually know how the matcher deals with multiple databases
right now, but I suspect it reads from them in a sort of
pseudo-parallel fashion [1]. If so, putting two databases on the same
spindle means the heads have to re-seek every time the matcher
switches between them, which is going to utterly destroy performance
in a way that wouldn't happen if you laid out your data differently
across the available platters. Profiling this kind of thing is a pain,
because you often have to write your own analysis tools :-/
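
To make the seek argument concrete, here is a toy model rather than a measurement. It assumes each database occupies its own contiguous run of blocks on a single disk, and that head movement is simply the distance between consecutive block offsets. The function name, block counts and offsets are all made up for illustration; real Xapian B-tree access is not sequential like this.

```python
def head_travel(offsets, start=0):
    """Total distance the head moves when visiting offsets in order."""
    total = 0
    pos = start
    for off in offsets:
        total += abs(off - pos)
        pos = off
    return total


N = 1000  # blocks read from each database (hypothetical)

# Hypothetical layout: database A at the start of the disk,
# database B a million blocks further in.
db_a = list(range(0, N))
db_b = list(range(1_000_000, 1_000_000 + N))

# Read all of A, then all of B: one long seek in the middle.
one_after_other = db_a + db_b

# Alternate between A and B on every read: a long seek each time.
alternating = [off for pair in zip(db_a, db_b) for off in pair]

print("one after the other:", head_travel(one_after_other))
print("alternating:        ", head_travel(alternating))
```

In this model the alternating pattern moves the head roughly 2000 times further, which is the kind of gap that a second spindle would remove. Whether the matcher really alternates like this is exactly the part I'm unsure about.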

[1] I'm sure Olly or Richard can jump in here, but I'm assuming this
because if you fill up the candidate mset from both databases
concurrently then I think you're *probably* going to run for less
time, because the minimum weight needed to get into the candidate
mset is likely to rise faster (assuming the two databases are roughly
equally relevant to your query). Lots of caveats
there, and my assumption may be wrong anyway :-)
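
A minimal sketch of the reasoning in [1], in plain Python rather than Xapian code. It rests on one reading of the footnote: it compares a single candidate set fed from both databases against two independent candidate sets that are merged afterwards. The weights are random numbers, and `top_k`, `interleave` and the sizes are hypothetical names and values chosen for illustration. "Admitted" counts the candidates that beat the current minimum weight, used here as a rough stand-in for work the matcher could not skip.

```python
import heapq
import random
from itertools import chain


def top_k(weights, k):
    """Keep the k best weights; count candidates that beat the threshold."""
    heap = []      # min-heap: heap[0] is the current minimum weight
    admitted = 0
    for w in weights:
        if len(heap) < k:
            heapq.heappush(heap, w)
            admitted += 1
        elif w > heap[0]:
            heapq.heapreplace(heap, w)
            admitted += 1
    return sorted(heap, reverse=True), admitted


def interleave(a, b):
    """Alternate items from a and b (assumes equal lengths)."""
    return list(chain.from_iterable(zip(a, b)))


rng = random.Random(2007)
# Two "databases" that are equally relevant: same weight distribution.
db_a = [rng.random() for _ in range(50_000)]
db_b = [rng.random() for _ in range(50_000)]
K = 10

# One shared candidate set, filled from both databases concurrently.
shared_top, shared_admitted = top_k(interleave(db_a, db_b), K)

# Two independent candidate sets, merged at the end.
top_a, admitted_a = top_k(db_a, K)
top_b, admitted_b = top_k(db_b, K)
separate_top = sorted(top_a + top_b, reverse=True)[:K]
separate_admitted = admitted_a + admitted_b

print("shared threshold:   ", shared_admitted, "candidates admitted")
print("separate thresholds:", separate_admitted, "candidates admitted")
```

Both approaches end up with the same top K, but the shared threshold can never admit more candidates than the separate ones: anything that beats the K-th best of everything seen so far also beats the K-th best of its own database. How much that saves in practice depends on how much work the real matcher can skip below the threshold, which this sketch does not model.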

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


