[Xapian-discuss] database stubs: practical limitations, rules of thumb?

Josef Novak josef.robert.novak at gmail.com
Mon Dec 1 11:11:12 GMT 2008


Hi,
  Is there any standing recommendation on the use of database stubs
with Xapian?  Is there a rule of thumb for the size and number of dbs
a single stub can reasonably hold?  Aside from disk I/O, how does
having the individual dbs located on a remote machine factor into stub
usage?
  I've been searching the lists a bit, looking for posts on the usage
of stubs, but I only found one highly relevant-looking thread,
http://lists.tartarus.org/pipermail/xapian-discuss/2006-August/002533.html
and the doc overview,
http://xapian.org/docs/overview.html

  If that rather old thread is still relevant, it seems there is a
fairly low limit on the number of dbs one can corral into a single
stub without incurring a fairly stiff performance hit.
  In my current scenario, I have several thousand different dbs, each
one associated with a specific geographic location, and I'm trying to
come up with an optimal way of spreading load over multiple dbs and
multiple machines.  At present I direct queries at the appropriate
location-based db whenever I can confirm the location unequivocally.
For queries which I know less about, or nothing about, rather than
creating stubs I've opted to create a hierarchy of larger,
location-based dbs, following a
community<city<county<state<toplevel
style format, where each city-level db incorporates all community
data, each county-level db incorporates all city data, etc.
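
  To make the routing concrete, here is a minimal Python sketch of
picking the most specific db that the known location allows.  The
directory layout, the pick_db_path helper and the dict keys are all
made up for illustration, and it assumes names are unique within a
level:

```python
# Levels from most specific to least specific, mirroring
# community<city<county<state<toplevel.
LEVELS = ["community", "city", "county", "state"]


def pick_db_path(location, root="/data/xapian"):
    """Return the path of the most specific db the location supports.

    `location` maps a level name to a confirmed place name; levels that
    could not be confirmed are simply absent (or empty).  The path
    layout <root>/<level>/<name> is hypothetical.
    """
    for level in LEVELS:
        name = location.get(level)
        if name:
            return f"{root}/{level}/{name}"
    # Nothing is known about the query: fall back to the all-in-one db.
    return f"{root}/toplevel"


if __name__ == "__main__":
    print(pick_db_path({"city": "springfield", "state": "il"}))
    print(pick_db_path({}))
```

  A query with only a confirmed state goes to the state-level db, and
one with nothing confirmed goes to the toplevel db.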
  This appears to be considerably faster and, given the thread above,
would appear to be the preferred way to proceed.  However, this means
that each of my larger dbs is 'all in one place', and so is
effectively less robust.  My intuition is that it would make the most
sense to shard each larger city, county, etc. db based on overall size
(and perhaps access statistics), and distribute the shards over a
group of different machines, but I wonder if there is a rule of thumb
in terms of shard size and number of shards per stub.  If not, I guess
I'll just have to experiment!
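
  For the sake of discussion, this is roughly the kind of shard plan I
have in mind, as a Python sketch.  It splits a db by size, spreads the
shards round-robin over the machines, and emits stub lines of the
form "remote host:port", which is the stub syntax as I understand it
from the docs.  The max shard size, host names and base port are
placeholders (the right shard size is exactly what I don't know), and
each line assumes a server such as xapian-tcpsrv is listening there:

```python
import math


def plan_stub(db_size_gb, max_shard_gb, hosts, base_port=33333):
    """Return stub-file text listing one remote shard per line.

    db_size_gb / max_shard_gb decides the shard count; shards are
    assigned to hosts round-robin, and a host that gets more than one
    shard uses consecutive ports.  All values here are hypothetical.
    """
    n_shards = max(1, math.ceil(db_size_gb / max_shard_gb))
    lines = []
    for i in range(n_shards):
        host = hosts[i % len(hosts)]
        port = base_port + i // len(hosts)
        lines.append(f"remote {host}:{port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A 25 GB county db, at most 10 GB per shard, two machines.
    print(plan_stub(25, 10, ["search1", "search2"]), end="")
```

  Varying max_shard_gb and the host list in something like this would
at least make the experiments repeatable.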
   Cheers


