[Xapian-discuss] Re: DFS solution

James Aylett james-xapian at tartarus.org
Tue Jul 10 17:05:41 BST 2007

On Tue, Jul 10, 2007 at 08:57:40AM -0700, Andrey wrote:

> I am planning to use Xapian to index multiple sites (say 100+), and
> each site has its own xapian.Database.  For performance/
> scalability reasons, I'd like to build it on something which is easy to
> scale in future.  I've looked at Apache's Hadoop, and that's exactly what
> I need, but it's written in Java =( and I had a hard time
> mounting HDFS (Hadoop DFS) as a Linux-readable file system for
> Xapian.
> I don't know if this is the right track or not (DFS)... if there is
> something capable of spreading the I/O, highly scalable, and readable by
> Xapian

What total size are the sites you're looking at? I'd generally
recommend building something that works before worrying about
scaling. Providing you use a nicely decoupled design (which Xapian
probably steers you slightly towards in some ways in any case), you
should be fine to tackle your first bottleneck without changing too
much. Even if you completely rewrite, you'll have a better idea of the
shape of your problem having built it once at a smaller scale.

One standard way to scale Xapian is to have multiple backend machines,
each with local storage and a subset of the database. Your frontend
machine then 'glues' the results together from the different backends.

Note that when scaling Xapian, performance would generally concern me
more than dataset size. That's harder to talk about in general terms,
but using multiple backends is a viable approach to this as well.
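
As a rough illustration of the frontend 'glue' step described above, here
is a minimal sketch in plain Python. It is not Xapian's API: the function
name merge_results, the (score, doc_id) tuple shape, and the sample data are
all hypothetical. It also assumes scores from different backends are
directly comparable, which may not hold if each backend weights terms using
only its own local statistics. Xapian itself can search several databases
as one through Database.add_database(), which is likely the better starting
point in practice.

```python
import heapq
from itertools import islice


def merge_results(per_backend_results, limit=10):
    """Merge result lists from several backends into one ranked list.

    Each input list holds (score, doc_id) tuples already sorted by
    descending score, as a backend would typically return them.
    Assumption: scores are comparable across backends.
    """
    merged = heapq.merge(
        *per_backend_results,
        key=lambda hit: hit[0],  # rank by score only
        reverse=True,            # inputs are sorted highest score first
    )
    # Stop as soon as we have enough hits; the rest is never consumed.
    return list(islice(merged, limit))


# Hypothetical results from two backends, each holding a subset of sites.
backend_a = [(0.92, "siteA/doc3"), (0.40, "siteA/doc7")]
backend_b = [(0.75, "siteB/doc1"), (0.10, "siteB/doc9")]

top = merge_results([backend_a, backend_b], limit=3)
```

Because each backend only needs to return its own top hits, the frontend
does very little work per query, and adding a backend means adding one more
input list.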


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
