[Xapian-discuss] Re: DFS solution
James Aylett
james-xapian at tartarus.org
Tue Jul 10 17:05:41 BST 2007
On Tue, Jul 10, 2007 at 08:57:40AM -0700, Andrey wrote:
> I am planning to use Xapian to index multiple sites (say 100+), with
> each site having its own xapian.Database. For performance and
> scalability reasons, I would like to build on something that is easy
> to scale in future. I've looked at Apache's Hadoop, which is exactly
> what I need, but it's written in Java =( and I had a hard time
> mounting HDFS (the Hadoop DFS) as a Linux-readable file system for
> Xapian.
>
> I don't know if DFS is the right track or not. Is there something
> that can spread the I/O, is highly scalable, and is readable by
> Xapian?
What total size are the sites you're looking at? I'd generally
recommend building something that works before worrying about
scaling. Providing you use a nicely decoupled design (which Xapian
probably steers you slightly towards in some ways in any case), you
should be fine to tackle your first bottleneck without changing too
much. Even if you completely rewrite, you'll have a better idea of the
shape of your problem having built it once at a smaller scale.
One standard way to scale Xapian is to have multiple backend machines,
each with local storage and a subset of the database. Your frontend
machine then 'glues' the results together from the different
databases.
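To make the 'glue' step concrete, here is a minimal sketch in plain
Python of merging ranked hit lists from several backends into a single
top-N list. It is an illustration only, not Xapian's own code: the
merge_results function, the (score, doc_id) tuple layout and the sample
data are all hypothetical, and it assumes the scores coming back from
different backends are directly comparable. In practice Xapian can do
this combining for you if you add the sub-databases to one
xapian.Database with add_database().

```python
import heapq
from itertools import islice


def merge_results(per_backend_results, limit=10):
    """Merge per-backend hit lists into one list of the best `limit` hits.

    Each input list holds hypothetical (score, doc_id) tuples and must
    already be sorted by descending score, as a search backend would
    normally return them. Scores are assumed comparable across backends.
    """
    # heapq.merge expects ascending inputs, so order by negated score.
    merged = heapq.merge(*per_backend_results, key=lambda hit: -hit[0])
    return list(islice(merged, limit))


# Made-up results from two backends, each holding a subset of the sites.
backend_a = [(0.92, "site1/doc7"), (0.40, "site1/doc2")]
backend_b = [(0.75, "site2/doc3"), (0.10, "site2/doc9")]

top = merge_results([backend_a, backend_b], limit=3)
for score, doc_id in top:
    print(f"{score:.2f} {doc_id}")
```

Because each backend returns its hits already sorted, the frontend only
needs a lazy merge and can stop as soon as it has enough results, so
its cost stays small however large the individual databases are.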
Note that when scaling Xapian I'd generally be more concerned about
performance than dataset size. That's harder to talk about in the
general case, but using multiple backends is a viable approach to this
as well.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org