[Xapian-discuss] Splitting terms into separate B-Trees.

Kevin Duraj kevin.softdev at gmail.com
Thu Sep 13 20:47:44 BST 2007


Thanks for all your suggestions. So far I have crawled and indexed 30
million web sites on one server at http://pacificair.com and do not
see any performance bottlenecks using Xapian. I will see how far I
can get once I have crawled and indexed another 70 million web sites,
for a total of 100 million on one server, and what my options will be
for splitting the load.
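
For reference, the term-partitioning idea from my earlier message
(quoted below) could be sketched roughly as follows. This is only an
illustration in plain Python, not Xapian code: the names shard_for and
TermPartitionedIndex, the shard count, and the CRC32 hash are all
assumptions of mine, and each in-memory dictionary merely stands in
for the B-Tree that one server would hold.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of servers


def shard_for(term, num_shards=NUM_SHARDS):
    """Map a term to a shard with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(term.encode("utf-8")) % num_shards


class TermPartitionedIndex:
    """Toy inverted index where each term lives on exactly one shard."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        # One posting-list dictionary per shard; each stands in for one server.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def add_document(self, doc_id, terms):
        # Route every term's posting to the shard that owns that term.
        for term in terms:
            self.shards[shard_for(term, self.num_shards)][term].add(doc_id)

    def search_all(self, terms):
        # AND query: fetch each term's postings from its own shard,
        # then intersect the posting lists.
        postings = []
        for term in terms:
            shard = self.shards[shard_for(term, self.num_shards)]
            postings.append(shard.get(term, set()))
        if not postings:
            return set()
        return set.intersection(*postings)
```

A query only touches the shards that own its terms, so each shard can
stay small, but a multi-term query has to merge posting lists fetched
from several shards.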

-- 
Cheers,
   Kevin Duraj
   http://pacificair.com


On 6/1/07, James Aylett <james-xapian at tartarus.org> wrote:
> On Thu, May 31, 2007 at 01:58:19PM -0700, Kevin Duraj wrote:
>
> > This would explain why Google claims in its documents to use cheap
> > servers: partitioning B-Tree indexes by term can bring each index
> > down to whatever size a server can handle well, and the servers
> > effectively become part of the B-Tree. There is no limit to how
> > far you can scale the search or size down each database.
>
> However, Google doesn't care when identical adjacent searches give
> different results. There are lots of things you can do if you're
> trying to solve Google's problem, which often won't apply in a more
> constrained system. (You'd never want to do that kind of thing with a
> medical search system, for instance :-)
>
> J
>
> --
> /--------------------------------------------------------------------------\
>  James Aylett                                                  xapian.org
>  james at tartarus.org                               uncertaintydivision.org
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>


