[Xapian-discuss] Xapian Terms vs. Document Partition.

Alex Brasetvik alex-xapian at brasetvik.com
Thu May 8 16:28:32 BST 2008

On Tue, 6 May 2008 16:48:01 -0700, "Kevin Duraj" <kevin.softdev at gmail.com> wrote:

> Xapian Terms vs. Document Partition.
> On December 2007, Diego Puppin from Google had interesting talk about
> parallel architecture distributing index based on terms rather than
> documents.
> Reference:
> http://youtube.com/watch?v=KpZpsu2wM1s


> I would like again encourage Xapian community to
> start looking into distributing index based on terms rather than
> documents. To make each server be responsible for set of terms rather
> then set of documents would enable us to scale our search engine to
> Google's level.

If you watch the talk again and read their paper[1], you'll see that the
gist of the talk is *not* about either document- or term-partitioning.
In their paper, they state that ``Document partitioning is the strategy
chosen by the most popular web search engines'', citing Page and Brin's
paper on Google's architecture. You may want to read it.
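
For anyone unsure what the two strategies mean in practice, here is a toy
sketch in Python. It is not taken from the talk, the paper, or Xapian; the
sample documents, the function names, and the CRC-based term-to-server mapping
are all assumptions made up for illustration. It only shows where postings
end up and which servers a query has to contact.

```python
import zlib
from collections import defaultdict

# Hypothetical toy corpus: doc id -> text.
DOCS = {
    0: "xapian search engine",
    1: "distributed search index",
    2: "xapian index terms",
    3: "term partition index",
}


def term_shard(term, n):
    # Assumed mapping of a term to a server: a stable hash modulo n.
    return zlib.crc32(term.encode("utf-8")) % n


def build_document_partitioned(docs, n):
    # Each server holds a complete inverted index for its own subset of documents.
    servers = [defaultdict(set) for _ in range(n)]
    for doc_id, text in docs.items():
        shard = servers[doc_id % n]
        for term in text.split():
            shard[term].add(doc_id)
    return servers


def build_term_partitioned(docs, n):
    # Each server holds the full posting lists for its own subset of terms.
    servers = [defaultdict(set) for _ in range(n)]
    for doc_id, text in docs.items():
        for term in text.split():
            servers[term_shard(term, n)][term].add(doc_id)
    return servers


def search_document_partitioned(servers, terms):
    # Every server is queried; each intersects locally, results are merged.
    hits = set()
    for shard in servers:
        hits |= set.intersection(*[shard.get(t, set()) for t in terms])
    return hits, len(servers)


def search_term_partitioned(servers, terms):
    # Only servers owning a query term are contacted, but whole posting
    # lists must be collected and intersected in one place.
    n = len(servers)
    contacted = {term_shard(t, n) for t in terms}
    lists = [servers[term_shard(t, n)].get(t, set()) for t in terms]
    return set.intersection(*lists), len(contacted)


if __name__ == "__main__":
    query = ["xapian", "index"]
    by_doc = build_document_partitioned(DOCS, 2)
    by_term = build_term_partitioned(DOCS, 2)
    print(search_document_partitioned(by_doc, query))
    print(search_term_partitioned(by_term, query))
```

Both layouts return the same matches for an AND query. The difference is in
the cost: the document-partitioned search touches every server but only moves
final hits around, while the term-partitioned search may touch fewer servers
but has to ship entire posting lists before intersecting them.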


[1] http://scholar.google.no/scholar?hl=en&lr=&cluster=10013139656811614516

Alex Brasetvik
