[Xapian-discuss] Xapian Terms vs. Document Partition.

Kevin Duraj kevin.softdev at gmail.com
Fri May 9 20:20:03 BST 2008


Alex,

I agree that the core of the talk is not about document partitioning,
and that was not the issue I brought to the Xapian community. My point
is that we must implement index partitioning based on terms, as Google
does, and not based on documents.
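
To make the distinction concrete, here is a minimal sketch in plain
Python. It does not use Xapian's API and is not a description of how
Google or Xapian implement anything. The toy corpus, the shard count, and
all function names are assumptions made up for illustration. It builds the
same small inverted index both ways and runs an AND query against each.

```python
from collections import defaultdict

# Toy corpus (hypothetical data, for illustration only).
DOCS = {
    1: "xapian index terms",
    2: "google index documents",
    3: "xapian search documents",
}
NUM_SHARDS = 2  # assumed number of servers


def build_document_partitioned(docs, num_shards):
    """Each shard holds a complete inverted index for a subset of documents."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]
        for term in text.split():
            shard[term].add(doc_id)
    return shards


def term_shard(term, num_shards):
    """Deterministic term-to-shard assignment (a hypothetical scheme)."""
    return sum(term.encode("utf-8")) % num_shards


def build_term_partitioned(docs, num_shards):
    """Each shard holds the full posting list for a subset of terms."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        for term in text.split():
            shards[term_shard(term, num_shards)][term].add(doc_id)
    return shards


def search_document_partitioned(shards, terms):
    """AND query: every shard is asked, and the per-shard hits are unioned."""
    hits = set()
    for shard in shards:
        postings = [shard.get(t, set()) for t in terms]
        if postings:
            hits |= set.intersection(*postings)
    return hits


def search_term_partitioned(shards, terms):
    """AND query: only the shards owning a query term are asked, and the
    posting lists they return are intersected by the caller."""
    postings = [shards[term_shard(t, len(shards))].get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()
```

Both layouts return the same documents for the same query. The difference
is in which servers are contacted and what travels over the network: the
document-partitioned search asks every shard and merges small result
sets, while the term-partitioned search asks only the shards that own the
query terms but has to ship whole posting lists to be intersected.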

Diego Puppin from Google confirmed an important detail that is
incidental to the rest of the talk: Google partitions its index by
terms for the initial search. Only one slide shows that Google's index
partitioning is based on terms and not on documents. I am focusing on
that one slide, where Diego mentions that the index is partitioned by
terms, and I am not referring to the rest of the talk.

Thank you,
--
Kevin Duraj
http://myhealthcare.com

On Thu, May 8, 2008 at 8:28 AM, Alex Brasetvik
<alex-xapian at brasetvik.com> wrote:
>
> On Tue, 6 May 2008 16:48:01 -0700, "Kevin Duraj" <kevin.softdev at gmail.com>
> wrote:
>
>> Xapian Terms vs. Document Partition.
>>
>> In December 2007, Diego Puppin from Google gave an interesting talk about
>> a parallel architecture distributing the index based on terms rather than
>> documents.
>> Reference:
>> http://youtube.com/watch?v=KpZpsu2wM1s
>
> [snip]
>
>> I would again like to encourage the Xapian community to
>> start looking into distributing the index based on terms rather than
>> documents. Making each server responsible for a set of terms rather
>> than a set of documents would enable us to scale our search engine to
>> Google's level.
>
> If you watch the talk again and read their paper[1], you'll see that the
> gist of the talk is *not* about either document- or term-partitioning.
> Also, in their paper, they suggest ``Document partitioning is the strategy
> usually chosen by the most popular web search engines'', citing Page and
> Brin's paper on Google's architecture. You may want to read it.
>
> [1] http://scholar.google.no/scholar?hl=en&lr=&cluster=10013139656811614516
>
> --
> Alex Brasetvik





