[Xapian-discuss] indexing strategy for "near real time" indexing

Jarrod Roberson jarrod at vertigrated.com
Wed Jun 20 16:29:21 BST 2007


On 6/19/07, Sam Liddicott <sam at liddicott.com> wrote:
>
> Are you indexing a mail store with reference to the store to retrieve the
> original message,


Yes, the plan is to index in parallel using a "notification" scheme, where
the storage layer notifies the indexer of message IDs it needs to retrieve
and index, or message IDs it needs to delete. These notifications go on a
queue. The indexer then fetches the mail messages when it can and indexes
them or deletes them from the index as needed.
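
A minimal sketch of that notification scheme, in Python, may make the shape
clearer. Everything named here is an assumption for illustration only:
InMemoryIndex is a hypothetical stand-in for the real index (in practice a
single Xapian writable database), fetch_message stands in for whatever call
retrieves a message from the store, and the INDEX/DELETE/STOP tags are made
up. The point it shows is that only one loop ever writes to the index, and
everything else just puts notifications on the queue.

```python
import queue
import threading

# Hypothetical notification tags; the real storage layer may use anything.
INDEX, DELETE, STOP = "index", "delete", "stop"


class InMemoryIndex:
    """Hypothetical stand-in for the real index (one writable database)."""

    def __init__(self):
        self.docs = {}

    def replace(self, msg_id, text):
        # Add the message, or overwrite it if it was indexed before.
        self.docs[msg_id] = text

    def delete(self, msg_id):
        # Deleting an unknown ID is a no-op.
        self.docs.pop(msg_id, None)


def run_indexer(notifications, index, fetch_message):
    """Single writer loop: the only code that ever modifies the index."""
    while True:
        action, msg_id = notifications.get()
        if action == STOP:
            break
        if action == INDEX:
            text = fetch_message(msg_id)
            if text is not None:  # the message may already be gone
                index.replace(msg_id, text)
        elif action == DELETE:
            index.delete(msg_id)


def demo():
    # A dict plays the role of the mail store in this sketch.
    store = {"m1": "hello world", "m2": "quarterly report"}
    notifications = queue.Queue()
    index = InMemoryIndex()
    worker = threading.Thread(
        target=run_indexer, args=(notifications, index, store.get)
    )
    worker.start()
    # The storage layer only enqueues; it never touches the index itself.
    for note in [(INDEX, "m1"), (INDEX, "m2"), (DELETE, "m1"), (STOP, None)]:
        notifications.put(note)
    worker.join()
    return index.docs
```

Because the queue is processed in order by one consumer, an index request
followed by a delete for the same message ID leaves the message out of the
index, which is the behaviour I would want.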

I guess I was just looking for some validation of the approach, and maybe
to hear how others are working around the one-writer-per-database "problem".

We do over 100 million delivery attempts a day. Not all of those would get
indexed, since a little more than half are spam and never get delivered, but
you can see I have some big numbers to deal with.

