[Xapian-discuss] indexing strategy for "near real time" indexing
jarrod at vertigrated.com
Wed Jun 20 16:29:21 BST 2007
On 6/19/07, Sam Liddicott <sam at liddicott.com> wrote:
> Are you indexing a mail store with reference to the store to retrieve the
> original message,
Yes. The plan is to index in parallel using a "notification" scheme, where
the storage layer notifies the indexer of message IDs it needs to retrieve
and index, or message IDs it needs to delete. These notifications go on a
queue. The indexer then fetches the mail messages when it can and indexes
them, or deletes them from the index, as needed.
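A minimal sketch of this kind of notification queue, in Python, under loud assumptions: the mail store and the index are plain dicts standing in for the real storage layer and a Xapian writable database, and the names `fetch_message`, `run_indexer`, and `demo` are hypothetical. It only illustrates the shape of the idea, with many producers enqueueing and a single consumer doing all the writes.

```python
import queue
import threading

# Hypothetical notification types; not part of any real API.
INDEX, DELETE, STOP = "index", "delete", "stop"


def fetch_message(store, message_id):
    """Retrieve a message body from the storage layer (assumed: a plain dict)."""
    return store.get(message_id)


def run_indexer(notifications, store, index):
    """Single consumer: the only code that ever writes to `index`."""
    while True:
        action, message_id = notifications.get()
        if action == STOP:
            break
        if action == INDEX:
            body = fetch_message(store, message_id)
            # The message may have been removed since the notification was sent.
            if body is not None:
                # Stand-in for building a document and writing it to the index.
                index[message_id] = body.lower().split()
        elif action == DELETE:
            # Deleting an ID that was never indexed is treated as a no-op.
            index.pop(message_id, None)


def demo():
    store = {"m1": "Hello world", "m2": "Quarterly report"}
    index = {}
    notifications = queue.Queue()
    writer = threading.Thread(
        target=run_indexer, args=(notifications, store, index)
    )
    writer.start()
    # Any number of producers may enqueue; only the writer thread touches the index.
    notifications.put((INDEX, "m1"))
    notifications.put((INDEX, "m2"))
    notifications.put((DELETE, "m1"))
    notifications.put((STOP, None))
    writer.join()
    return index
```

Because the queue is the only way to reach the index, the single-writer restriction is respected no matter how many storage nodes post notifications. The trade-off is that the index lags the store by however long the queue is.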
I guess I was just looking for some validation of the approach, and maybe
to hear how others are working around the one-writer-per-database "problem".
We do over 100 million delivery attempts a day. Not all of those would get
indexed, as a little more than half are spam and never get delivered, but you
can see I have some big numbers to deal with.