[Xapian-discuss] Incremental indexing

Marios Titas redneb8888 at gmail.com
Tue Mar 20 17:01:48 GMT 2012


On Tue, Mar 20, 2012 at 09:34, Olly Betts <olly at survex.com> wrote:
> Yes it does pretty well, assuming you're using Xapian 1.2 or later
> 1.0.x.  This change made a huge difference to the speed of adding and
> removing tags from indexed email in notmuch.

In order for me to understand how this works, consider the following scenario:
1. I retrieve a document with db.get_document. Assume that it has n terms.
2. I add k terms with doc.add_term or doc.add_posting.
3. I write the document back to the database with db.replace_document.
So are you saying that the above procedure takes O(k) time? What about
memory, is it O(k) too? Or does Xapian load the entire document into
memory (in which case it would be O(n+k) for both time and memory)?
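
The three steps above can be sketched as follows. This is only an
illustration of the scenario, not a statement about its cost: the helper
name add_terms is hypothetical, and db is assumed to be any object
exposing get_document and replace_document the way a Xapian
WritableDatabase does.

```python
def add_terms(db, docid, new_terms):
    """Apply steps 1-3 of the scenario to a single document.

    `add_terms` is a hypothetical helper name. `db` is assumed to behave
    like a Xapian WritableDatabase, and the object it returns like a
    Xapian Document.
    """
    # Step 1: retrieve the existing document (it already has n terms).
    doc = db.get_document(docid)

    # Step 2: add the k new terms; each add_term call bumps the wdf by 1.
    for term in new_terms:
        doc.add_term(term)

    # Step 3: write the modified document back under the same docid.
    db.replace_document(docid, doc)
    return doc
```

With the Python bindings installed, db would typically be something like
xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN), where path is a
placeholder for the database directory.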

There is also another problem with this approach: it is not easy to
remove terms from a document. It would be nice if the document class
had a method to decrement the wdf of a term and, if the new wdf is zero,
remove the term altogether.
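
To make the requested behaviour concrete, here is a toy model of it. It
does not use Xapian at all: the document is modelled as a plain dict
mapping each term to its wdf, and decrement_wdf is a hypothetical name
for the method being asked for, not an existing API.

```python
def decrement_wdf(wdfs, term, dec=1):
    """Toy model of the proposed method (hypothetical, not a Xapian API).

    `wdfs` maps term -> wdf for one document. The wdf of `term` is
    reduced by `dec`; if it reaches zero (or below), the term is dropped.
    Terms that are not present are left alone.
    """
    if term not in wdfs:
        return
    new_wdf = wdfs[term] - dec
    if new_wdf <= 0:
        del wdfs[term]       # wdf hit zero: remove the term altogether
    else:
        wdfs[term] = new_wdf


# Example with made-up terms: one occurrence is removed from each.
doc_terms = {"tag:inbox": 2, "tag:unread": 1}
decrement_wdf(doc_terms, "tag:inbox")    # wdf 2 -> 1, term stays
decrement_wdf(doc_terms, "tag:unread")   # wdf 1 -> 0, term removed
print(doc_terms)  # {'tag:inbox': 1}
```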


