[Xapian-discuss] Incremental indexing

Olly Betts olly at survex.com
Thu Mar 21 05:01:17 GMT 2013


On Mon, Mar 18, 2013 at 01:52:01PM +0800, ???? wrote:
> I am trying to implement an Incremental indexing scheme. The problem
> is that usually the modified documents are large but the modifications
> are limited. Ideally, I would like to reindex only the modified parts
> of these documents. If I am not mistaken, xapian cannot do that.

Xapian does try to be lazy here.  In particular, you can get a document
from the database, make some changes (e.g. add or remove some terms),
and call replace_document() to update it in the database.  Only the
posting lists for the changed terms will then be updated, plus the
termlist for the document itself, and (if the document length changes)
the document length pseudo-posting list.
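
To make the "only the changed terms" point concrete, here is a toy model
in plain Python.  It is only a sketch of the idea and not Xapian's code
or storage format: the ToyIndex class and its "touched" attribute are
made up for illustration, and it ignores within-document frequencies,
positions and document lengths.

```python
from collections import defaultdict


class ToyIndex:
    """Toy in-memory inverted index (hypothetical, not Xapian's backend)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of docids
        self.termlists = {}               # docid -> frozenset of terms
        self.touched = []                 # posting lists written by the last update

    def replace_document(self, docid, terms):
        old = self.termlists.get(docid, frozenset())
        new = frozenset(terms)
        # Only terms that were added or removed need their posting list rewritten.
        self.touched = sorted(old ^ new)
        for term in old - new:
            self.postings[term].discard(docid)
        for term in new - old:
            self.postings[term].add(docid)
        # The document's own termlist is always rewritten.
        self.termlists[docid] = new


index = ToyIndex()
index.replace_document(1, ["apple", "banana", "cherry"])
index.replace_document(1, ["apple", "banana", "damson"])
print(index.touched)  # ['cherry', 'damson']
```

The posting lists for "apple" and "banana" are never written during the
second call, however large the rest of the document is.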

> It would be nice if xapian supported something like the SQL "group
> by". If it did, then it would be possible to break large documents
> into several pieces which could be indexed independently. When
> querying, these pieces would be then combined again using some
> aggregate function similar to the SQL function sum.

As Chris points out, collapsing would allow you to achieve something
like this, though such an approach inherently restricts the queries you
can perform.  For example, if you split the title and body, a search for
title:foo AND body:bar is hard to do (but title:foo OR body:bar is
easy).
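
The restriction can be seen in a small plain-Python model.  This is a
sketch under assumptions, not Xapian code: the "pieces" list, the
"parent" key standing in for the collapse key, and the search() helper
are all hypothetical.

```python
# Each large document is split into separately indexed pieces that share
# a "parent" id, which plays the role of the collapse key.
pieces = [
    {"parent": "doc1", "terms": {"title:foo"}},
    {"parent": "doc1", "terms": {"body:bar"}},
    {"parent": "doc2", "terms": {"title:foo"}},
    {"parent": "doc2", "terms": {"body:baz"}},
]


def search(pieces, predicate):
    """Match pieces one at a time, then keep one hit per parent."""
    hits = []
    for piece in pieces:
        if predicate(piece["terms"]) and piece["parent"] not in hits:
            hits.append(piece["parent"])
    return hits


or_hits = search(pieces, lambda t: "title:foo" in t or "body:bar" in t)
and_hits = search(pieces, lambda t: "title:foo" in t and "body:bar" in t)

print(or_hits)   # ['doc1', 'doc2']
print(and_hits)  # []
```

The AND query finds nothing, even though doc1 as a whole contains both
terms, because matching happens per piece and no single piece holds
both.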

> Are there any other approaches?

Depending on exactly what you're trying to do, using a PostingSource to
feed in the more frequently changing information might be suitable.
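
One possible reading of this, sketched in plain Python rather than with
the real PostingSource class: keep the slowly changing text in the index
and look up the volatile data at query time.  The names static_postings,
volatile_weight and search() are hypothetical, and the message above
does not say which data would be handled this way.

```python
# Built once from the large, rarely changing document text.
static_postings = {"xapian": [1, 2, 3]}

# Frequently changing per-document information, held outside the index.
volatile_weight = {1: 0.2, 2: 0.9, 3: 0.5}


def search(term):
    """Match against the static index, rank by the external weights."""
    docids = static_postings.get(term, [])
    return sorted(docids, key=lambda d: volatile_weight.get(d, 0.0), reverse=True)


before = search("xapian")
volatile_weight[1] = 1.0   # an update that needs no reindexing
after = search("xapian")

print(before)  # [2, 3, 1]
print(after)   # [1, 2, 3]
```

In this model the ranking changes without any write to the index; only
the external table is updated.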

Cheers,
    Olly


