[Xapian-discuss] Re: Evaluating Xapian

Olly Betts olly at survex.com
Tue Jan 25 01:27:04 GMT 2005


On Tue, Jan 25, 2005 at 12:32:35AM +0100, Arne Georg Gleditsch wrote:
> If I want to add a new meta-tag to an already indexed document, I
> naturally do something like this:
> 
>     my $doc = $wrdb->get_document($doc_id);
>     $doc->add_term('NEW_TAG');
> 
> To flush this to the database it seems I then have to do
> 
>     $wrdb->replace_document($doc_id, $doc);
> 
> but my impression (based on gut feeling only, I admit) is that this is
> rather slow.  Is this the right way to do it, and are there more
> efficient methods to use when I just want to add a single term like
> this?

That's how you'd do that.

It shouldn't be too bad - it necessarily has to rewrite that document's
entry in the termlist table, and modify the posting list for NEW_TAG.

Currently it will also needlessly rewrite an unchanged block of the
postlist table for each unchanged term indexing the document, another
block for the record table, and another for values (if the document has
any).  The positionlist table may get several blocks rewritten (if
you're indexing with positional information), depending on how long the
documents are.

This rewriting of unchanged blocks could be optimised out.  Much of the
machinery needed is in place (Xapian::Document reads information from
disk lazily, so it's easy to tell if someone is writing back the same
document with unchanged data and values, for example).
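
To illustrate the idea only, here is a toy model in plain Perl.  It is
not Xapian's code: ToyDocument, its fields and parts_to_rewrite are
names invented for this sketch, and a hash stands in for the on-disk
copy.  Each part of the document is read on first access, and only the
parts that were actually modified are reported as needing a write.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only: none of these names come from Xapian.
package ToyDocument;

sub new {
    my ($class, $stored) = @_;
    # $stored is a hash ref standing in for the on-disk copy:
    # { data => ..., values => {...}, terms => {...} }
    return bless { stored => $stored, loaded => {}, dirty => {} }, $class;
}

# Copy a part from the "disk" hash only on first access (lazy read).
sub _load {
    my ($self, $part) = @_;
    unless (exists $self->{loaded}{$part}) {
        my $src = $self->{stored}{$part};
        $self->{loaded}{$part} = ref($src) eq 'HASH' ? { %$src } : $src;
    }
    return $self->{loaded}{$part};
}

# Adding a term loads and dirties the term list, and nothing else.
sub add_term {
    my ($self, $term) = @_;
    my $terms = $self->_load('terms');
    return if exists $terms->{$term};    # already present: no change
    $terms->{$term} = 1;
    $self->{dirty}{terms} = 1;
}

# Parts that a replace would actually need to write back.
sub parts_to_rewrite {
    my ($self) = @_;
    return sort keys %{ $self->{dirty} };
}

package main;

my $doc = ToyDocument->new({
    data   => 'original record',
    values => { 0 => '2005-01-25' },
    terms  => { apple => 1, banana => 1 },
});
$doc->add_term('NEW_TAG');

my @rewrite = $doc->parts_to_rewrite;
print "rewrite: @rewrite\n";    # prints "rewrite: terms"
```

In this model the data and values were never even loaded, so a replace
could tell they are unchanged without comparing anything.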

I've not implemented this so far simply because it's not a hot spot for
most users!

Are you doing this a lot?

Cheers,
    Olly


