[Xapian-discuss] Replace a term in a document
Mark Clarkson
mark.clarkson at smorg.co.uk
Sun Apr 22 05:06:21 BST 2007
On Fri, 2007-04-20 at 20:34 +0100, Olly Betts wrote:
> Changes are needed for the position and postlist tables. Currently
> we loop over the current document in the database and remove entries
> based on that, then later loop over the Document object passed in and
> add new entries based on that. So you would need to combine the two
> loops and only update for terms which have been added, modified, or
> removed (if we're replacing a document with itself that is).
>
> To be able to do that, the document object must track which terms have
> been updated. Look at common/document.h and api/omdocument.cc.
> Currently we store a flag "terms_here" which says if we are using
> "local" term information (in the map "terms"), or getting them from the
> database. If any are modified, we get the termlist entry and populate
> "terms" with it, then modify that.
>
> So we either need something extra in "terms" (or add a second structure)
> to track addition/modification/deletion, or to make terms a `delta' for
> the document (if there is one) in the database, so we just store the
> changes rather than pulling all the information. That's neater in a
> way, but makes open_term_list(), etc harder to implement.
>
> Then replace_document() can just check if a document object came from
> the same database object and has the same docid (we already store the
> database and docid so we can lazily fetch information). If so, it can
> optimise the replacement.
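
To make the approach above concrete, here is a minimal sketch of the "something extra in terms" option: each term entry carries a change flag, and the replacement touches only the postlist entries whose flag is set. This is not Xapian code. `TrackedDocument`, `TermEntry`, `Change`, `PostingTable`, `Origin`, `can_optimise` and `replace_in_place` are all hypothetical names invented for illustration, the postlist table is modelled as a plain in-memory map, and position data is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// All names here are hypothetical; none of them are Xapian's real classes.
enum class Change { Unchanged, Added, Modified, Removed };

struct TermEntry {
    unsigned wdf;    // within-document frequency
    Change change;   // what happened to this term since it was loaded
};

// Assumption: term -> (docid -> wdf) stands in for the postlist table.
typedef std::map<std::string, std::map<unsigned, unsigned> > PostingTable;

// Where a document came from, so a replace can tell if it is "itself".
struct Origin { const void* db; unsigned docid; };

bool can_optimise(const Origin& o, const void* target_db, unsigned target_docid) {
    return o.db == target_db && o.docid == target_docid;
}

class TrackedDocument {
  public:
    // Record a term as it currently exists in the database.
    void load(const std::string& term, unsigned wdf) {
        TermEntry e = { wdf, Change::Unchanged };
        terms[term] = e;
    }
    void add_term(const std::string& term, unsigned wdf_inc = 1) {
        std::map<std::string, TermEntry>::iterator it = terms.find(term);
        if (it == terms.end()) {
            TermEntry e = { wdf_inc, Change::Added };
            terms[term] = e;
        } else if (it->second.change == Change::Removed) {
            // The database still has this term, so re-adding is a modification.
            it->second.wdf = wdf_inc;
            it->second.change = Change::Modified;
        } else {
            it->second.wdf += wdf_inc;
            if (it->second.change == Change::Unchanged)
                it->second.change = Change::Modified;
        }
    }
    void remove_term(const std::string& term) {
        std::map<std::string, TermEntry>::iterator it = terms.find(term);
        if (it == terms.end()) return;
        if (it->second.change == Change::Added) {
            terms.erase(it);  // never reached the database, so just forget it
        } else {
            it->second.wdf = 0;
            it->second.change = Change::Removed;
        }
    }
    const std::map<std::string, TermEntry>& entries() const { return terms; }
  private:
    std::map<std::string, TermEntry> terms;
};

// Single combined loop: skip unchanged terms, update only the rest.
// Returns the number of postlist entries touched.
unsigned replace_in_place(PostingTable& postings, unsigned docid,
                          const TrackedDocument& doc) {
    unsigned touched = 0;
    std::map<std::string, TermEntry>::const_iterator i;
    for (i = doc.entries().begin(); i != doc.entries().end(); ++i) {
        switch (i->second.change) {
            case Change::Unchanged:
                break;
            case Change::Added:
            case Change::Modified:
                postings[i->first][docid] = i->second.wdf;
                ++touched;
                break;
            case Change::Removed: {
                PostingTable::iterator p = postings.find(i->first);
                assert(p != postings.end());  // a removed term must have been indexed
                p->second.erase(docid);
                if (p->second.empty()) postings.erase(p);
                ++touched;
                break;
            }
        }
    }
    return touched;
}
```

In this sketch a caller would first check `can_optimise()` and fall back to the existing remove-then-add path when the document did not come from the same database and docid. The `delta' alternative would store only the changed entries instead of loading every term, at the cost of having to merge with the database copy whenever the full term list is read.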
Phew! Thanks for this info. I hope to be able to start coding this in a
couple of weeks - I must admit I'm a bit worried though! Hopefully things
are self-contained enough for me not to have to stray too far...
I'll return with either a patch or a wishlist bug!
Cheers
Mark.