[Xapian-discuss] Replace a term in a document

Mark Clarkson mark.clarkson at smorg.co.uk
Sun Apr 22 05:06:21 BST 2007


On Fri, 2007-04-20 at 20:34 +0100, Olly Betts wrote:
> Changes are needed for the position and postlist tables.  Currently
> we loop over the current document in the database and remove entries
> based on that, then later loop over the Document object passed in and
> add new entries based on that.  So you would need to combine the two
> loops and only update for terms which have been added, modified, or
> removed (if we're replacing a document with itself that is).
> 
> To be able to do that, the document object must track which terms have
> been updated.  Look at common/document.h and api/omdocument.cc.
> Currently we store a flag "terms_here" which says if we are using
> "local" term information (in the map "terms"), or getting them from the
> database.  If any are modified, we get the termlist entry and populate
> "terms" with it, then modify that.
> 
> So we either need something extra in "terms" (or add a second structure)
> to track addition/modification/deletion, or to make terms a `delta' for
> the document (if there is one) in the database, so we just store the
> changes rather than pulling all the information.  That's neater in a
> way, but makes open_term_list(), etc harder to implement.
> 
> Then replace_document() can just check if a document object came from
> the same database object and has the same docid (we already store the
> database and docid so we can lazily fetch information).  If so, it can
> optimise the replacement.

Phew! Thanks for this info. I hope to be able to start coding this in a
couple of weeks, though I must admit I'm a bit worried! Hopefully things
are self-contained enough that I won't have to stray too far...

I'll return with either a patch or a wishlist bug!

Cheers
Mark.



