[Xapian-discuss] Problems with positions and replace_document

Olly Betts olly at survex.com
Tue Nov 14 02:16:36 GMT 2006


On Mon, Nov 13, 2006 at 10:37:20AM -0200, Fernando Nemec wrote:
> I'm glad to help. If we could have a way to check if the doc already
> has a docid... But as far as I have dug into the code, a document alone
> doesn't know its own docid, is that right?

The Xapian::Document::Internal class which is the actual implementation
knows the docid and Database::Internal* (if the document came from a
database).

> I was wondering if I can use the docid to fetch a new instance of the
> document and, as documents are reference counted, compare this
> instance with the one supplied in the argument list. This way, I
> think, the method knows if that's a replace or an update operation. The
> problem is I don't know how expensive it is to do such an operation,

Xapian::Document is reference counted, but you'll get two different
underlying objects if you call Xapian::Database::get_document() twice on
the same database, even if the docids are the same.
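
To illustrate the point about reference counting, here is a minimal
sketch using hypothetical stand-in classes (Doc, DocInternal and Db are
invented for this example and are not the real Xapian classes; the
std::shared_ptr is just an assumed way to model the reference count).
Copies of a handle share one underlying object, but two separate fetches
of the same docid do not:

```cpp
#include <memory>

// Hypothetical stand-ins for illustration only; not the real Xapian classes.
struct DocInternal {
    unsigned did;
    explicit DocInternal(unsigned d) : did(d) {}
};

class Doc {
    // Copies of a Doc handle share this one underlying object.
    std::shared_ptr<DocInternal> internal;
  public:
    explicit Doc(unsigned did) : internal(std::make_shared<DocInternal>(did)) {}
    unsigned docid() const { return internal->did; }
    // True only if both handles point at the same underlying object.
    bool same_internal(const Doc& other) const { return internal == other.internal; }
};

struct Db {
    // Each call builds a fresh underlying object, even for the same docid.
    Doc get_document(unsigned did) const { return Doc(did); }
};
```

So comparing the underlying objects of two separately fetched documents
would not tell you that they refer to the same stored document.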

But it's actually easier than that...

You need to compare the Database::Internal pointer and also the docid
(since it's legal to read a Document from one database and write it back
to another).  If those match, that should be enough to know that any
parts of the document (values, postings, document data) which haven't been
modified don't need to be rewritten (if terms_here is false, the terms
are unmodified; similarly for values_here and data_here).

If the Document isn't associated with a database, then the "database"
pointer will be NULL and so will never match the Database::Internal
pointer for the database the document is being added to.

So I think you just need a new method (or maybe 4 new methods) in
Document::Internal which check whether this document is replacing itself
and indicate which parts are modified.  Then we can call these when
handling replace_document to find out what we actually need to change.
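
A rough sketch of what those checks might look like, using hypothetical
stand-in classes rather than the real Xapian internals.  The method names
(replacing_itself, terms_need_writing and so on) and the constructor are
assumptions made up for this example; only the database pointer, the
docid, and the terms_here/values_here/data_here flags come from the
description above:

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; not the real Xapian classes.
struct DatabaseInternal {};
typedef unsigned docid;

class DocumentInternal {
    const DatabaseInternal* database;  // NULL if the document didn't come from a database
    docid did;
  public:
    // If a flag is false, that part of the document is unmodified.
    bool terms_here, values_here, data_here;

    DocumentInternal(const DatabaseInternal* db, docid did_)
        : database(db), did(did_),
          terms_here(false), values_here(false), data_here(false) {}

    // True if this document was read from target_db under the same docid
    // it is now being written to.  A NULL database pointer never matches.
    bool replacing_itself(const DatabaseInternal* target_db, docid target_did) const {
        return database != NULL && database == target_db && did == target_did;
    }

    // A part needs writing unless the document is replacing itself and
    // that part is unmodified.
    bool terms_need_writing(const DatabaseInternal* db, docid d) const {
        return !replacing_itself(db, d) || terms_here;
    }
    bool values_need_writing(const DatabaseInternal* db, docid d) const {
        return !replacing_itself(db, d) || values_here;
    }
    bool data_need_writing(const DatabaseInternal* db, docid d) const {
        return !replacing_itself(db, d) || data_here;
    }
};
```

With something like this, replace_document could skip rewriting the
postings (and hence the positional information) whenever
terms_need_writing() returns false.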

Phew!  That's hard to explain in words, but it's actually pretty
straightforward.  Easier than I expected it to be anyway.

> If you want me to, perhaps I can try to think of a smarter and faster
> way to replace positional information when the Documents involved are
> the same.

I think the above is probably both the simplest and fastest way.  If you
want to try implementing it, that'd be cool.  Otherwise I'll have a look
once I've sorted out the things I'm currently working on.

Cheers,
    Olly


