[Xapian-discuss] Rqt for Features

Olly Betts olly at survex.com
Mon Aug 9 15:17:14 BST 2004


On Fri, Jul 09, 2004 at 04:52:02PM +0100, Richard Boulton wrote:
> With Xapian as it currently stands, the way to do this is to specify
> a unique term, and store it in each document.  The unique terms would
> be comprised of a prefix followed by your document identifiers.
> Traditionally, "Q" has been used for the prefix, but any prefix which
> will avoid collisions with other terms is acceptable.
> 
> Whenever a document is modified, you would first open the postlist for the
> term, which gives you a list of all documents containing the term, and
> delete these documents (hopefully, this list would be of length 0 or 1).
> Then, add the new document.

Instead of deleting and adding, I'd suggest calling replace_document -
the replacement document will generally be similar to the existing one,
which should allow the backend to do less resplicing of posting lists.

> There is a proposal to add a new API method to delete all documents
> containing a given term, which would ease the implementation of this scheme
> (I'm not sure of the status of this proposal).

I guess you mean my suggestion that replace_document should allow a term
to be specified instead of a document id?  The document passed would
replace the first document (if any) indexed by that term, and probably
any other documents would be deleted.  Then external uid handling just
becomes:

    std::string uidterm = std::string("Q") + uid;
    doc.add_term(uidterm);
    db.replace_document(uidterm, doc);

Rather than something like:

    std::string uidterm = std::string("Q") + uid;
    doc.add_term(uidterm);
    Xapian::PostingIterator p = db.postlist_begin(uidterm);
    if (p != db.postlist_end(uidterm)) {
        db.replace_document(*p, doc);
    } else {
        db.add_document(doc);
    }

This gives cleaner user code and allows the backend to handle this more
efficiently.
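
To make the proposed semantics concrete, here is a minimal sketch using a toy
in-memory index rather than Xapian itself.  ToyDb, Doc and DocId are
hypothetical names made up for illustration, and returning the docid is an
assumption on my part; a real backend would work on posting lists, not a
std::map.  It models the behaviour described above: the first document indexed
by the term is replaced, any others are deleted, and if no document has the
term the new one is simply added.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Xapian classes.
using DocId = unsigned;

struct Doc {
    std::set<std::string> terms;
    std::string data;
};

struct ToyDb {
    std::map<DocId, Doc> docs;  // kept in docid order, like a posting list
    DocId next_id = 1;

    DocId add_document(const Doc& doc) {
        docs[next_id] = doc;
        return next_id++;
    }

    // Docids of all documents indexed by `term`, in ascending order.
    std::vector<DocId> postlist(const std::string& term) const {
        std::vector<DocId> ids;
        for (const auto& entry : docs) {
            if (entry.second.terms.count(term)) ids.push_back(entry.first);
        }
        return ids;
    }

    // Proposed behaviour: replace the first document indexed by `term`,
    // delete any others, or add `doc` if no document has the term.
    // Returning the docid used is an assumption of this sketch.
    DocId replace_document(const std::string& term, const Doc& doc) {
        std::vector<DocId> ids = postlist(term);
        if (ids.empty()) return add_document(doc);
        docs[ids[0]] = doc;
        for (std::size_t i = 1; i < ids.size(); ++i) docs.erase(ids[i]);
        return ids[0];
    }
};
```

With this model, calling replace_document("Q42", doc) on an empty ToyDb adds
the document, and calling it again later overwrites that same docid and drops
any stray duplicates carrying the same term.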

To complement this, delete_document should probably also take a termname
as an alternative to a document id.  That allows a document with a given
UID to be easily removed, and has wider potential uses.
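
As a rough illustration of the "wider potential uses", here is a sketch of
delete-by-term on a toy index (docid mapped to a set of terms).  TermIndex and
delete_by_term are hypothetical names, not part of Xapian, and returning a
count of removed documents is an assumption of the sketch.  With a unique UID
term it removes at most one document; with an ordinary term it removes every
document carrying that term.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical toy index: docid -> set of terms.  Not Xapian's data model.
using TermIndex = std::map<unsigned, std::set<std::string>>;

// Remove every document indexed by `term`; returns how many were removed.
std::size_t delete_by_term(TermIndex& index, const std::string& term) {
    std::size_t removed = 0;
    for (auto it = index.begin(); it != index.end(); ) {
        if (it->second.count(term)) {
            it = index.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```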

I think the status is that I mentioned it to you when you popped round a
couple of months ago, and we both thought it was a good idea, but
neither of us has implemented it yet!

An implementation which just pushes the posting iterator implementation
into the library is almost trivial, so I'll try to do that shortly.
That takes care of the API, and backend optimisation of this case can
follow later.

Cheers,
    Olly


