[Xapian-discuss] xapian doc_id & duplicate documents

Olly Betts olly at survex.com
Tue Oct 18 18:37:51 BST 2005


On Tue, Oct 18, 2005 at 05:26:40PM +0000, Salem Berhanu wrote:
> "Provided you're using Xapian 0.8.2 or later you can specify the docid to 
> use - just call Xapian::WritableDatabase::replace_document() with the docid 
> you want to use"
> 
> I am but the problem is how do I know the existing docid of a document that 
> is the same as the one I am about to add?

When you said "I can choose my own doc_id", I assumed you were wanting
to reuse a numeric unique id from another system (e.g. a common
situation is indexing data from an SQL database which has a
monotonically increasing "row number" counter).
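
For that numeric case, a rough sketch using the method names from the
Python bindings might look like this.  The helper name index_row and its
row_id parameter are made up for illustration; db is assumed to behave
like a xapian.WritableDatabase and doc like a xapian.Document.

```python
def index_row(db, doc, row_id):
    """Store doc under docid == row_id (hypothetical helper).

    `db` is assumed to behave like xapian.WritableDatabase and `doc`
    like xapian.Document; neither is created here.
    """
    row_id = int(row_id)
    if row_id < 1:
        # Xapian docids start at 1, so 0 (or a negative row number)
        # cannot be used as a docid.
        raise ValueError("docid must be a positive integer")
    # replace_document(docid, document) adds the document if that docid
    # is unused, and overwrites the existing document otherwise.
    db.replace_document(row_id, doc)
    return row_id
```

Indexing the same row twice then leaves a single document under that
docid rather than a duplicate.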

As I said, if your unique id is non-numeric, or numeric but sparse, add a
unique id term to each document (e.g. "Q"+URL) and pass that term to
replace_document() when adding or updating documents: it replaces any
existing document indexed by that term, or adds a new one if there is
none.  This is how omindex works if you want to look at some existing
code.  Another example is scriptindex's UNIQUE command.
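
As a minimal sketch of the unique-term approach (again using the Python
bindings' method names; add_or_update is a hypothetical helper, and db
and doc are assumed to behave like xapian.WritableDatabase and
xapian.Document):

```python
def add_or_update(db, doc, url):
    """Add doc, or replace the document already indexed for this URL.

    Hypothetical helper; `db` and `doc` are assumed to behave like
    xapian.WritableDatabase and xapian.Document.
    """
    # Unique id term: "Q" prefix followed by the external identifier.
    unique_term = "Q" + url
    # The document must carry the term itself, so that a later call
    # with the same URL can find it again.
    doc.add_term(unique_term)
    # With a term as the first argument, replace_document() picks the
    # docid itself and returns it.
    return db.replace_document(unique_term, doc)
```

The returned docid can be discarded; the "Q" term is what identifies the
document from one indexing run to the next.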

Then you can simply ignore the docid - in this case you can consider it
to be an internal detail you needn't concern yourself with.

> Is there a direct way of replacing the docid right before or after adding 
> to the database? if not where would be a good place to store my unique 
> identifier for a document. This is also important when querying since I 
> need to link the doc_id to my unique identifier.

If you need the unique id to display results, you'll probably also want
to store it in the document data.  Again, omindex uses this technique.
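
A rough sketch of that, with the Python bindings' method names (the two
helper names are made up; doc is assumed to behave like a
xapian.Document and mset like the xapian.MSet returned by a search):

```python
def remember_unique_id(doc, unique_id):
    """Keep the external id in the document data (hypothetical helper)."""
    # This sketch stores only the id; real document data often holds
    # more fields, in which case the id would be one of them.
    doc.set_data(unique_id)


def unique_ids(mset):
    """Return the stored ids for the matches, in rank order."""
    ids = []
    for match in mset:
        data = match.document.get_data()
        # The Python 3 bindings hand back bytes; decode for display.
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        ids.append(data)
    return ids
```

This way the result page never needs to map a docid back to your own
identifier.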

Cheers,
    Olly
