[Xapian-discuss] uniq elements or check for exists

Richard Boulton richard at tartarus.org
Thu Nov 5 15:09:32 GMT 2009


2009/11/5 岳帅杰 <ysj.ray at gmail.com>:
> I think you can solve this problem like this:
>
> Store the md5 value of each document in a value slot, then, before adding each string, check whether its md5 already exists in the database (using OP_VALUE_RANGE to query it).
> This may be a little slow, but if you don't need to add too many documents at once, it will not be a big problem.

No, don't do it like this.  This isn't what value slots are best used
for, at all, and is needlessly slow.

Instead, if you're wanting to use an md5 value as a unique ID, use a
prefixed term, as described in the "Using a term for the external
unique id" section of the document Olly linked to:
http://trac.xapian.org/wiki/FAQ/UniqueIds

This approach doesn't involve doing a very slow query for each
document you're adding: an OP_VALUE_RANGE query (when combined with no
other queries) will iterate through all the documents in the database.
Looking up a prefixed term involves a single B-tree read.
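A minimal sketch of the prefixed-term approach with Xapian's Python bindings, assuming the bindings are installed and `db` is a `xapian.WritableDatabase`. The `Q` prefix follows the convention in Xapian's documentation but is an assumption here, and `md5_id_term` and `add_unique` are hypothetical helper names.

```python
import hashlib

# Assumed prefix for the unique-id term; "Q" is the usual Xapian convention.
ID_PREFIX = "Q"


def md5_id_term(text):
    """Return a prefixed term holding the md5 hex digest of text."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return ID_PREFIX + digest


def add_unique(db, text):
    """Add text as a new document unless its md5 id term already exists.

    db is assumed to be a xapian.WritableDatabase.
    Returns True if a document was added, False if it was already present.
    """
    import xapian  # imported here so md5_id_term works without the bindings

    idterm = md5_id_term(text)
    # term_exists() looks the term up directly; no documents are scanned.
    if db.term_exists(idterm):
        return False
    doc = xapian.Document()
    doc.set_data(text)
    doc.add_term(idterm)  # index the document under its unique-id term
    db.add_document(doc)
    return True
```

If a later copy should overwrite the earlier one rather than be skipped, `WritableDatabase.replace_document(idterm, doc)` replaces any document indexed by that term, or adds the document if none is.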

-- 
Richard



More information about the Xapian-discuss mailing list