[Xapian-discuss] replace_document issue

Richard Boulton richard at tartarus.org
Tue Apr 30 13:31:23 BST 2013


replace_document() is overloaded in C++; there's:

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#23344c9000ea98b15d491fa875bd5d1e

which takes an integer, and uses that as the docid, and

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#43c4630ec482508667e9ca539f19cbf0

which takes a term and uses it as a unique term: it replaces any document
indexed by that term, or adds a new document if there is none.

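I suspect you're supplying a string in PHP, and so getting the latter form.
When no document is indexed by that term, the document is added under a
newly allocated docid, which would explain why your highest docid is close
to your record count. If you cast the ID to an int, you may have more
success.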
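
To illustrate how the argument type selects the overload, here is a minimal
self-contained sketch. MockDatabase is a hypothetical stand-in, not the real
Xapian::WritableDatabase: it stores plain strings instead of documents and
only mimics the docid behaviour of the two forms described above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for Xapian::WritableDatabase (an assumption for
// illustration only; the real class works on Xapian::Document objects).
class MockDatabase {
    std::map<unsigned, std::string> docs_;     // docid -> document data
    std::map<std::string, unsigned> by_term_;  // unique term -> docid
    unsigned last_docid_ = 0;                  // highest docid ever used

public:
    // Integer form: the argument is used directly as the docid.
    unsigned replace_document(unsigned docid, const std::string& data) {
        docs_[docid] = data;
        if (docid > last_docid_) last_docid_ = docid;
        return docid;
    }

    // String form: the argument is a unique term. An existing document
    // with that term is replaced; otherwise the next free docid is used.
    unsigned replace_document(const std::string& unique_term,
                              const std::string& data) {
        auto it = by_term_.find(unique_term);
        unsigned docid = (it != by_term_.end()) ? it->second : ++last_docid_;
        by_term_[unique_term] = docid;
        docs_[docid] = data;
        return docid;
    }

    unsigned last_docid() const { return last_docid_; }
};
```

On a fresh MockDatabase, replace_document(std::string("200000"), "text")
returns 1, because the string is treated as a term and a new docid is
allocated. replace_document(200000u, "text") returns 200000, because the
integer is taken as the docid itself.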

In retrospect, this overloading is probably an API mistake, and it would
have been better to use a different method name for the version which takes
a string; the distinction is clear in C++, but it often causes confusion in
dynamically typed languages.

You might also want to read through
http://trac.xapian.org/wiki/FAQ/UniqueIds for more on this topic.

HTH,
-- 
Richard


On 30 April 2013 13:15, Michael Lewis <mal at icginc.com> wrote:

> I am converting a MySQL database to use Xapian for full-text searches in
> PHP. I fetch the record ID and the text field to be indexed for each
> record and then index the document. I am putting the documents into three
> separate Xapian databases. I need to preserve the original record ID and
> use it as the Xapian document ID. The code I use is:
>
>       $r=$dh->FetchArray();
>       $rid=$r['id'];
>       $sc=$r['sc'];
>       $doc=new XapianDocument();
>       $doc->set_data($sc);
>       $indexer->index_text($sc);
>       $docid=$database->replace_document($rid,$doc);
>
> However, when I run delve on the three Xapian databases I get the following:
>
> delve  -V /var/lib/xapian/segment_1 | more
> UUID = 9dd08d44-68a7-4e6b-987e-287dde7bf9c2
> number of documents = 448741
> average document length = 2284.29
> document length lower bound = 1
> document length upper bound = 498430
> highest document id ever used = 449577
> has positional information = true
> [root@localhost sw]# delve  -V /var/lib/xapian/segment_2 | more
> UUID = 8bf087e7-a9e6-4539-a08e-aeab382fd4c7
> number of documents = 498749
> average document length = 2302
> document length lower bound = 1
> document length upper bound = 849692
> highest document id ever used = 499667
> has positional information = true
> [root@localhost sw]# delve  -V /var/lib/xapian/segment_3 | more
> UUID = 90b78781-7bc6-4799-98f6-a9b10bb86b31
> number of documents = 498589
> average document length = 3639.27
> document length lower bound = 2
> document length upper bound = 517725
> highest document id ever used = 499504
> has positional information = true
>
> Note that the highest document ID is around the number of records. I
> attempted to merge the three databases using the --no-renumber option and
> was rightly given the error message that document ID numbers are not sparse
> and overlap. The IDs in the MySQL database range from 200000 to 3250000
> without duplication.
>
> I was under the impression that the replace_document() function allowed
> me to set the document ID. Am I mistaken, or am I doing something wrong?
>
> Thanks for any help.

