[Xapian-discuss] replace_document issue

Michael Lewis mal at icginc.com
Tue Apr 30 13:15:19 BST 2013


I am converting a MySQL database to use Xapian for full-text searches in PHP. For each record I fetch the record ID and the text field to be indexed, then index the document. I am putting the documents into three separate Xapian databases. I need to preserve the original record ID and use it as the Xapian document ID. The code I use is:

        $r=$dh->FetchArray();
        $rid=$r['id'];
        $sc=$r['sc'];
        $doc=new XapianDocument();
        $doc->set_data($sc);
        $indexer->index_text($sc);
        $docid=$database->replace_document($rid,$doc);

However, when I use delve on the three xapian DBs I get the following:

delve  -V /var/lib/xapian/segment_1 | more
UUID = 9dd08d44-68a7-4e6b-987e-287dde7bf9c2
number of documents = 448741
average document length = 2284.29
document length lower bound = 1
document length upper bound = 498430
highest document id ever used = 449577
has positional information = true
[root@localhost sw]# delve  -V /var/lib/xapian/segment_2 | more
UUID = 8bf087e7-a9e6-4539-a08e-aeab382fd4c7
number of documents = 498749
average document length = 2302
document length lower bound = 1
document length upper bound = 849692
highest document id ever used = 499667
has positional information = true
[root@localhost sw]# delve  -V /var/lib/xapian/segment_3 | more
UUID = 90b78781-7bc6-4799-98f6-a9b10bb86b31
number of documents = 498589
average document length = 3639.27
document length lower bound = 2
document length upper bound = 517725
highest document id ever used = 499504
has positional information = true

Note that in each database the highest document ID is close to the number of documents, rather than anywhere near my record IDs. I attempted to merge the three databases using the --no-renumber option and was rightly given an error message saying that the document IDs in the databases overlap. The IDs in the MySQL database range from 200000 to 3250000, with no duplicates.

I was under the impression that the replace_document() function allowed me to set the document ID. Am I wrong about that, or am I doing something wrong?
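
One possibility worth checking, offered as an assumption rather than something confirmed here: PHP's MySQL fetch functions usually return column values as strings, and replace_document() also has a form that takes a unique term string instead of a numeric document ID. If $rid is a string, the call above may be treated as a unique-term replacement, in which case Xapian would pick the next free document ID itself, which would be consistent with the delve output. A minimal sketch of forcing an integer ID follows; to_docid() is a hypothetical helper for illustration, not part of Xapian.

```php
<?php
// Hypothetical helper (not part of Xapian): convert a record ID fetched from
// MySQL, which PHP typically hands back as a string, into an integer that can
// be passed as a numeric document ID.
function to_docid($rid)
{
    // Accept only integers or strings made up entirely of decimal digits.
    if (!is_int($rid) && !(is_string($rid) && preg_match('/^[0-9]+$/', $rid))) {
        throw new InvalidArgumentException("record ID is not a plain integer");
    }
    $docid = (int)$rid;
    // Xapian document IDs start at 1, so reject zero and negatives.
    if ($docid < 1) {
        throw new InvalidArgumentException("document IDs must be >= 1");
    }
    return $docid;
}

// Intended use with the variables from the code above (requires the Xapian
// PHP bindings, so it is left as a comment here):
//   $database->replace_document(to_docid($r['id']), $doc);
```

Whether this is the actual cause depends on how the PHP bindings choose between the two forms of replace_document(), so it is worth confirming with var_dump($rid) before and after the conversion.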

Thanks for any help.


