[Xapian-discuss] replace_document issue
Michael Lewis
mal at icginc.com
Tue Apr 30 13:15:19 BST 2013
I am converting a MySQL database to use Xapian for full-text searches in PHP. I fetch the record ID and the text field to be indexed for each record and then index the document. I am putting the documents into three separate Xapian databases. I need to preserve the original record ID and use it as the Xapian document ID. The code I use is:
$r=$dh->FetchArray();
$rid=$r['id'];
$sc=$r['sc'];
$doc=new XapianDocument();
$doc->set_data($sc);
$indexer->index_text($sc);
$docid=$database->replace_document($rid,$doc);
However, when I run delve on the three Xapian databases, I get the following:
delve -V /var/lib/xapian/segment_1 | more
UUID = 9dd08d44-68a7-4e6b-987e-287dde7bf9c2
number of documents = 448741
average document length = 2284.29
document length lower bound = 1
document length upper bound = 498430
highest document id ever used = 449577
has positional information = true
[root at localhost sw]# delve -V /var/lib/xapian/segment_2 | more
UUID = 8bf087e7-a9e6-4539-a08e-aeab382fd4c7
number of documents = 498749
average document length = 2302
document length lower bound = 1
document length upper bound = 849692
highest document id ever used = 499667
has positional information = true
[root at localhost sw]# delve -V /var/lib/xapian/segment_3 | more
UUID = 90b78781-7bc6-4799-98f6-a9b10bb86b31
number of documents = 498589
average document length = 3639.27
document length lower bound = 2
document length upper bound = 517725
highest document id ever used = 499504
has positional information = true
Note that the highest document ID in each database is close to the number of records in it, rather than falling in the range of my record IDs. I attempted to merge the three databases using the --no-renumber option and was rightly given an error message that the document IDs are not sparse and overlap. The IDs in the MySQL database range from 200000 to 3250000 without duplication.
I was under the impression that the replace_document() function allowed me to set the document ID. Am I mistaken, or am I doing something wrong?
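One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: values fetched from MySQL usually arrive in PHP as strings, and replace_document() also has a form that takes a unique term (a string) instead of a document ID. If the bindings choose between the two forms by argument type, a string such as "200000" would be treated as a term, and the document would get an automatically assigned ID. The sketch below uses a hypothetical helper, to_docid() (not part of Xapian), to convert the fetched value to an integer before it is passed in. The commented usage lines, including the set_document() call, are likewise assumptions about the surrounding indexing loop.

```php
<?php
// Hypothetical helper (not part of Xapian): convert a value fetched from
// the database into an integer that can be passed as a document ID.
function to_docid($value)
{
    // Accept an int, or a string made up only of decimal digits ("200000").
    if (!is_int($value) && !(is_string($value) && ctype_digit($value))) {
        throw new InvalidArgumentException("not a numeric record ID");
    }
    $docid = (int)$value;
    // Xapian document IDs start at 1, so reject zero and negative values.
    if ($docid < 1) {
        throw new InvalidArgumentException("document ID must be >= 1");
    }
    return $docid;
}

// Assumed usage inside the indexing loop shown earlier:
//   $indexer->set_document($doc);  // assumption: ties the indexer to this $doc
//   $indexer->index_text($sc);
//   $docid = $database->replace_document(to_docid($r['id']), $doc);
```

A plain (int)$rid cast would do the same conversion; the helper only adds a check so that an empty or non-numeric ID fails loudly instead of silently becoming 0.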
Thanks for any help.