[Xapian-discuss] Setting self-defined doc ID with PHP bindings

Yannick Warnier ywarnier at beeznest.org
Fri Feb 8 17:16:55 GMT 2008


On Wednesday 6 February 2008 at 15:43 +0000, James Aylett wrote:
> On Wed, Feb 06, 2008 at 10:34:38AM -0500, Yannick Warnier wrote:
> 
> > I've found some comments about that dating from some time ago, saying
> > that one should use the XapianWritableDatabase::replace_document()
> > method rather than the XapianWritableDatabase::add_document(), so that I
> > can add the document ID myself (being numeric and auto-incremented, I
> > don't suppose that would create a problem, would it?)
> 
> It shouldn't do, no. You'll start getting issues if you need to move
> to multiple databases, or something else more complex, but for now you
> should be fine.

Just what I thought. Am I likely to need multiple databases, though?
Roughly how many documents does it take before additional databases
become necessary (I'm plugging this into a PostgreSQL database)?
Or is this only about specific cases where I would need multiple
databases to start with?

> > However, if that seems to work well (I get no error message), I have no
> > idea how to check if my document was indexed using the right ID (is
> > there some command-line tool to check the list of documents in my
> > database?).
> 
> If you use replace_document(), it will be indexed with the right
> ID. To get a list of docids, probably the quickest thing is to write a
> quick PHP script to check $db->get_document(docid) for all docids you
> care about. $db->get_lastdocid() and $db->get_doccount() will help here.

I'm afraid replace_document() doesn't seem to behave as expected.
I'm logging my call to

$db->replace_document($myid,$doc);

$myid is equal to 19 and $doc is my document. The call does create
a new document in my Xapian database, but a subsequent call to
$db->get_lastdocid() returns 3, which is exactly the total number of
documents in my database... Any idea why it would behave like that?
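
One thing I can try on my side, as an assumption rather than a confirmed
diagnosis: replace_document() also has a form that takes a unique term (a
string) instead of a docid. If $myid reaches the call as the string "19"
(it comes out of PostgreSQL), the bindings might be picking that form and
assigning the next free docid. A minimal sketch of the check follows;
replace_with_numeric_id() is a hypothetical helper name, and $db and $doc
are assumed to be an open XapianWritableDatabase and a XapianDocument
created after including xapian.php:

```php
<?php
// Sketch only: $db is assumed to be an open XapianWritableDatabase and
// $doc a XapianDocument; the caller must have included xapian.php.
function replace_with_numeric_id($db, $myid, $doc)
{
    // Force an integer so that the docid form of replace_document() is
    // used, not the replace_document(unique_term, document) form.
    $docid = (int) $myid;
    $db->replace_document($docid, $doc);

    // If the document was stored under $docid, the last docid should now
    // be at least $docid.
    return $db->get_lastdocid();
}
```

If the returned value is still lower than $myid, the string/integer guess
is wrong and the problem lies elsewhere.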

The output of $db->get_document($docid) is a resource, so the right
thing to use to get document data is

  $db->get_document($docid)->get_data()

Outputting this very roughly can be done with
  
  print_r($db->get_document($docid)->get_data());

Is there a way to go through all the documents in the database one at a
time? Something like $db->get_firstdocid() and then $db->get_nextdocid()?
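
In the meantime, the fallback James describes (trying get_document() for
every ID up to get_lastdocid()) can be sketched as below. list_docids() is
a hypothetical helper name, and the sketch assumes the PHP5 bindings
report a missing docid as a PHP exception:

```php
<?php
// Sketch only: $db is assumed to be an open XapianDatabase (or
// XapianWritableDatabase); the caller must have included xapian.php.
function list_docids($db)
{
    $found = array();
    $last = $db->get_lastdocid();
    for ($docid = 1; $docid <= $last; $docid++) {
        try {
            // Assumption: asking for an unused docid raises an exception.
            $db->get_document($docid);
            $found[] = $docid;
        } catch (Exception $e) {
            // Nothing is stored under this ID, so skip it.
        }
    }
    return $found;
}
```

Calling print_r(list_docids($db)) would then show which IDs are actually
in use, at the cost of one lookup per possible ID.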

> > Also, there seems to be no way (as far as the PHP bindings are
> > concerned) to get this document ID back from search results. I actually
> > do get search results, but the PHP print_r() function isn't very helpful
> > with the contents of an mset or one document out of an mset, as it
> > always gives me a "Resource" output.
> > 
> > I have tried a 
> >   $mset->get_document()->get_data();
> > and a
> >   $mset->get_document()->get_docid();
> > but neither of these seems to work.
> 
> Try $mset->get_docid().
> 
> $mset->get_document()->get_data() will return whatever you put in
> there in the first place (using $doc->set_data()), so my guess is you
> aren't putting anything there at all right now. For your application
> you may not have a use for it.

I wasn't very clear about that. I do get some data back (the text I put
in with $doc->set_data()); what I'm not getting is the document ID
itself. Is there a way to get a document's ID from the document (something
like $doc->get_id(), for example)?
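
If James's get_docid() suggestion is meant for the iterator over the MSet
rather than for the MSet itself (my assumption), reading the ID and the
data of each hit would look roughly like this. collect_hits() is a
hypothetical helper name, and $mset is assumed to come from
$enquire->get_mset():

```php
<?php
// Sketch only: $mset is assumed to be a XapianMSet returned by
// XapianEnquire::get_mset(); the caller must have included xapian.php.
function collect_hits($mset)
{
    $hits = array();
    $i = $mset->begin();
    // Iterators are compared with equals() and advanced with next().
    while (!$i->equals($mset->end())) {
        // Map each matching docid to the data stored with set_data().
        $hits[$i->get_docid()] = $i->get_document()->get_data();
        $i->next();
    }
    return $hits;
}
```

The returned array is a plain PHP array, so print_r() on it shows the
docid and data of every hit instead of "Resource".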

I'm using the php5-xapian 1.0.2-1 Ubuntu package.

Yannick



