[Xapian-devel] Fetching document content by Q term in Python

Olly Betts olly at survex.com
Fri Feb 9 08:01:05 GMT 2007


On Fri, Feb 09, 2007 at 11:18:10AM +1100, Alec Thomas wrote:
> I'd like to be able to retrieve the index's stored copy of the document
> text and tried the following:
> 
>     terms = self.db.allterms()
>     terms.skip_to('Q' + uri.encode('utf-8'))
>     term = terms.next()
>     doc = self.db.get_document(term[1])
>     print doc.get_data()
> 
> I just wildly guessed that [1] was the docid, but of course it isn't. So the
> question is, how do I get a docid out of a term?

This will print the data from each document indexed by a particular
term:

    term = 'Q' + uri.encode('utf-8')
    for docid in self.db.postlist(term):
        doc = self.db.get_document(docid)
        print doc.get_data()

You get a PostingIter from db.postlist(term) - see
python/docs/bindings.html for details.
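
Since a Q term is normally a unique ID, you usually only want the one
document it indexes.  A minimal sketch of a helper for that, assuming
`db` is an open database object; the name `first_document_data` is made
up for illustration, and the `.docid` attribute lookup is an assumption
about how some versions of the bindings present each posting (if the
item has no such attribute, it is used as the docid directly, as in the
loop above):

```python
def first_document_data(db, term):
    """Return the stored data of the first document indexed by `term`.

    Returns None if no document is indexed by the term.
    """
    for posting in db.postlist(term):
        # Assumption: the item may be an object carrying a .docid
        # attribute, or it may be usable as the docid itself.
        docid = getattr(posting, 'docid', posting)
        return db.get_document(docid).get_data()
    return None
```

You'd call it as first_document_data(self.db, 'Q' + uri.encode('utf-8')).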

> Or if I'm completely on the wrong track, how do I get the document from
> a Q term?

Alternatively, you can run a search for the Q-prefixed term.  The above
is a little less work though.

Cheers,
    Olly


