[Xapian-discuss] Getting document's context

Olly Betts olly at survex.com
Tue Mar 6 01:15:49 GMT 2007


On Mon, Mar 05, 2007 at 01:29:22PM +0200, Matti Heinonen wrote:
> When indexing, I am including posting information. When searching, I am 
> able to get the position information for a term using 
> database.positionlist(). But how to get the text in the positions around 
> the term?

We store positional information per term+document so it isn't possible
to answer the question "which terms occur between positions N1 and N2 in
document D" without opening the position lists for every term in
document D and doing a "skip_to" on each.
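
To illustrate why that is expensive, here is a rough sketch using an
in-memory stand-in for one document's postings rather than a real
database.  The PositionLists type and the terms_in_window() function
are made up for this example.  With a real database you would iterate
Database::termlist_begin(docid) and call skip_to() on the iterator from
Database::positionlist_begin(docid, term) instead of std::lower_bound:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one document's postings:
// term -> sorted list of positions at which that term occurs.
using PositionLists = std::map<std::string, std::vector<unsigned>>;

// Return the terms found at positions first..last (inclusive), ordered
// by position.  Every term's position list has to be visited, which is
// exactly the cost described above.
std::vector<std::string>
terms_in_window(const PositionLists& doc, unsigned first, unsigned last)
{
    assert(first <= last);
    // A multimap keeps all terms if several share one position
    // (e.g. stemmed and unstemmed forms of the same word).
    std::multimap<unsigned, std::string> found;
    for (const auto& entry : doc) {
        const std::vector<unsigned>& positions = entry.second;
        // Plays the role of skip_to(first): jump to the first
        // position that is >= first.
        auto it = std::lower_bound(positions.begin(), positions.end(),
                                   first);
        for (; it != positions.end() && *it <= last; ++it) {
            found.insert({*it, entry.first});
        }
    }
    std::vector<std::string> out;
    for (const auto& p : found) {
        out.push_back(p.second);
    }
    return out;
}
```

Note that the work is proportional to the number of distinct terms in
the document, not to the size of the window you ask for.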

I'd generally suggest storing a cleaned up copy of the document text in
the document data and generating dynamic samples from that.  Xapian
doesn't currently have a built-in mechanism for generating such samples,
though (it's something I'd like to add).
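
In the meantime, something along these lines would do as a starting
point.  This is only a sketch: make_sample() and normalise() are names
invented for this example, the text is assumed to be whatever you
stored in the document data (as returned by Document::get_data()), and
the matching is a plain case-insensitive word comparison with no
stemming, so it won't find a word whose indexed term is a stemmed form:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Lower-case a word and drop anything that isn't alphanumeric, so that
// "Fox," and "fox" compare equal.
static std::string normalise(const std::string& word)
{
    std::string out;
    for (unsigned char ch : word) {
        if (std::isalnum(ch)) out += static_cast<char>(std::tolower(ch));
    }
    return out;
}

// Return up to `context` words either side of the first occurrence of
// `term` in `text`, or an empty string if the term doesn't occur.
std::string make_sample(const std::string& text, const std::string& term,
                        std::size_t context)
{
    // Split the stored text into whitespace-separated words.
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w; ) words.push_back(w);

    const std::string wanted = normalise(term);
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (normalise(words[i]) != wanted) continue;
        // Clamp the window to the start and end of the text.
        std::size_t begin = i > context ? i - context : 0;
        std::size_t end = std::min(words.size(), i + context + 1);
        std::string sample;
        for (std::size_t j = begin; j < end; ++j) {
            if (!sample.empty()) sample += ' ';
            sample += words[j];
        }
        return sample;
    }
    return std::string();
}
```

For example, make_sample(text, "fox", 2) on a document containing "The
quick brown fox jumps over the lazy dog." gives the two words either
side of "fox".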

Alternatively, Jean-Francois Dockes posted some C++ code to recreate the
whole document by looking at position list data - it would be easy to
adapt that to only look at a restricted range of document positions:

http://article.gmane.org/gmane.comp.search.xapian.general/2187

Cheers,
    Olly


