[Xapian-discuss] Getting document's context
Olly Betts
olly at survex.com
Tue Mar 6 01:15:49 GMT 2007
On Mon, Mar 05, 2007 at 01:29:22PM +0200, Matti Heinonen wrote:
> When indexing, I am including posting information. When searching, I am
> able to get the position information for a term using
> database.positionlist(). But how to get the text in the positions around
> the term?
We store positional information per term+document so it isn't possible
to answer the question "which terms occur between positions N1 and N2 in
document D" without opening the position lists for every term in
document D and doing a "skip_to" on each.
I'd generally suggest storing a cleaned-up copy of the document text in
the document data and generating dynamic samples from that. Xapian
doesn't currently have a mechanism for generating such samples, though
(it's something I'd like to add).
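
As a rough illustration of the "dynamic sample" idea, here is a minimal
sketch in plain C++ with no Xapian calls. It assumes the cleaned-up text
has already been read back from the document data into a std::string.
make_sample and its radius parameter are hypothetical names invented for
this example, and matching the raw query word against the text is a
simplification (indexed terms are often stemmed, so a real sampler would
need to account for that).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of Xapian: return roughly `radius` bytes of
// context either side of the first case-insensitive match of `term` in
// `text`. `text` stands for the cleaned-up copy kept in the document data.
std::string make_sample(const std::string& text, const std::string& term,
                        std::size_t radius)
{
    auto ci_equal = [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    };
    auto hit = std::search(text.begin(), text.end(),
                           term.begin(), term.end(), ci_equal);
    if (term.empty() || hit == text.end()) return std::string();

    std::size_t pos = static_cast<std::size_t>(hit - text.begin());
    std::size_t begin = pos > radius ? pos - radius : 0;
    std::size_t end = std::min(text.size(), pos + term.size() + radius);

    // Widen to the nearest whitespace so the sample doesn't cut a word.
    while (begin > 0 &&
           !std::isspace(static_cast<unsigned char>(text[begin - 1])))
        --begin;
    while (end < text.size() &&
           !std::isspace(static_cast<unsigned char>(text[end])))
        ++end;

    std::string sample = text.substr(begin, end - begin);
    if (begin > 0) sample = "..." + sample;          // text was cut on the left
    if (end < text.size()) sample += "...";          // text was cut on the right
    return sample;
}
```

Only the first match is used here; a fuller sampler would probably score
several candidate windows and pick the one covering the most query terms.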
Alternatively, Jean-Francois Dockes posted some C++ code to recreate the
whole document by looking at position list data - it would be easy to
adapt that to only look at a restricted range of document positions:
http://article.gmane.org/gmane.comp.search.xapian.general/2187
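
To make the restricted-range idea concrete, here is a minimal sketch. It
is not Jean-Francois's code; it models one document's positional data as
an in-memory map so that it stands alone. With Xapian itself the terms
would come from Database::termlist_begin() and each list from
Database::positionlist_begin(), with skip_to() on the position iterator
playing the role that std::lower_bound plays below. PositionLists and
rebuild_window are hypothetical names for this example. Note that what
you get back is the sequence of indexed terms (typically lowercased,
possibly stemmed), not the original text.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// In-memory stand-in for one document's positional data (an assumption of
// this sketch): term -> ascending list of positions where it occurs.
using PositionLists = std::map<std::string, std::vector<unsigned>>;

// Rebuild the terms occupying positions [first, last] of the document.
std::string rebuild_window(const PositionLists& doc,
                           unsigned first, unsigned last)
{
    assert(first <= last);
    std::map<unsigned, std::string> window;  // position -> term, kept sorted

    // Every term in the document has to be visited, which is why this is
    // expensive for documents with many distinct terms.
    for (const auto& entry : doc) {
        const std::vector<unsigned>& positions = entry.second;
        // Equivalent of skip_to(first): jump to the first position >= first.
        auto it = std::lower_bound(positions.begin(), positions.end(), first);
        for (; it != positions.end() && *it <= last; ++it) {
            // If several terms share a position, the last one visited
            // (alphabetically greatest) overwrites the others.
            window[*it] = entry.first;
        }
    }

    std::string out;
    for (const auto& slot : window) {
        if (!out.empty()) out += ' ';
        out += slot.second;
    }
    return out;
}
```

Given the position N of a matching term, calling this with a range such
as N-5 to N+5 (clamped at the start of the document) would give the
surrounding terms.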
Cheers,
Olly