[Xapian-discuss] Re: Getting document's context

Fabrice Colin fabrice.colin at gmail.com
Tue Mar 6 12:10:28 GMT 2007


On 3/6/07, Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 05, 2007 at 01:29:22PM +0200, Matti Heinonen wrote:
> > When indexing, I am including posting information. When searching, I am
> > able to get the position information for a term using
> > database.positionlist(). But how to get the text in the positions around
> > the term?
>
> We store positional information per term+document so it isn't possible
> to answer the question "which terms occur between positions N1 and N2 in
> document D" without opening the position lists for every term in
> document D and doing a "skip_to" on each.
>
> I'd generally suggest storing a cleaned up copy of the document text in
> the document data and generating dynamic samples from that.  Xapian
> doesn't currently have a mechanism to do that though (it's something
> I'd like to add).
>
> Alternatively, Jean-Francois Dockes posted some C++ code to recreate the
> whole document by looking at position list data - it would be easy to
> adapt that to only look at a restricted range of document positions:
>
> http://article.gmane.org/gmane.comp.search.xapian.general/2187
>
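
The "skip_to on every term" approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration only. The std::map named PositionLists is a hypothetical stand-in for one document's position lists, and std::lower_bound plays the role of skip_to. In real Xapian code the loop would instead walk Database::termlist_begin(did) and open Database::positionlist_begin(did, term) for each term, calling PositionIterator::skip_to on it.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for one document's position lists:
// each indexed term maps to its sorted list of term positions.
typedef std::uint32_t termpos;
typedef std::map<std::string, std::vector<termpos> > PositionLists;

// Return the (position, term) pairs that fall in [first, last], ordered by
// position. Every term's list must be visited, as the quoted text explains.
std::vector<std::pair<termpos, std::string> >
terms_in_range(const PositionLists& lists, termpos first, termpos last) {
    std::vector<std::pair<termpos, std::string> > out;
    for (const auto& entry : lists) {
        const std::vector<termpos>& positions = entry.second;
        // "skip_to": jump to the first position >= first.
        auto it = std::lower_bound(positions.begin(), positions.end(), first);
        for (; it != positions.end() && *it <= last; ++it) {
            out.emplace_back(*it, entry.first);
        }
    }
    std::sort(out.begin(), out.end());
    return out;
}

// Join the terms in position order to approximate the text of the range.
std::string reconstruct(const PositionLists& lists, termpos first, termpos last) {
    std::string text;
    for (const auto& hit : terms_in_range(lists, first, last)) {
        if (!text.empty()) text += ' ';
        text += hit.second;
    }
    return text;
}
```

Note that this only recovers the indexed terms (which may be lowercased, stemmed or prefixed), not the original wording or punctuation, and words indexed without positions leave gaps. That limitation is one reason to prefer storing a cleaned-up copy of the text in the document data.
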
I wrote something similar for Pinot that tries to find the best "window", i.e.
the part of the document where the largest number of query terms occur.
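
A "best window" search of this kind could look roughly like the sketch below. It is a hypothetical illustration, not Pinot's AbstractGenerator code. It assumes the document's position lists are available as a term-to-positions map, that the query terms are distinct, and that "best" means the window of `width` consecutive positions containing the most query-term occurrences. It collects and sorts all query-term positions, then slides a two-pointer window over them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

typedef std::uint32_t termpos;
// Hypothetical stand-in for one document's position lists.
typedef std::map<std::string, std::vector<termpos> > PositionLists;

struct Window {
    termpos start = 0;      // first position covered by the window
    std::size_t hits = 0;   // query-term occurrences inside the window
};

// Find the window of `width` consecutive positions holding the most
// query-term occurrences (query terms are assumed to be distinct).
Window best_window(const PositionLists& lists,
                   const std::vector<std::string>& query_terms,
                   termpos width) {
    Window best;
    if (width == 0) return best;

    // Gather every position at which any query term occurs.
    std::vector<termpos> hits;
    for (const std::string& term : query_terms) {
        auto found = lists.find(term);
        if (found != lists.end())
            hits.insert(hits.end(), found->second.begin(), found->second.end());
    }
    std::sort(hits.begin(), hits.end());

    std::size_t left = 0;
    for (std::size_t right = 0; right < hits.size(); ++right) {
        // Drop hits from the left until hits[left..right] fits in `width` positions.
        while (hits[right] - hits[left] >= width) ++left;
        std::size_t count = right - left + 1;
        if (count > best.hits) {
            best.hits = count;
            best.start = hits[left];
        }
    }
    return best;
}
```

The chosen range [start, start + width - 1] can then be turned into a snippet, for example from a stored copy of the document text. Counting distinct query terms per window instead of occurrences would be a reasonable alternative scoring rule.
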

I can't claim it's optimal, but it works well enough for me. You can
find the code at
http://svn.berlios.de/wsvn/pinot/tags/version_0_7_0/Search/AbstractGenerator.cpp?op=file&rev=0&sc=0

Fabrice
