[Xapian-discuss] Get term from document by position

James Aylett james-xapian at tartarus.org
Sun Jul 26 15:41:51 BST 2015


On 26 Jul 2015, at 15:36, john.alveris at Safe-mail.net wrote:

>> Snippet highlighting is something that was worked on for a GSoC project a
>> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
>> It’s not available in the 1.2 series, but as I understand it, it should work out of the
>> box in 1.3.3.
> 
> I tried it, but this approach returns snippets that have nothing to do with the search string. Moreover, it takes too long to generate a snippet.

Can you file a bug with some example outputs that are unrelated to the search string?

>> Note that your suggested approach of going from terms to snippet doesn’t work in the general 
>> case, because of issues like stemming. 
> 
> Actually, it works just fine. I am using the following indexing scheme: 
> First, I index unstemmed text.
> Next, I add a term with a unique prefix to the database. This term is used as a delimiter between stemmed and unstemmed
> terms.
> Finally, I index stemmed text.

Right, but that’s not the general case. It’s absolutely possible to do things in other ways, of course. (In this case I assume you’re indexing completely untransformed text, just word splitting; you aren’t normalising case for the “raw” terms, for instance. What do you do about punctuation, out of interest?)
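
For anyone following along, here is a rough sketch of how I read that scheme, in plain Python rather than against the Xapian API. Everything named here is an assumption for illustration: DELIMITER stands in for the unique-prefix term, toy_stem is a throwaway suffix stripper and not a real stemmer, and splitting on \w+ simply discards punctuation. It only shows why a position in the stemmed run maps back to the raw word at the same offset in the unstemmed run.

```python
import re

# Hypothetical delimiter term; the original post only says "a unique prefix".
DELIMITER = "XDELIM"


def toy_stem(word):
    """Stand-in for a real stemmer: lowercase and strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_positions(text):
    """Return (position -> term, delimiter position) for one document.

    Raw words take positions 1..n, the delimiter takes n+1, and the
    stemmed forms take n+2..2n+1, in the same word order.
    """
    raw_words = re.findall(r"\w+", text)  # assumption: punctuation is dropped
    positions = {}
    pos = 0
    for word in raw_words:
        pos += 1
        positions[pos] = word
    pos += 1
    positions[pos] = DELIMITER
    delimiter_pos = pos
    for word in raw_words:
        pos += 1
        positions[pos] = toy_stem(word)
    return positions, delimiter_pos


def raw_term_for_stemmed_position(positions, delimiter_pos, stemmed_pos):
    """Map a position in the stemmed run back to the raw word."""
    return positions[stemmed_pos - delimiter_pos]


positions, delimiter_pos = build_positions("Searching indexed documents")
```

With that layout, a match on the stemmed term at position p gives back the untransformed word at position p minus the delimiter position. That only holds because both runs were produced from the same word split, which is the part that doesn't carry over to the general case.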

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



