[Xapian-discuss] Get term from document by position
James Aylett
james-xapian at tartarus.org
Sun Jul 26 15:41:51 BST 2015
On 26 Jul 2015, at 15:36, john.alveris at Safe-mail.net wrote:
>> Snippet highlighting is something that was worked on for a GSoC project a
>> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
>> It’s not available in the 1.2 series, but as I understand it should work out of the
>> box in 1.3.3.
>
> I tried it, but this approach returns snippets that have nothing to do with the search string. Moreover, it takes too long to generate a snippet.
Can you file a bug with some example outputs that are unrelated to the search string?
>> Note that your suggested approach of going from terms to snippet doesn’t work in the general
>> case, because of issues like stemming.
>
> Actually, it works just fine. I am using the following indexing scheme:
> First, I index unstemmed text.
> Next, I add a term with a unique prefix to the database. This term is used as a delimiter between stemmed and unstemmed
> terms.
> Finally, I index stemmed text.
Right, but that’s not the general case. It’s absolutely possible to do things in other ways, of course. (In this case I assume you’re indexing completely untransformed text, just word splitting; you aren’t normalising case for the “raw” terms, for instance. What do you do about punctuation, out of interest?)
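To make the scheme under discussion concrete, here is a minimal sketch of the layout it describes: raw terms at positions 1..N, a single delimiter term at N+1, then the stemmed terms. It is plain Python and does not touch the Xapian API; the names DELIMITER, STEM_PREFIX, toy_stem, index_text, raw_terms and snippet are all hypothetical, the stemmer is a crude stand-in, and the word splitting simply drops punctuation, which is an assumption rather than anything stated in the thread.

```python
import re

DELIMITER = "XDELIM"   # hypothetical prefixed term separating raw and stemmed terms
STEM_PREFIX = "Z"      # illustrative marker for stemmed terms (an assumption)


def toy_stem(word):
    """Crude stand-in for a real stemmer: lowercase, strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_text(text):
    """Return {position: term}: raw terms, then the delimiter, then stemmed terms."""
    words = re.findall(r"\w+", text)  # word splitting only; punctuation is dropped
    positions = {}
    pos = 1
    for w in words:                   # 1. unstemmed text, untransformed
        positions[pos] = w
        pos += 1
    positions[pos] = DELIMITER        # 2. delimiter term
    pos += 1
    for w in words:                   # 3. stemmed text
        positions[pos] = STEM_PREFIX + toy_stem(w)
        pos += 1
    return positions


def raw_terms(positions):
    """Recover the original word sequence from the positions before the delimiter."""
    delimiter_pos = next(p for p, t in positions.items() if t == DELIMITER)
    return [positions[p] for p in range(1, delimiter_pos)]


def snippet(positions, query_word, context=2):
    """Find the first stemmed match and return the raw words around it, or None."""
    delimiter_pos = next(p for p, t in positions.items() if t == DELIMITER)
    target = STEM_PREFIX + toy_stem(query_word)
    for pos in sorted(positions):
        if pos > delimiter_pos and positions[pos] == target:
            # A stemmed term at position p corresponds to the raw term at
            # p - delimiter_pos, because both runs list the words in the same order.
            raw_pos = pos - delimiter_pos
            first = max(1, raw_pos - context)
            last = min(delimiter_pos - 1, raw_pos + context)
            words = []
            for p in range(first, last + 1):
                w = positions[p]
                words.append("[" + w + "]" if p == raw_pos else w)
            return " ".join(words)
    return None


doc = index_text("Searching the indexes, quickly.")
print(snippet(doc, "index", context=1))  # the [indexes] quickly
```

The fixed offset between a stemmed position and its raw counterpart is what makes the term-to-snippet lookup work here. It only holds because both runs are produced from the same word list, which is why this does not carry over to arbitrary indexing schemes.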
J
--
James Aylett, occasional trouble-maker
xapian.org