[Xapian-discuss] Get term from document by position
john.alveris at Safe-mail.net
Sun Jul 26 15:36:17 BST 2015
> Snippet highlighting is something that was worked on for a GSoC project a
> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
> It’s not available in the 1.2 series, but as I understand it should work out of the
> box in 1.3.3.
I tried it, but this approach returns snippets that have nothing to do with the search string. Moreover, it takes too long to generate a snippet.
> Note that your suggested approach of going from terms to snippet doesn’t work in the general
> case, because of issues like stemming.
Actually, it works just fine. I am using the following indexing scheme:
First, I index the unstemmed text.
Next, I add a term with a unique prefix to the database. This term is used as a delimiter between the stemmed and unstemmed terms.
Finally, I index the stemmed text.
When generating a snippet (if a stemmer is being used), I get the positions of the stemmed terms that the snippet should consist of and the position of the delimiter. Next, I apply an appropriate shift to get the positions of the corresponding unstemmed terms.
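To make the shift concrete, here is a minimal sketch of the position arithmetic. It assumes one particular layout, which is not spelled out above: unstemmed terms at positions 1..N, the delimiter at position N+1, and the stemmed terms right after it in the same order, so that the k-th stemmed term corresponds to the k-th unstemmed term. The function name unstemmed_position is hypothetical, and 0 is used as a "no corresponding term" marker only because positions start at 1 in this layout.

```cpp
#include <cassert>

// Hypothetical layout (an assumption, not confirmed by the scheme above):
//   positions 1..N          unstemmed terms
//   position  delimiter_pos the prefixed delimiter term (N + 1)
//   positions delimiter_pos + 1 .. delimiter_pos + N  stemmed terms
//
// Maps the position of a stemmed term back to the position of the
// corresponding unstemmed term. Returns 0 when stemmed_pos does not lie
// after the delimiter, i.e. it is not a stemmed-term position at all.
unsigned unstemmed_position(unsigned stemmed_pos, unsigned delimiter_pos) {
    // In this 1-based layout the delimiter can never sit at position 0.
    assert(delimiter_pos >= 1);
    if (stemmed_pos <= delimiter_pos) {
        return 0;
    }
    return stemmed_pos - delimiter_pos;
}
```

If the two indexing passes tokenize the text differently, the one-to-one correspondence breaks and a plain subtraction like this is no longer enough.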
This approach works fine, except that I have to cycle through the terms to get a term by position, and this operation is time-consuming.
Let me note that Recoll ( http://www.lesbonscomptes.com/recoll/ ) uses a similar approach to generate snippets (actually, I am using their method with some modifications). To get a term by position, they cycle through all of the terms too.
While it works, it takes 1-2 seconds to generate snippets (about 10 snippets). I think that if there were a fast way to get a term by position, snippet generation would be much faster.
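One possible way to avoid repeating the cycle for every position is to walk the term list once per document and build a position-to-term table, then answer every lookup from that table. The sketch below shows only the idea and uses a plain std::map as a stand-in for the document's terms and their position lists. With Xapian itself, the table would presumably be filled by iterating Xapian::Document::termlist_begin() and each term's positionlist_begin(). The names TermPositions, PositionIndex, build_position_index, term_at and terms_in_window are all hypothetical. Whether this actually helps depends on where the 1-2 seconds go, which I have not measured.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a document's term list: term -> positions where it occurs.
using TermPositions = std::map<std::string, std::vector<unsigned>>;
// Inverted view of the same data: position -> term.
using PositionIndex = std::unordered_map<unsigned, std::string>;

// One pass over all terms and all of their positions.
// If two terms share a position, the first one seen is kept.
PositionIndex build_position_index(const TermPositions& terms) {
    PositionIndex index;
    for (const auto& entry : terms) {
        for (unsigned pos : entry.second) {
            index.emplace(pos, entry.first);
        }
    }
    return index;
}

// Average constant-time lookup; returns an empty string if nothing
// is recorded at that position.
std::string term_at(const PositionIndex& index, unsigned pos) {
    auto it = index.find(pos);
    return it == index.end() ? std::string() : it->second;
}

// Collects the terms at positions first..last (inclusive), in order,
// skipping positions that have no term.
std::vector<std::string> terms_in_window(const PositionIndex& index,
                                         unsigned first, unsigned last) {
    assert(first <= last);
    std::vector<std::string> out;
    for (unsigned pos = first; ; ++pos) {
        auto it = index.find(pos);
        if (it != index.end()) {
            out.push_back(it->second);
        }
        if (pos == last) {
            break;  // checked here so that last == UINT_MAX cannot overflow
        }
    }
    return out;
}
```

The table costs one full pass plus memory proportional to the number of term occurrences in the document, so it pays off only when several positions are looked up per document, as is the case when assembling a snippet window.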
>
> > Hello. Is there any FAST way to get a term from the xapian document by its position, something like
> > std::string term = Xapian::Document::GetTermByPosition(int position) ?
>
> Not that I’m aware of. Snippet highlighting is something that was worked on for a GSoC project a few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. It’s not available in the 1.2 series, but as I understand it should work out of the box in 1.3.3.
>
> Note that your suggested approach of going from terms to snippet doesn’t work in the general case, because of issues like stemming. Instead, Mihai’s approach was to use the matcher information to generate a snippet from the original, unstemmed and untermed, text.