[Xapian-discuss] Text snippets

Jim jim at fayettedigital.com
Sat Dec 26 17:02:15 GMT 2009

karpet at localhost.com wrote:
>> On Thu, Dec 17, 2009 at 11:29:52AM +0300, Do. wrote:
>> There's a ticket in trac as well as the FAQ entry.  The FAQ entry had some
>> rough edges (e.g. the sample thread it linked to wasn't about snippets
>> at all), so I've overhauled it and linked to the ticket as part of that:
>> http://trac.xapian.org/wiki/FAQ/Snippets
> FWIW, the Search::Tools modules mentioned in that FAQ entry have gotten a
> lot of work in the last six months, and many of the slow parts moved to
> C/XS. The FAQ entry mentions problems with phrases and stemming, and to
> the best of my knowledge those have been resolved.
>
> I use Search::Tools with Xapian quite successfully. I store the entire
> plain (no HTML) text of each document in the 'data' entry for each
> document, and can snip and highlight very easily with Search::Tools +
> Search::Xapian. If I want to highlight terms in the original document, I
> use HTML::HiLiter.
>
> http://search.cpan.org/dist/Search-Tools/
>
> I would be happy to change the FAQ entry to reflect the above, but of
> course as the author of Search::Tools I am biased, so if you find that
> Search::Tools doesn't work well with Xapian I'd like to hear about it.
>
> pek
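For readers who want to see what the snip-and-highlight step involves, here is a minimal stdlib Python sketch. It is not the Search::Tools API (that is Perl); `make_snippet` is a hypothetical helper that assumes the document's plain text has already been fetched (for instance from the Xapian document data, as described above). It does whole-word, case-insensitive matching only, with no stemming or phrase handling, which are the harder cases Search::Tools reportedly deals with.

```python
import re


def make_snippet(text, terms, width=60, mark=("<b>", "</b>")):
    """Return an excerpt of `text` around the first query-term match,
    with each matching term wrapped in the `mark` pair.

    Matching is whole-word and case-insensitive, with no stemming or
    phrase handling (a hypothetical helper, not Search::Tools).
    """
    words = [t for t in terms if t]
    pattern = None
    hit = None
    if words:
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
        )
        hit = pattern.search(text)

    # Centre the window on the first hit, or fall back to the start.
    if hit:
        start = max(0, hit.start() - width)
        end = min(len(text), hit.end() + width)
    else:
        start, end = 0, min(len(text), 2 * width)

    # Widen the window to whole-word boundaries so no word is cut in half.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1

    excerpt = text[start:end].strip()
    if pattern:
        open_tag, close_tag = mark
        excerpt = pattern.sub(lambda m: open_tag + m.group(0) + close_tag, excerpt)

    # Mark truncation at either end of the document.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + excerpt + suffix


doc_text = ("Xapian is a search engine library. "
            "Snippets show where query terms matched.")
print(make_snippet(doc_text, ["snippets"], width=20))
```

If the snippet goes into an HTML results page, the plain text should be HTML-escaped before the highlight tags are added.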
I also use Search::Tools with Xapian, with good results. However, I don't
store my document text in the Xapian database: the data is already on disk
in HTML format (3+ GB), so to generate excerpts for the results page I
simply run html2text on the matching files. The speed is satisfactory, so I
saw no need to keep a duplicate copy of the data. Another advantage is that
indexing is faster when the text isn't stored in the database.
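html2text is a separate tool. As a hedged stand-in, the stdlib Python sketch below does the same basic job at query time: given the path of a matching HTML file (assumed to be recoverable from the search hit, e.g. stored in the document data or a value), it strips the markup, drops script/style contents and collapses whitespace, leaving plain text for a snippet generator. `html_to_text` and `text_for_hit` are hypothetical names, and the sketch assumes UTF-8 files rather than detecting the character set.

```python
from html.parser import HTMLParser

# Tags whose contents are not visible text.
SKIP_TAGS = {"script", "style"}
# Block-level tags that should separate words when the page is flattened.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc.
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse whitespace (a rough html2text stand-in)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def text_for_hit(path):
    """Read one matching HTML file from disk and return its plain text."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return html_to_text(f.read())
```

Only the documents shown on a results page need converting, so the cost is paid per query on a few files, which is the trade-off described above: a little work at search time instead of storing a second copy of the text.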

I mention this only to offer an alternative approach, for the benefit of
anyone who later searches the archive for ways to solve the same problem.


