[Xapian-discuss] Document snippet generation

Ycrux ycrux at club-internet.fr
Mon Jan 14 23:02:04 GMT 2008


I'm looking for a good way to generate "snippet text"
for a personal search engine based on Xapian when showing the search results.

Currently I'm using "OTS" (Open Text Summarizer). The result
is good, but not perfect, and I'd like it to be perfect (or nearly so, if possible).

Here's an example of usage:

$ elinks "http://xapian.org/" -force-html -no-numbering -no-references 2>/dev/null | ots -r 40

=============== generated snippet ==================
The Xapian Project

   Welcome to the Xapian project website.
   It's written in C++, with bindings to allow use from Perl, Python, PHP,
   Java, Tcl, C# and Ruby (so far!)

   Xapian is a highly adaptable toolkit which allows developers to easily add
   advanced indexing and search facilities to their own applications. Unlike most
   other website search solutions, Xapian's versatility allows you to extend
   Omega to meet your needs as they grow.

The result is OK for this site (but not for those with frames ...).
I would like to obtain something similar to the Google "text snippets".
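
What distinguishes Google-style snippets from a general summary such as
the OTS output above is that the excerpt is chosen around the words the
user searched for. A rough sketch of that idea in Python, for illustration
only: the function name query_biased_snippet and the width parameter are
made up here and are not part of Xapian, Omega or OTS. It assumes the page
has already been converted to plain text.

```python
import re


def query_biased_snippet(text, query, width=30):
    """Return the `width`-word window of `text` covering the most distinct query terms."""
    words = text.split()
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    if not words or not terms:
        # Nothing to match against: fall back to the start of the text.
        return " ".join(words[:width])

    # Normalise each word once (strip punctuation, lowercase) for matching.
    norm = [re.sub(r"\W+", "", w).lower() for w in words]

    best_start, best_score = 0, -1
    for start in range(0, max(1, len(words) - width + 1)):
        # Score a window by how many distinct query terms it contains.
        score = len(terms.intersection(norm[start:start + width]))
        if score > best_score:  # strict '>' keeps the earliest best window
            best_start, best_score = start, score

    snippet = " ".join(words[best_start:best_start + width])
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + width < len(words) else ""
    return prefix + snippet + suffix
```

Called with the page text and the user's query, this returns one excerpt
with "..." marking where text was cut. A real implementation would
probably also apply stemming and highlight the matched terms.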

Any advice is welcome.

N.B.: All the HTML pages I'm indexing are converted to text with "elinks"
(the text browser), as in the previous example.

Thanks in advance.

