[Xapian-discuss] Document snippet generation
ycrux at club-internet.fr
Mon Jan 14 23:02:04 GMT 2008
I'm looking for a good way to generate "snippet text"
for a personal search engine based on Xapian when showing the
search results.
Currently, I'm using "OTS" (Open Text Summarizer). The result
is good but not perfect, and I'd like it to be perfect (or nearly so, if possible).
Here's an example of usage:
$ elinks "http://xapian.org/" -force-html -no-numbering -no-references \
    2>/dev/null | ots -r 40
=============== generated snippet ==================
The Xapian Project
Welcome to the Xapian project website.
It's written in C++, with bindings to allow use from Perl, Python, PHP,
Java, Tcl, C# and Ruby (so far!)
Xapian is a highly adaptable toolkit which allows developers to
advanced indexing and search facilities to their own applications.
other website search solutions, Xapian's versatility allows you to extend
Omega to meet your needs as they grow.
The result is OK for this site (but not for those with frames ...).
I would like to obtain something similar to Google's "text snippets".
Any advice is welcome.
N.B.: All the HTML pages I'm indexing are converted to text with "elinks"
(the text browser), as in the previous example.
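To make the goal concrete: the Google-style snippets I have in mind depend on
the query, showing the part of the page where the search terms occur, whereas
OTS summarises the page without knowing the query. Below is a minimal Python
sketch of that idea, working on plain text such as the elinks output. The
function name make_snippet and its parameters are hypothetical names chosen
for illustration; they are not part of Xapian, OTS, or elinks, and this is
only one simple way to pick a window, not a definitive implementation.

```python
import re


def make_snippet(text, query_terms, width=160):
    """Return roughly `width` characters of `text` around the densest
    cluster of query-term matches (a simple keyword-in-context window)."""
    # Collapse newlines and runs of spaces left over from the text dump.
    text = " ".join(text.split())
    terms = [re.escape(t) for t in query_terms if t]
    if not terms:
        return text[:width]

    # Character offsets of every case-insensitive match of any term.
    hits = [m.start() for m in re.finditer("|".join(terms), text, re.IGNORECASE)]
    if not hits:
        return text[:width]  # no match: fall back to the start of the text

    # Pick the match that has the most matches within `width` characters.
    best_start, best_count = hits[0], 0
    for i, start in enumerate(hits):
        count = sum(1 for h in hits[i:] if h < start + width)
        if count > best_count:
            best_start, best_count = start, count

    # Keep a little leading context, then snap both ends to word boundaries.
    begin = max(0, best_start - width // 4)
    end = min(len(text), begin + width)
    if begin > 0:
        space = text.find(" ", begin, best_start)
        if space != -1:
            begin = space + 1
    if end < len(text):
        space = text.rfind(" ", best_start, end)
        if space != -1:
            end = space

    prefix = "... " if begin > 0 else ""
    suffix = " ..." if end < len(text) else ""
    return prefix + text[begin:end] + suffix
```

It would be called with the page text and the user's query terms, for
instance make_snippet(page_text, ["xapian", "toolkit"], width=160). Plain
substring matching like this ignores stemming, so it may miss words that
the search engine itself matched.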
Thanks in advance.