[Xapian-discuss] Document snippet generation
colinabell at gmail.com
Tue Mar 18 14:15:35 GMT 2008
Following on from a discussion that was flying around a while back
about document snippets (summaries), I have knocked together some
proof of concept code (C++) that uses Xapian's stemming ability and
sentence extraction (see http://en.wikipedia.org/wiki/Sentence_extraction).
I also used the Open Text Summarizer project as an inspiration.
It works quite well, but has some caveats which are explained in the
code comments. It can summarise, highlight sentences and highlight
words. It can also do context summaries: if you supply it with terms,
it will summarise the text within the context of those terms.
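
To give a rough idea of the approach, here is a minimal sketch of
frequency-based sentence extraction with optional context terms. It is
not the attached code: the function names (split_sentences, tokenise,
summarise) are made up for illustration, the sentence splitter is
deliberately naive, and stemming is left out so the example needs only
the standard library. In the real thing each word would be stemmed
(for example with Xapian::Stem) before it is counted.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Split text into sentences on '.', '!' or '?'. Deliberately naive:
// abbreviations such as "e.g." will be split in the wrong place.
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : text) {
        cur += c;
        if (c == '.' || c == '!' || c == '?') {
            std::size_t b = cur.find_first_not_of(" \t\r\n");
            if (b != std::string::npos) out.push_back(cur.substr(b));
            cur.clear();
        }
    }
    std::size_t b = cur.find_first_not_of(" \t\r\n");
    if (b != std::string::npos) out.push_back(cur.substr(b));
    return out;
}

// Lowercased alphanumeric words. Stemming would be applied here too;
// it is omitted in this sketch.
std::vector<std::string> tokenise(const std::string& s) {
    std::vector<std::string> words;
    std::string w;
    for (unsigned char c : s) {
        if (std::isalnum(c)) {
            w += static_cast<char>(std::tolower(c));
        } else if (!w.empty()) {
            words.push_back(w);
            w.clear();
        }
    }
    if (!w.empty()) words.push_back(w);
    return words;
}

// Return up to max_sentences sentences, in document order.
// stopwords and context must already be lowercase. If context is
// non-empty, only those terms contribute to a sentence's score.
std::vector<std::string> summarise(const std::string& text,
                                   std::size_t max_sentences,
                                   const std::set<std::string>& stopwords,
                                   const std::set<std::string>& context = {}) {
    const std::vector<std::string> sentences = split_sentences(text);

    // Document-wide frequency of every non-stop word.
    std::map<std::string, int> freq;
    for (const std::string& s : sentences)
        for (const std::string& w : tokenise(s))
            if (!stopwords.count(w)) ++freq[w];

    // Score = sum of the frequencies of the words in the sentence.
    // Sentences that score zero are dropped.
    std::vector<std::pair<int, std::size_t>> scored;
    for (std::size_t i = 0; i < sentences.size(); ++i) {
        int score = 0;
        for (const std::string& w : tokenise(sentences[i])) {
            if (stopwords.count(w)) continue;
            if (!context.empty() && !context.count(w)) continue;
            score += freq[w];
        }
        if (score > 0) scored.push_back(std::make_pair(score, i));
    }

    // Highest score first; ties go to the earlier sentence.
    std::sort(scored.begin(), scored.end(),
              [](const std::pair<int, std::size_t>& a,
                 const std::pair<int, std::size_t>& b) {
                  if (a.first != b.first) return a.first > b.first;
                  return a.second < b.second;
              });

    // Keep the top sentences and restore document order.
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < scored.size() && i < max_sentences; ++i)
        keep.push_back(scored[i].second);
    std::sort(keep.begin(), keep.end());

    std::vector<std::string> result;
    for (std::size_t idx : keep) result.push_back(sentences[idx]);
    return result;
}
```

Calling summarise(text, 2, stopwords) gives a plain two-sentence
summary; passing a fourth argument such as {"weather"} restricts the
scoring to sentences that mention those terms. Note that words are
lowercased before the stop word check, so "The" and "the" are treated
alike.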
I am new to C++ programming, so while you're laughing out loud at the
poor coding, please keep that in mind. The code was assembled on
Ubuntu Linux and comes with a Makefile. I have also supplied my
stopper class. For some reason the stopper still fails to stop some of
the words in the stopper (like "the"). If anyone knows why, please let
me know.
Feedback / comments / changes / improvements are more than welcome -
bring it on. I really hope this sparks an interest.
-------------- next part --------------
On 9 Feb 2008, at 01:24, Kevin Duraj wrote:
> The Open Text Summarizer looks pretty good. Perhaps it could be used to
> fight spamdexing and keyword stuffing. I am wondering how it works? Is
> it based on natural language processing?
> Kevin Duraj
> On Jan 25, 2008 10:26 PM, Bogdan M. Maryniuk <bogdan.maryniuk at gmail.com> wrote:
>> On Jan 26, 2008 3:04 AM, Peter Karman <peter at peknet.com> wrote:
>>> The HiLiter and Snipper can be used with any text.
>> Oh, sorry... I read as Hitler and Sniper... :-)
>> Xapian-discuss mailing list
>> Xapian-discuss at lists.xapian.org