[Xapian-discuss] Document snippet generation

Colin Bell colinabell at gmail.com
Tue Mar 18 14:15:35 GMT 2008


Hi All

Following on from a discussion that was flying around a while back
about document snippets (summaries), I have knocked together some
proof-of-concept code (C++) that uses Xapian's stemming ability and
sentence extraction (see http://en.wikipedia.org/wiki/Sentence_extraction).
I also used the Open Text Summarizer project as an inspiration.

It works quite well, but has some caveats, which are explained in the
code comments. It can summarise, highlight sentences and highlight
words. It can also do context summaries: if you supply it with terms,
it will summarise the text within the context of those terms.
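
As a rough illustration of the idea (this is not the attached code, just
a minimal sketch), sentence extraction with optional context terms might
look something like the following. The names normalise, split_sentences
and summarise are made up for the example. normalise only lowercases and
strips punctuation; that is the spot where a real stemmer such as
Xapian::Stem would go. The scoring (average word frequency per sentence)
is an assumption chosen for brevity, and context terms are assumed to be
passed in already normalised.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for stemming (hypothetical helper): lowercase the word and
// drop anything that is not a letter or digit.
std::string normalise(const std::string& word) {
    std::string out;
    for (unsigned char c : word)
        if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    return out;
}

// Naive sentence splitter: a sentence ends at '.', '!' or '?'.
// Abbreviations such as "e.g." will be split wrongly.
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::string current;
    for (char c : text) {
        current += c;
        if (c == '.' || c == '!' || c == '?') {
            std::size_t start = current.find_first_not_of(" \n\t");
            if (start != std::string::npos)
                sentences.push_back(current.substr(start));
            current.clear();
        }
    }
    std::size_t start = current.find_first_not_of(" \n\t");
    if (start != std::string::npos) sentences.push_back(current.substr(start));
    return sentences;
}

// Return the `count` highest-scoring sentences, in document order.
// With no context terms every word counts towards a sentence's score;
// with context terms only matching words count.
std::vector<std::string> summarise(
        const std::string& text, std::size_t count,
        const std::set<std::string>& context = std::set<std::string>()) {
    const std::vector<std::string> sentences = split_sentences(text);

    // Frequency of each normalised word over the whole text.
    std::map<std::string, int> freq;
    std::istringstream all(text);
    for (std::string w; all >> w;) {
        const std::string t = normalise(w);
        if (!t.empty()) ++freq[t];
    }

    // Score = sum of frequencies of counted words / words in sentence.
    typedef std::pair<double, std::size_t> Scored;  // (score, index)
    std::vector<Scored> scored;
    for (std::size_t i = 0; i < sentences.size(); ++i) {
        std::istringstream in(sentences[i]);
        double score = 0;
        int words = 0;
        for (std::string w; in >> w;) {
            const std::string t = normalise(w);
            if (t.empty()) continue;
            ++words;
            if (context.empty() || context.count(t)) score += freq[t];
        }
        scored.push_back(Scored(words ? score / words : 0.0, i));
    }

    // Keep the best `count`, then restore document order.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& a, const Scored& b) {
                         return a.first > b.first;
                     });
    if (scored.size() > count) scored.resize(count);
    std::sort(scored.begin(), scored.end(),
              [](const Scored& a, const Scored& b) {
                  return a.second < b.second;
              });

    std::vector<std::string> summary;
    for (const Scored& s : scored) summary.push_back(sentences[s.second]);
    return summary;
}
```

Calling summarise(text, 2) would give a plain two-sentence summary, while
passing a set of terms as the third argument biases the choice towards
sentences that mention those terms. A stop list would normally be applied
before counting frequencies; it is left out here to keep the sketch short.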

I am new to C++ programming, so while you're laughing out loud at the
poor coding, please keep that in mind. The code was assembled on
Ubuntu Linux and comes with a Makefile. I have also supplied my
stopper class. For some reason the stopper still fails to stop some of
the words in its stop list (like "the"). If anyone knows why, please
let me know.

Feedback / comments / changes / improvements are more than welcome -  
bring it on. I really hope this sparks an interest.

Regards

Colin


On 9 Feb 2008, at 01:24, Kevin Duraj wrote:

> The Open Text Summarizer looks pretty good. Perhaps it could be used to
> fight spamdexing and keyword stuffing. I am wondering how it works. Is it
> based on natural language processing?
>
>
>  Kevin Duraj
>  http://UncensoredWebSearch.com
>
>
> On Jan 25, 2008 10:26 PM, Bogdan M. Maryniuk
> <bogdan.maryniuk at gmail.com> wrote:
>> On Jan 26, 2008 3:04 AM, Peter Karman <peter at peknet.com> wrote:
>>> http://search.cpan.org/dist/Search-Tools/
>>> The HiLiter and Snipper can be used with any text.
>>
>> Oh, sorry... I read as Hitler and Sniper... :-)
>>
>> --
>> bm
>>
>>
>> _______________________________________________
>> Xapian-discuss mailing list
>> Xapian-discuss at lists.xapian.org
>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>


