[Xapian-discuss] how to display the results in html which we r showing on konsole

Peter Karman peter at peknet.com
Fri Mar 31 14:39:43 BST 2006


You may also want to try the HTML::Parser module (or one of its ilk) from
CPAN. It will handle more of the "crazy stuff".
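
A minimal sketch of that approach, assuming HTML::Parser 3.x is installed
from CPAN. The helper name html_to_text and the sample markup are made up
for illustration; they are not part of the module or of Kevin's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (name invented for this example):
# returns the visible text of an HTML string.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the handler text with entities already decoded.
        text_h => [ sub { $text .= $_[0] . ' ' }, 'dtext' ],
    );
    $parser->unbroken_text(1);                  # do not split text runs
    $parser->ignore_elements(qw(script style)); # skip tag and contents
    $parser->parse($html);
    $parser->eof;

    $text =~ s/\s+/ /g;         # collapse whitespace runs
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

print html_to_text(
    '<p>Hello &amp; <b>world</b></p><script>var x = 1;</script>'
), "\n";
```

Like the regex version, this puts a space wherever a tag interrupts the
text, so a word split by inline markup comes out as two words. Comments
are dropped because no comment handler is registered.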

Kevin SoftDev scribbled on 3/30/06 6:51 PM:
> #!/usr/bin/perl
> 
> # I developed my own Perl routine that converts HTML to a readable text
> # format that can be displayed on the console. I do not guarantee that it
> # will take care of all the crazy things people write in their HTML, but it
> # should take care of most properly formatted HTML. Please go ahead and
> # improve it if you see a place for improvement and let me know, thanks ...
> #
> # Kevin Duraj
> 
>     $body =~ s/<\!\-\-.*?\-\-\>//isg;
>     $body =~ s/<title>.*?<\/title>//isg;
>     $body =~ s/<style.*?>.*?<\/style>//isg;
>     $body =~ s/<script.*?>.*?<\/script>//isg;
>     $body =~ s/<.*?>/ /sg; $title =~ s/<.*?>/ /sg;
> 
>     # Collapse runs of whitespace, stray angle brackets, leftover entities
>     # and punctuation into a single space. \xA0 is the non-breaking space
>     # (byte A0), which is not an ordinary blank.
>     my $junk = qr/(?:<|>|\xA0|&lt|&gt|&nbsp|xmp|plaintext|\s|'|"|,|;|~|%|\|)+/;
>     $body  =~ s/$junk/ /g;
>     $title =~ s/$junk/ /g;
>     $body  =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
>     $title =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
> 
>     $body  =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
>     $title =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
> 
>     (@array) = split(/ /, $body);
>     $body = '';
>     foreach(@array) { $body .= $_ . ' '; }
> 
>     if(length($body) > 1024) { $body = substr($body, 0, 1024); }
>      print $title. "\n";
>      print $body . "\n";
> 
> 
> 
> On 3/30/06, James Aylett <james-xapian at tartarus.org> wrote:
>> On Thu, Mar 30, 2006 at 01:51:44PM +0100, Olly Betts wrote:
>>
>>>> Ah - I see the text ones, so they just look like simpleton questions
>>>> :-)
>>> Can we just set mailman to convert HTML to text?  HTML messages don't
>>> really add anything...
>> However that *removes* something if it's done poorly.
>>
>>> The option is "convert_html_to_plaintext" in the "Content filtering"
>>> section (and also make sure "filter_content" is enabled).
>> Any idea if this will just drop the HTML version if there's a text
>> version, or if it will squash the text alternate with its converted
>> version?
>>
>> To be honest, we could just enable filtering and have HTML parts
>> dropped. I've done this for xapian-discuss, I think - let's see what
>> happens.
>>
>> This means that people sending HTML-only will get junked, but unless
>> anyone speaks up about an MUA they use that simply can't generate text
>> parts I don't see that as being a problem.
>>
>> (Also means that attachments will be junked. Hooray. What's the MIME
>> type for patches again?)
>>
>> J
>>
>> --
>>
>> /--------------------------------------------------------------------------\
>> James Aylett                                                  xapian.org
>> james at tartarus.org                               uncertaintydivision.org
>>
>> _______________________________________________
>> Xapian-discuss mailing list
>> Xapian-discuss at lists.xapian.org
>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


