[Xapian-discuss] how to display the results in html which we r showing on konsole

Kevin SoftDev kevin.softdev at gmail.com
Fri Mar 31 01:51:02 BST 2006


#!/usr/bin/perl

# I developed my own Perl routine that converts HTML to a readable text
# format that can be displayed on the console. I do not guarantee that it
# will take care of all the crazy things people write in their HTML, but it
# should take care of most properly formatted HTML. Please go ahead and
# improve it if you see a place for improvement, and let me know. Thanks ...

# Kevin Duraj

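    # $body and $title must already hold the raw HTML and the page title.
    $body =~ s/<!--.*?-->//isg;                    # strip HTML comments
    $body =~ s/<title>.*?<\/title>//isg;           # strip the title element
    $body =~ s/<style.*?>.*?<\/style>//isg;        # strip style sheets
    $body =~ s/<script.*?>.*?<\/script>//isg;      # strip scripts
    $body  =~ s/<.*?>/ /sg;                        # replace remaining tags with a space
    $title =~ s/<.*?>/ /sg;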

    # Collapse runs of stray markup, entities, whitespace and punctuation
    # into a single space. \xA0 is the non-breaking space (A0), not a blank.
    $body  =~ s/(<|>|\xA0|&lt|&gt|&nbsp|xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;
    $title =~ s/(<|>|\xA0|&lt|&gt|&nbsp|xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;
    $body  =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
    $title =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;

    $body  =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
    $title =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;

    # Collapse repeated whitespace: split(' ') skips empty fields.
    $body = join(' ', split(' ', $body));

    # Keep at most the first 1024 characters (substr offsets start at 0).
    if (length($body) > 1024) { $body = substr($body, 0, 1024); }
    print $title . "\n";
    print $body . "\n";
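
For anyone who wants to try the idea without wiring it into their own code, here is a minimal self-contained sketch. It is a simplified variant of the filter above, not the exact same substitutions: the html_to_text sub, its optional length argument, and the sample HTML are all made up for illustration, and it only handles &nbsp; among the entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): applies the same
# kind of substitutions to an HTML string and returns (title, body).
sub html_to_text {
    my ($html, $max_len) = @_;
    $max_len = 1024 unless defined $max_len;

    # Capture the title before it is stripped from the body.
    my $title = '';
    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        $title = $1;
    }

    my $body = $html;
    $body =~ s/<!--.*?-->//sg;                     # HTML comments
    $body =~ s{<title[^>]*>.*?</title>}{}isg;      # title element
    $body =~ s{<style[^>]*>.*?</style>}{}isg;      # style sheets
    $body =~ s{<script[^>]*>.*?</script>}{}isg;    # scripts
    $body  =~ s/<[^>]*>/ /g;                       # remaining tags
    $title =~ s/<[^>]*>/ /g;

    for ($title, $body) {
        s/&nbsp;?|\xA0/ /g;    # non-breaking spaces become plain spaces
        s/\s+/ /g;             # collapse runs of whitespace
        s/^ //; s/ $//;        # trim both ends
    }

    # Truncate long bodies; substr offsets start at 0.
    $body = substr($body, 0, $max_len) if length($body) > $max_len;
    return ($title, $body);
}

# Made-up sample input.
my $html = <<'HTML';
<html><head><title>Search   results</title>
<style type="text/css">p { color: red; }</style>
<script>var x = 1;</script></head>
<body><!-- hidden --><p>First&nbsp;hit</p>
<p>Second <b>hit</b></p></body></html>
HTML

my ($title, $body) = html_to_text($html);
print "$title\n";    # prints "Search results"
print "$body\n";     # prints "First hit Second hit"
```

Returning the title and body from a sub, rather than printing inside it, keeps the cleanup reusable whether the text goes to the console or into another page.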



On 3/30/06, James Aylett <james-xapian at tartarus.org> wrote:
>
> On Thu, Mar 30, 2006 at 01:51:44PM +0100, Olly Betts wrote:
>
> > > Ah - I see the text ones, so they just look like simpleton questions
> > > :-)
> >
> > Can we just set mailman to convert HTML to text?  HTML messages don't
> > really add anything...
>
> However that *removes* something if it's done poorly.
>
> > The option is "convert_html_to_plaintext" in the "Content filtering"
> > section (and also make sure "filter_content" is enabled).
>
> Any idea if this will just drop the HTML version if there's a text
> version, or if it will squash the text alternate with its converted
> version?
>
> To be honest, we could just enable filtering and have HTML parts
> dropped. I've done this for xapian-discuss, I think - let's see what
> happens.
>
> This means that people sending HTML-only will get junked, but unless
> anyone speaks up about an MUA they use that simply can't generate text
> parts I don't see that as being a problem.
>
> (Also means that attachments will be junked. Hooray. What's the MIME
> type for patches again?)
>
> J
>
> --
>
> /--------------------------------------------------------------------------\
> James Aylett                                                  xapian.org
> james at tartarus.org                               uncertaintydivision.org
>

