[Xapian-discuss] how to display the results in html which we r showing on konsole
Kevin SoftDev
kevin.softdev at gmail.com
Fri Mar 31 01:51:02 BST 2006
#!/usr/bin/perl
# I developed my own Perl routine that converts HTML to a readable text format
# that can be displayed on a console. I do not guarantee that it will take care
# of all the crazy things people write in their HTML, but it should take care of
# most properly formatted HTML. Please go ahead and improve it if you see room
# for improvement, and let me know. Thanks ...
# Kevin Duraj
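# Remove comments, then the <title>, <style> and <script> elements with their contents.
$body =~ s/<!--.*?-->//sg;
$body =~ s/<title>.*?<\/title>//isg;
$body =~ s/<style.*?>.*?<\/style>//isg;
$body =~ s/<script.*?>.*?<\/script>//isg;
# Replace every remaining tag with a space.
$body  =~ s/<.*?>/ /sg;
$title =~ s/<.*?>/ /sg;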
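# Collapse runs of entities, stray angle brackets, whitespace and selected
# punctuation into a single space. \xA0 is the non-breaking space (byte A0),
# not an ordinary blank. The &lt; &gt; &nbsp; alternatives are an assumption:
# the archived copy of this message shows them already decoded.
$body  =~ s/(&lt;|&gt;|&nbsp;|<|>|\xA0|xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;
$title =~ s/(&lt;|&gt;|&nbsp;|<|>|\xA0|xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;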
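# Collapse a run of punctuation characters (e.g. "-----") to its last character.
$body =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
$title =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
# Do the same for punctuation separated by whitespace (e.g. ". . . ").
$body =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
$title =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;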
# Normalize the spacing between words.
$body = join(' ', split(' ', $body));
# Keep only the first 1024 characters (substr offsets start at 0).
if (length($body) > 1024) { $body = substr($body, 0, 1024); }
print $title. "\n";
print $body . "\n";
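
As a rough way to try the routine, the same steps can be wrapped in a subroutine and run on a sample string. This is only a sketch: the name html_to_text and the sample HTML are made up for illustration, and it uses a shortened set of substitutions rather than the full list above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative, not part of the routine above):
# takes an HTML string and returns at most 1024 characters of plain text.
sub html_to_text {
    my ($html) = @_;
    my $text = $html;

    $text =~ s/<!--.*?-->//sg;                # drop comments
    $text =~ s/<title\b.*?<\/title>//isg;     # drop the title element
    $text =~ s/<style\b.*?<\/style>//isg;     # drop style blocks
    $text =~ s/<script\b.*?<\/script>//isg;   # drop script blocks
    $text =~ s/<[^>]*>/ /g;                   # turn remaining tags into spaces
    $text =~ s/&nbsp;|\xA0/ /g;               # non-breaking spaces become blanks
    $text =~ s/\s+/ /g;                       # collapse whitespace runs
    $text =~ s/^\s+|\s+$//g;                  # trim both ends

    # substr offsets are zero-based, so 0 keeps the first character.
    return length($text) > 1024 ? substr($text, 0, 1024) : $text;
}

# Sample input, invented for this example.
my $html = '<html><head><title>Demo</title><style>p { color: red }</style></head>'
         . '<body><!-- note --><p>Hello,&nbsp;<b>world</b>!</p></body></html>';

print html_to_text($html), "\n";
```

Unlike the full routine, this sketch leaves commas and other punctuation alone, so it is easier to see what each step removes.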
On 3/30/06, James Aylett <james-xapian at tartarus.org> wrote:
>
> On Thu, Mar 30, 2006 at 01:51:44PM +0100, Olly Betts wrote:
>
> > > Ah - I see the text ones, so they just look like simpleton questions
> > > :-)
> >
> > Can we just set mailman to convert HTML to text? HTML messages don't
> > really add anything...
>
> However that *removes* something if it's done poorly.
>
> > The option is "convert_html_to_plaintext" in the "Content filtering"
> > section (and also make sure "filter_content" is enabled).
>
> Any idea if this will just drop the HTML version if there's a text
> version, or if it will squash the text alternate with its converted
> version?
>
> To be honest, we could just enable filtering and have HTML parts
> dropped. I've done this for xapian-discuss, I think - let's see what
> happens.
>
> This means that people sending HTML-only will get junked, but unless
> anyone speaks up about an MUA they use that simply can't generate text
> parts I don't see that as being a problem.
>
> (Also means that attachments will be junked. Hooray. What's the MIME
> type for patches again?)
>
> J
>
> --
>
> /--------------------------------------------------------------------------\
> James Aylett xapian.org
> james at tartarus.org uncertaintydivision.org
>