[Xapian-discuss] how to display the results in html which we r showing on konsole
Peter Karman
peter at peknet.com
Fri Mar 31 14:39:43 BST 2006
You may also want to try the HTML::Parser module (or one of its ilk) from CPAN.
It will handle more of the "crazy stuff".
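
For what it's worth, here is a minimal sketch of that approach, assuming HTML::Parser is installed from CPAN. The helper name html_to_text is made up for illustration, and skipping script, style and title is my assumption about what you'd want dropped, mirroring the regexes quoted below. Treat it as a starting point rather than a finished converter.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the visible text of an HTML string.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # "dtext" hands the handler text with entities already decoded.
        text_h => [ sub { $text .= $_[0] . ' ' }, 'dtext' ],
    );
    # Report each run of text as a single chunk.
    $parser->unbroken_text(1);
    # Skip these elements together with everything inside them.
    $parser->ignore_elements(qw(script style title));
    $parser->parse($html);
    $parser->eof;

    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

print html_to_text('<p>Hello &amp; <b>welcome</b><!-- hidden --></p>'), "\n";
```

Comments are dropped because no comment handler is registered, so only the text events reach the output.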
Kevin SoftDev scribbled on 3/30/06 6:51 PM:
> #!/usr/bin/perl
>
> # I developed my own Perl routine that converts HTML to a readable text
> # format that can be displayed on the console. I do not guarantee that it
> # will take care of all the crazy things people write in their HTML, but it
> # should take care of most properly formatted HTML. Please go ahead and
> # improve it if you see a place for improvement and let me know, thanks ...
>
> # Kevin Duraj
>
> $body =~ s/<\!\-\-.*?\-\-\>//isg;
> $body =~ s/<title>.*?<\/title>//isg;
> $body =~ s/<style.*?>.*?<\/style>//isg;
> $body =~ s/<script.*?>.*?<\/script>//isg;
> $body =~ s/<.*?>/ /sg; $title =~ s/<.*?>/ /sg;
>
> $body =~ s/(<|>| |<|>| |xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;
> $title =~ s/(<|>| |<|>| |xmp|plaintext|\s|\t|\n|\r|\'|\"|\,|\;|\~|\%|\|)+/ /sg;
> #----------------------^--- A0 ----#
> #This is not blank space#
> $body =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
> $title =~ s/(\!|\$|\.|_|\-|\*|\=|\~|\#)+/$1/sg;
>
> $body =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
> $title =~ s/(\!\s|\$\s|\.\s|_\s|\-\s|\*\s|\=\s|\~\s|\#)+/$1/sg;
>
> (@array) = split(/ /, $body);
> $body = '';
> foreach(@array) { $body .= $_ . ' '; }
>
> if(length($body) > 1024) { $body = substr($body, 0, 1024); }
> print $title. "\n";
> print $body . "\n";
>
>
>
> On 3/30/06, James Aylett <james-xapian at tartarus.org> wrote:
>> On Thu, Mar 30, 2006 at 01:51:44PM +0100, Olly Betts wrote:
>>
>>>> Ah - I see the text ones, so they just look like simpleton questions
>>>> :-)
>>> Can we just set mailman to convert HTML to text? HTML messages don't
>>> really add anything...
>> However that *removes* something if it's done poorly.
>>
>>> The option is "convert_html_to_plaintext" in the "Content filtering"
>>> section (and also make sure "filter_content" is enabled).
>> Any idea if this will just drop the HTML version if there's a text
>> version, or if it will squash the text alternate with its converted
>> version?
>>
>> To be honest, we could just enable filtering and have HTML parts
>> dropped. I've done this for xapian-discuss, I think - let's see what
>> happens.
>>
>> This means that people sending HTML-only will get junked, but unless
>> anyone speaks up about an MUA they use that simply can't generate text
>> parts I don't see that as being a problem.
>>
>> (Also means that attachments will be junked. Hooray. What's the MIME
>> type for patches again?)
>>
>> J
>>
>> --
>>
>> /--------------------------------------------------------------------------\
>> James Aylett xapian.org
>> james at tartarus.org uncertaintydivision.org
>>
>> _______________________________________________
>> Xapian-discuss mailing list
>> Xapian-discuss at lists.xapian.org
>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>
>
--
Peter Karman . http://peknet.com/ . peter at peknet.com