[Xapian-discuss] Input files and special chars and spaces

Floris Bos flbos at hotmail.com
Sat Sep 10 10:46:22 BST 2005


Thank you, using $html instead of $htmlstrip solved the problem!


>From: Olly Betts <olly at survex.com>
>To: Floris Bos <flbos at hotmail.com>
>CC: xapian-discuss at lists.xapian.org
>Subject: Re: [Xapian-discuss] Input files and special chars and spaces
>Date: Sat, 10 Sep 2005 01:49:36 +0100
>
>On Fri, Sep 09, 2005 at 07:55:25PM +0200, Floris Bos wrote:
> > When I enter a text that contains < or > the text is chopped right before
> > the first occurrence of one of these chars. I figured out that it's
> > $htmlstrip that does this, am I right here? It would not be a very big
> > problem if this is the case but I want to be sure that this problem doesn't
> > have another cause.
>
>Well, "$htmlstrip{foo<bar}" gives "foo" (because "<bar" is assumed to be
>an unclosed tag).
>
>But "$htmlstrip{foo>bar}" gives "foobar" (whereas you say that it chops
>off too).  So perhaps I'm misunderstanding you.  If so, can you post
>an actual example showing the input and output.
>
> > So is it standard that $htmlstrip chops off the text before < or >? Is there
> > a way to work round this and still remove the < and > chars?
>
>If you have text which isn't HTML but may contain "<" and ">", then
>$htmlstrip isn't really appropriate.  You should probably use $html
>instead which will convert "<" to "&lt;", ">" to "&gt;", "&" to "&amp;"
>and '"' to "&quot;".
>
>Cheers,
>     Olly
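
For illustration, the difference between the two commands described above
can be sketched in Python. This is not Omega's code: html_escape and
naive_strip_tags are hypothetical helpers written only to mirror the
behaviour quoted in this thread (the four substitutions listed for $html,
and the "foo<bar" / "foo>bar" results given for $htmlstrip). The real
$htmlstrip may differ in other cases.

```python
def html_escape(text: str) -> str:
    """Escape the four characters listed for $html in the message above."""
    # "&" must be replaced first so the entities added below are not re-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


def naive_strip_tags(text: str) -> str:
    """Assumed approximation of tag stripping, not Omega's implementation.

    Everything from "<" up to the next ">" is treated as a tag and dropped.
    An unclosed "<" therefore swallows the rest of the input, and a stray
    ">" outside any tag is simply discarded.
    """
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(naive_strip_tags("foo<bar"))   # unclosed tag: only "foo" survives
    print(naive_strip_tags("foo>bar"))   # stray ">" is dropped
    print(html_escape("foo<bar"))        # text is kept, "<" becomes an entity
```

With escaping, text that is not HTML is kept whole and displays literally,
which is why it suits user input that may contain "<" or ">".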




