[Xapian-discuss] Input files and special chars and spaces

Olly Betts olly at survex.com
Sat Sep 10 01:49:36 BST 2005


On Fri, Sep 09, 2005 at 07:55:25PM +0200, Floris Bos wrote:
> When I enter text that contains < or >, the text is chopped right before
> the first occurrence of one of these chars. I figured out that it's
> $htmlstrip that does this, am I right here? It would not be a very big
> problem if this is the case, but I want to be sure that this problem doesn't
> have another cause.

Well, "$htmlstrip{foo<bar}" gives "foo" (because "<bar" is assumed to be
an unclosed tag).

But "$htmlstrip{foo>bar}" gives "foobar" (whereas you say the text is
chopped at ">" too).  So perhaps I'm misunderstanding you.  If so, can
you post an actual example showing the input and output?
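
To make the two cases above concrete, here is a toy Python model that
reproduces them.  It is only a sketch and is not Omega's actual code: the
function name strip_tags_sketch and the exact rules (drop everything from
"<" up to the next ">", drop the rest of the input if the tag is never
closed, drop a stray ">") are assumptions chosen to match the outputs
quoted above.

```python
def strip_tags_sketch(text: str) -> str:
    """Toy model of tag stripping (hypothetical, not Omega's implementation)."""
    out = []
    in_tag = False
    for ch in text:
        if in_tag:
            # Inside a presumed tag: discard characters until it is closed.
            if ch == ">":
                in_tag = False
        elif ch == "<":
            # Assume "<" always opens a tag, even if it is never closed.
            in_tag = True
        elif ch != ">":
            # Ordinary character: keep it. A stray ">" is simply dropped.
            out.append(ch)
    return "".join(out)


print(strip_tags_sketch("foo<bar"))  # foo
print(strip_tags_sketch("foo>bar"))  # foobar
```

Under this model, text is only lost after "<", never after ">", which is
why a real input/output example would help pin down what is happening.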

> So is it standard that $htmlstrip chops off the text before < or >? Is there
> a way to work round this and still remove the < and > chars?

If you have text which isn't HTML but may contain "<" and ">", then
$htmlstrip isn't really appropriate.  You should probably use $html
instead which will convert "<" to "&lt;", ">" to "&gt;", "&" to "&amp;"
and '"' to "&quot;".
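
The four substitutions listed above can be illustrated with a short Python
sketch.  This is not Omega's implementation, and escape_html_sketch is a
hypothetical name used only for illustration; the one detail worth noting
is that "&" has to be replaced first, so that the "&" in the entities
produced by the later replacements is not escaped a second time.

```python
def escape_html_sketch(text: str) -> str:
    """Escape the four characters named above (illustrative sketch only)."""
    # "&" must come first, otherwise "&lt;" would become "&amp;lt;".
    replacements = (
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
    )
    for raw, entity in replacements:
        text = text.replace(raw, entity)
    return text


print(escape_html_sketch('foo<bar> & "baz"'))
# foo&lt;bar&gt; &amp; &quot;baz&quot;
```

With escaping rather than stripping, no text is lost: "<" and ">" are
displayed literally by the browser instead of being treated as markup.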

Cheers,
    Olly


