[Xapian-discuss] Input files and special chars and spaces
Olly Betts
olly at survex.com
Sat Sep 10 01:49:36 BST 2005
On Fri, Sep 09, 2005 at 07:55:25PM +0200, Floris Bos wrote:
> When I enter a text that contains < or > the text is chopped right before
> the first occurrence of one of these chars. I figured out that it's
> $htmlstrip that does this, am I right here? It would not be a very big
> problem if this is the case but I want to be sure that this problem doesn't
> have another cause.
Well, "$htmlstrip{foo<bar}" gives "foo" (because "<bar" is assumed to be
an unclosed tag).
But "$htmlstrip{foo>bar}" gives "foobar" (whereas you say that it chops
off too). So perhaps I'm misunderstanding you. If so, can you post
an actual example showing the input and output?
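To make the two cases above concrete, here is a rough Python model of the stripping behaviour as described in this thread. It is a hypothetical sketch, not Omega's actual $htmlstrip implementation. It assumes that a "<" with no matching ">" swallows the rest of the input, and that a stray ">" outside a tag is simply dropped. The name htmlstrip_model is made up for illustration.

```python
def htmlstrip_model(text: str) -> str:
    """Hypothetical simplified model of $htmlstrip as described above.

    Not Omega's real code: it only removes <...> tags, treats an
    unclosed "<" as the start of a tag running to the end of the input,
    and drops any stray ">" found outside a tag.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "<":
            end = text.find(">", i + 1)
            if end == -1:
                # Unclosed tag: everything from "<" onwards is discarded.
                break
            # Skip the whole tag, including the closing ">".
            i = end + 1
            continue
        if ch == ">":
            # A ">" outside a tag is dropped rather than kept.
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(htmlstrip_model("foo<bar"))  # foo
print(htmlstrip_model("foo>bar"))  # foobar
```

This is why "<" in plain text appears to chop the text: under this model, everything after an unmatched "<" is taken to be part of a tag.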
> So is it standard that $htmlstrip chops off the text before < or >? Is there
> a way to work round this and still remove the < and > chars?
If you have text which isn't HTML but may contain "<" and ">", then
$htmlstrip isn't really appropriate. You should probably use $html
instead, which will convert "<" to "&lt;", ">" to "&gt;", "&" to "&amp;"
and '"' to "&quot;".
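For comparison, the escaping done by $html can be sketched in Python as below. This is an illustrative stand-in that assumes only the four replacements listed above; html_escape is a hypothetical name, not part of Omega. Python's standard html.escape() is similar, but with quote=True it also escapes the single quote.

```python
def html_escape(text: str) -> str:
    """Hypothetical sketch of the four replacements $html is described as doing."""
    # Replace "&" first, so the "&" in entities added below is not escaped again.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


print(html_escape('a < b && c > "d"'))  # a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```

Unlike stripping, escaping keeps every character of the input, so text containing "<" or ">" displays intact instead of being cut short.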
Cheers,
Olly