[Xapian-discuss] Xapian 1.0.0 released!
Ralf Mattes
rm at seid-online.de
Fri May 18 13:13:14 BST 2007
On Fri, 2007-05-18 at 12:42 +0100, Olly Betts wrote:
> In fact, while any UTF-8 string is trivially a valid ISO-8859-1 string,
> "real world" ISO-8859-1 doesn't look like valid UTF-8,
??? Could you explain? By "valid" you mean "won't roll on the floor
making silly noises"? Any UTF-8 string using characters with code points
above 127 _will_ have a binary representation different from the same
string encoded in ISO-8859-1 (UTF-8 encodes every character with a code
point above 127 in at least 2 octets, and in exactly 2 octets for the
ISO-8859-1 range, where ISO-8859-1 itself uses a single octet).
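
To make the difference concrete, here is a small Python sketch (Python is used only for illustration; the sample word and the helper name is_valid_utf8 are made up for this example). It shows the two byte representations of the same text, and that the ISO-8859-1 bytes are rejected by a strict UTF-8 decoder while the UTF-8 bytes decode "successfully" as ISO-8859-1, just to the wrong characters.

```python
# Same text, two encodings. "\u00e9" is e-acute, code point 233 (above 127).
text = "caf\u00e9"

latin1_bytes = text.encode("iso-8859-1")  # one octet per character
utf8_bytes = text.encode("utf-8")         # two octets for the e-acute


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))

# Every byte value is a character in ISO-8859-1, so this never fails;
# it simply yields mojibake instead of the original text.
print(utf8_bytes.decode("iso-8859-1"))
```

The lone 0xE9 octet in the ISO-8859-1 form is a UTF-8 lead byte with no continuation bytes after it, which is why the strict decoder refuses it.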
> and our UTF-8
> handling code deals with invalid and overlong sequences by assuming
> they're really ISO-8859-1, so you can probably just feed in ISO-8859-1
> and it will be indexed, magically converted to UTF-8. This hasn't been
> tested much though so test carefully before deploying.
This makes me feel slightly uneasy, I have to say... Trying to guess an
encoding seems like a fast lane to insanity.
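
A rough Python sketch of the kind of fallback being described, to show where the guessing can go wrong. This is not Xapian's code: the function name decode_with_fallback is hypothetical, and it falls back for the whole input at once, whereas the description above suggests the real handling works on individual invalid sequences.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8; if that fails, assume the bytes are ISO-8859-1.

    Simplifying assumption: the fallback applies to the whole input,
    not to each invalid sequence separately.
    """
    try:
        # Python's strict decoder rejects invalid and overlong sequences.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # branch cannot fail.
        return data.decode("iso-8859-1")


print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8 input
print(decode_with_fallback(b"caf\xe9"))      # invalid UTF-8, treated as ISO-8859-1
print(decode_with_fallback(b"\xc0\xaf"))     # overlong sequence, treated as ISO-8859-1
```

The uneasy part is the case the guess cannot detect: ISO-8859-1 input whose bytes happen to form valid UTF-8 (for example the two octets 0xC3 0xA9, meant as two separate ISO-8859-1 characters) is silently read as a single UTF-8 character.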
> > The UTF-8 support in normal php installations isn't very good.
>
> No, though the PHP iconv() function should be able to convert to/from
> UTF-8.
Since it seems to use good ol' libiconv, I assume it works flawlessly.
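
The safer route is to convert explicitly on the way in and on the way out instead of relying on guessing. Below is a minimal sketch of that round trip, written in Python rather than PHP purely for illustration; the helper names to_utf8 and from_utf8 are hypothetical, and the built-in codecs stand in for what iconv() would do.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def from_utf8(data: bytes, target_encoding: str) -> bytes:
    """Re-encode UTF-8 bytes into the target encoding.

    Raises UnicodeEncodeError if a character has no representation
    in the target encoding.
    """
    return data.decode("utf-8").encode(target_encoding)


# 0xA4 is the euro sign in ISO-8859-15 (it is the currency sign in ISO-8859-1).
price = b"\xa4 5"
as_utf8 = to_utf8(price, "iso-8859-15")
print(as_utf8)
print(from_utf8(as_utf8, "iso-8859-15") == price)
```

Converting back can fail for characters that do not exist in the target character set, so anything returned as UTF-8 needs an explicit policy for such characters.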
Cheers and thanks for all the fine work
Ralf Mattes
> > And is it also possible to let Omega know we are feeding it
> > ISO-8859-15 and want that returned as well?
> >
> > Or are we required to supply those commands with UTF-8 data?
>
> At the moment you'll always get UTF-8 out of Omega.
>
> Cheers,
> Olly