[Xapian-discuss] Xapian 1.0.0 released!

Ralf Mattes rm at seid-online.de
Fri May 18 13:13:14 BST 2007


On Fri, 2007-05-18 at 12:42 +0100, Olly Betts wrote:

> In fact, while any UTF-8 string is trivially a valid ISO-8859-1 string,
> "real world" ISO-8859-1 doesn't look like valid UTF-8, 

??? Could you explain? By "valid", do you mean "won't roll on the floor
making silly noises"? Any UTF-8 string using characters with code points
above 127 _will_ have a binary representation different from the same
string encoded in ISO-8859-1 (in UTF-8, every character with a code point
above 127 is encoded with at least 2 octets, versus exactly 1 octet in
ISO-8859-1).
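
To make the byte-level difference concrete, here is a minimal Python sketch.
The sample word is just an illustration picked for this example; nothing
here is Xapian code.

```python
# "é" is U+00E9, i.e. a code point above 127.
text = "café"

latin1_bytes = text.encode("iso-8859-1")  # one octet per character
utf8_bytes = text.encode("utf-8")         # "é" becomes two octets

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# The ISO-8859-1 bytes are not valid UTF-8: 0xE9 announces a multi-byte
# sequence, but no continuation bytes follow it.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_is_valid_utf8)  # False
```

The reverse direction never fails: every octet value is a defined
ISO-8859-1 character, so any byte string, UTF-8 included, decodes as
ISO-8859-1 without error (usually as mojibake).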

> and our UTF-8
> handling code deals with invalid and overlong sequences by assuming
> they're really ISO-8859-1, so you can probably just feed in ISO-8859-1
> and it will be indexed, magically converted to UTF-8.  This hasn't been
> tested much though so test carefully before deploying.

This makes me feel slightly uneasy, I have to say ... trying to guess an
encoding seems like a fast lane to insanity.
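
For illustration, the kind of fallback described in the quoted text could be
sketched in Python roughly like this. This is an assumption about the
general idea only, not Xapian's actual implementation; the handler name
"latin1_fallback" and the function decode_guessing() are made up for this
example.

```python
import codecs


def _latin1_fallback(err):
    # Called by the UTF-8 decoder for each invalid span (Python's decoder
    # also rejects overlong forms). Reinterpret the offending bytes as
    # ISO-8859-1 and resume decoding right after them.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("iso-8859-1"), err.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_guessing(data: bytes) -> str:
    """Decode as UTF-8, treating invalid bytes as ISO-8859-1."""
    return data.decode("utf-8", errors="latin1_fallback")


print(decode_guessing("café".encode("utf-8")))       # café
print(decode_guessing("café".encode("iso-8859-1")))  # café

# The catch: ISO-8859-1 text that happens to be valid UTF-8 is misread.
print(decode_guessing("Ã©".encode("iso-8859-1")))    # é
```

The last call shows where the guess goes wrong: the two ISO-8859-1
characters "Ã©" are byte-for-byte identical to the UTF-8 encoding of "é",
so no heuristic can tell them apart. Such sequences are rare in real
ISO-8859-1 text, but they are not impossible.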

> > The UTF-8 support in normal php installations isn't very good.
> 
> No, though the PHP iconv() function should be able to convert to/from
> UTF-8.

Since they seem to use good ol' libiconv, I assume they work flawlessly.
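
The safer alternative to guessing is an explicit conversion step on the way
in and on the way out, which is what iconv() would do in PHP. The sketch
below shows the same idea in Python for ISO-8859-15; the helper names
to_utf8() and from_utf8() are hypothetical and not part of Xapian, Omega or
PHP.

```python
SOURCE_ENCODING = "iso-8859-15"  # assumed encoding of the site's own data


def to_utf8(raw: bytes, encoding: str = SOURCE_ENCODING) -> bytes:
    """Convert bytes in a known legacy encoding to UTF-8 before indexing."""
    return raw.decode(encoding).encode("utf-8")


def from_utf8(raw: bytes, encoding: str = SOURCE_ENCODING) -> bytes:
    """Convert UTF-8 output back to the legacy encoding for display.

    Characters the legacy encoding cannot represent become '?'.
    """
    return raw.decode("utf-8").encode(encoding, errors="replace")


# The euro sign is 0xA4 in ISO-8859-15 and three octets in UTF-8.
legacy = "10 €".encode("iso-8859-15")
converted = to_utf8(legacy)

print(legacy)                # b'10 \xa4'
print(converted)             # b'10 \xe2\x82\xac'
print(from_utf8(converted))  # b'10 \xa4'
```

Because the source encoding is stated rather than inferred, ISO-8859-15
characters such as the euro sign survive the round trip, which a fallback
that assumes ISO-8859-1 would get wrong.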


Cheers and thanks for all the fine work

 Ralf Mattes

> > And is it also possible to let Omega know we are feeding it 
> > ISO-8859-15 and want that returned as well?
> >
> > Or are we required to supply those commands with UTF-8 data?
> 
> At the moment you'll always get UTF-8 out of Omega.
> 
> Cheers,
>     Olly
> 



