[Xapian-discuss] Re: Japanese / UTF-8 support
Jeff Breidenbach
breidenbach at gmail.com
Thu Sep 7 03:55:26 BST 2006
> I've now pretty much done this. The worst gap is that there doesn't
> seem to be a PostScript to text convertor which handles anything above
> iso-8859-1.
Hmm... maybe go from PostScript to PDF, then extract the text from
there. I don't think there are a lot of viable alternatives to Ghostscript.
Sorry I didn't dive in to help fast enough.
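
For what it's worth, here is a rough sketch of that two-step route. It
assumes Ghostscript's ps2pdf wrapper and the pdftotext tool are installed
and on the PATH. The helper names and file paths are made up for
illustration and are not part of Xapian, and whether the extracted text
comes out right will still depend on how the PostScript was generated.

```python
import subprocess


def ps_to_text_commands(ps_path, pdf_path, txt_path):
    """Build the two command lines (hypothetical helper, not part of Xapian)."""
    return [
        # Step 1: Ghostscript's ps2pdf wrapper converts PostScript to PDF.
        ["ps2pdf", ps_path, pdf_path],
        # Step 2: pdftotext extracts the text and writes it out as UTF-8.
        ["pdftotext", "-enc", "UTF-8", pdf_path, txt_path],
    ]


def ps_to_text(ps_path, pdf_path, txt_path):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in ps_to_text_commands(ps_path, pdf_path, txt_path):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the execution makes it
easy to check the arguments without having either tool installed.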
> Anyway, I'll create a "unicode" branch in SVN soon so people can try out
> the new code.
Cool! I bet there will be a number of tiny things to clean up, like cutting
off summary indexes at N UTF-8 characters instead of at N bytes. Nothing
that won't shake out pretty quickly.
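
To illustrate the bytes-versus-characters point, here is a minimal sketch
in Python rather than in the actual Xapian code. The function names are
hypothetical, "character" here means a Unicode code point, and the input
is assumed to be valid UTF-8. The trick is that UTF-8 continuation bytes
always look like 10xxxxxx, so any other byte starts a new character.

```python
def truncate_bytes(data: bytes, n: int) -> bytes:
    """Naive cut at n bytes; may split a multi-byte character in half."""
    return data[:n]


def truncate_utf8_chars(data: bytes, n: int) -> bytes:
    """Keep at most n complete UTF-8 characters (code points) of data.

    Hypothetical helper; assumes data is valid UTF-8.
    """
    count = 0
    for i, byte in enumerate(data):
        # Continuation bytes match 10xxxxxx; anything else begins a character.
        if byte & 0xC0 != 0x80:
            if count == n:
                # Byte i starts character n+1, so cut just before it.
                return data[:i]
            count += 1
    # Fewer than n+1 characters in total: nothing to cut.
    return data
```

With Japanese text, where most characters take three bytes each, the
byte-based cut will usually land in the middle of a character and leave
an invalid sequence at the end of the summary, while the character-based
cut always ends on a character boundary.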
Jeff