[Xapian-discuss] Re: Japanese / UTF-8 support

Jeff Breidenbach breidenbach at gmail.com
Thu Sep 7 03:55:26 BST 2006


> I've now pretty much done this.  The worst gap is that there doesn't
> seem to be a PostScript to text convertor which handles anything above
> iso-8859-1.

Hmm... maybe go from PostScript to PDF, then extract the text from
there. I don't think there are a lot of viable alternatives to Ghostscript.
Sorry I didn't dive in to help fast enough.

> Anyway, I'll create a "unicode" branch in SVN soon so people can try out
> the new code.

Cool! I bet there will be a number of tiny things to clean up, like cutting
off summary indexes at N UTF-8 characters instead of at N bytes. Nothing
that won't shake out pretty quickly.
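
To illustrate the byte-versus-character point, here is a rough sketch in
Python (not the actual Xapian or Omega code; the function name
truncate_utf8 is made up for this example). It assumes the input is valid
UTF-8 and counts a "character" as one code point, so combining sequences
are not treated specially.

```python
def truncate_utf8(data: bytes, max_chars: int) -> bytes:
    """Return the longest prefix of UTF-8 `data` with at most max_chars code points.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    count = 0
    for i, b in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes; any other
        # byte starts a new character.
        if (b & 0xC0) != 0x80:
            if count == max_chars:
                # Byte i starts character number max_chars + 1: cut before it.
                return data[:i]
            count += 1
    # The input holds max_chars characters or fewer: keep it all.
    return data


if __name__ == "__main__":
    text = "日本語のテキスト".encode("utf-8")
    # Cutting at 4 bytes splits the second character and leaves invalid UTF-8.
    print(text[:4].decode("utf-8", errors="replace"))
    # Cutting at 3 characters always lands on a character boundary.
    print(truncate_utf8(text, 3).decode("utf-8"))
```

Cutting on a lead byte means the result never ends with a partial
multi-byte sequence, which is what a plain N-byte cut can produce.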

Jeff


