[Xapian-discuss] UTF-8: what is done and what is not?

tata 668 tata668 at gmail.com
Fri Nov 3 01:01:44 GMT 2006


Thanks for the reply!

But I'm still wondering the same thing I asked a few months ago:

Isn't a UTF-8 QueryParser useless unless it uses exactly the same word
splitter as the one used to index the documents?
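To make the concern concrete, here is a minimal sketch in plain Python, not the Xapian API. It uses a toy in-memory dict as a stand-in for an index. The names `split_words`, `ascii_split`, `index_document` and `search` are all hypothetical, and the splitting rule (Unicode `\w+` runs, NFC-normalised and lowercased) is an assumption chosen for illustration. When the query is split with the same function used at index time, accented words match. A query splitter that only understands ASCII produces different terms and finds nothing.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Set

# Hypothetical in-memory index: term -> set of document ids.
Index = Dict[str, Set[int]]

WORD_RE = re.compile(r"\w+")          # \w is Unicode-aware for str patterns
ASCII_RE = re.compile(r"[A-Za-z0-9]+")

def split_words(text: str) -> List[str]:
    """Assumed shared splitter: NFC-normalise, take \\w+ runs, lowercase."""
    text = unicodedata.normalize("NFC", text)
    return [w.lower() for w in WORD_RE.findall(text)]

def ascii_split(text: str) -> List[str]:
    """A mismatched splitter that silently drops non-ASCII letters."""
    return [w.lower() for w in ASCII_RE.findall(text)]

def index_document(doc_id: int, text: str, index: Index) -> None:
    # Indexing always uses the shared splitter.
    for term in split_words(text):
        index.setdefault(term, set()).add(doc_id)

def search(query: str, index: Index,
           splitter: Callable[[str], List[str]] = split_words) -> Set[int]:
    """Return ids of documents containing every query term (AND match)."""
    terms = splitter(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, after `index_document(1, "Le café est fermé", index)`, `search("café", index)` finds document 1. `search("café", index, splitter=ascii_split)` returns an empty set, because the query term becomes `caf`, which was never indexed.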

I'm really surprised I'm the only one with this problem...

Julien

----- Original Message ----- 
From: "Olly Betts" <olly at survex.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Thursday, November 02, 2006 7:49 PM
Subject: Re: [Xapian-discuss] UTF-8: what is done and what is not?


> On Thu, Nov 02, 2006 at 08:46:36AM -0500, tata 668 wrote:
>> I'm aware of the UTF-8 branch here:
>> http://www.oligarchy.co.uk/xapian/branches/utf8/ , but I'd like more
>> information about what it contains and if it's enough for me.
>
> The current status is summarised here:
>
> http://wiki.xapian.org/Utf8Support
>
> I'm in the process of turning the release handle for 0.9.8 (to fix
> various minor problems reported since 0.9.7), so I'm very close to
> merging the utf-8 branch in and the rate of visible progress should pick
> up.
>
>> Currently, I've written my own word splitter to index the data, and my own
>> query parser. They are not perfect, and I would like to use the built-in
>> Xapian objects instead.
>
> There's not currently a word splitter in the core library, but
> Xapian::QueryParser now works in utf-8 on the branch, so you can
> probably use that now.
>
> I've not tested utf-8 from any of the bindings yet.  Some languages
> standardise on a particular internal representation, so there could
> be issues here (I don't know how PHP handles such issues).  But I'd
> certainly encourage you to try it and let us know if it works or if
> there are problems.
>
> Cheers,
>    Olly 
