[Xapian-discuss] UTF-8: what is done and what is not?

tata 668 tata668 at gmail.com
Thu Nov 2 13:46:36 GMT 2006


I'm aware of the UTF-8 branch here: 
http://www.oligarchy.co.uk/xapian/branches/utf8/ , but I'd like more 
information about what it contains and whether it is enough for my needs.

I am not using Omega, nor stemmers. I use Xapian from PHP as a replacement 
for MySQL full text search. The data submitted by users is indexed live in 
Xapian. Users can search the data, on the search page, using a (currently) 
limited syntax:

[ word1 ] => return data containing "word1"
[ word1 word2 ] => return data containing "word1" AND "word2"
[ "word1 word2" ] => return data containing the phrase "word1 word2"
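
To make the three forms above concrete, here is a minimal sketch in Python 
(for illustration only, not the PHP code actually in use). The name 
parse_query and the returned (words, phrases) shape are assumptions made 
for this example; it only splits the query string and does not call Xapian.

```python
import re

# Hypothetical helper, not part of Xapian: a double-quoted run is a phrase,
# anything else separated by whitespace is a bare word.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query):
    """Split a query into (words, phrases).

    words   -- list of bare words, all of which must match (AND)
    phrases -- list of phrases, each a list of words that must be adjacent
    """
    words, phrases = [], []
    for phrase, word in _TOKEN.findall(query):
        if phrase:
            parts = phrase.split()
            if len(parts) > 1:
                phrases.append(parts)
            elif parts:
                # A quoted single word behaves like a bare word.
                words.append(parts[0])
        elif word:
            # An unbalanced quote ends up here; drop the stray quote marks.
            cleaned = word.strip('"')
            if cleaned:
                words.append(cleaned)
    return words, phrases
```

Because the pattern works on Python str objects (Unicode text), accented 
or non-Latin words pass through unchanged once the input has been decoded 
from UTF-8.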

No fancy stemmers, no result scoring, etc. But I do need to support UTF-8.

So far, I have written my own word splitter to index the data and my own 
query parser. They are not perfect and I would like to use built-in Xapian 
objects instead.
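
For readers wondering what such a word splitter involves, the following is 
a rough sketch in Python, not the PHP code referred to above. The name 
split_words, the lowercasing step, and the 1-based positions are all 
assumptions made for this example; positions are kept because phrase 
matching generally needs to know the order of words.

```python
import re

# On str input, \w matches Unicode letters and digits (and underscore),
# so accented and non-Latin words are kept whole.
_WORD = re.compile(r"\w+")


def split_words(data):
    """Decode UTF-8 bytes and return a list of (position, term) pairs.

    Hypothetical helper: invalid byte sequences are replaced rather than
    raising, and terms are lowercased so matching is case-insensitive.
    """
    text = data.decode("utf-8", errors="replace")
    return [
        (pos, match.group().lower())
        for pos, match in enumerate(_WORD.finditer(text), start=1)
    ]
```

A splitter this simple has obvious limits: it breaks on apostrophes and 
hyphens and does no normalization of combining characters, which is part 
of why a built-in tokenizer would be preferable.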

So what exactly is available now in Xapian?

This is a thread I started some time ago: 
http://lists.tartarus.org/pipermail/xapian-discuss/2006-February/001674.html 
about the same topic.

Sorry in advance if you have already answered these questions!

And thanks again for your really appreciated work.

JL 



