[Xapian-discuss] encoding?

Gupteshwar Joshi gupteshwar.joshi at gmail.com
Mon Apr 3 17:42:29 BST 2006


Hello,
Sorry for the mistake I made; I will be more careful in future.

On 4/1/06, Olly Betts <olly at survex.com> wrote:
>
> Please don't send essentially the same message to the list multiple
> times (less than 90 minutes apart too!)  And don't cc: individual
> developers - we all read the list so you'll just annoy people and be
> less likely to get a useful answer.  Overall, remember this mailing list
> is a free resource, and nobody is under any obligation to help you.  So
> if you want help, play nicely and respect the other list members.
>
> On Fri, Mar 31, 2006 at 06:28:37PM +0530, Gupteshwar Joshi wrote:
> > Does Omega support different kinds of encodings for searching the
> > indexed data?
>
> Currently Omega doesn't perform any character encoding conversions.
> So if you're trying to handle a non-latin language, you'll probably
> be disappointed.
>
> > I have indexed all the documents, which are in English and
> > Devanagari.
>
> Sorry, I don't know what encoding devnagari requires.


Devanagari requires Unicode, encoded as UTF-8 (code points U+0900 to U+097F).
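
To make the range concrete, here is a minimal Python sketch (purely illustrative, not part of Xapian or Omega; the helper name `is_devanagari` and the sample word are my own assumptions) that checks whether characters fall in the Devanagari block and encodes a word as UTF-8:

```python
# Bounds of the Unicode Devanagari block.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def is_devanagari(ch):
    """Return True if the single character ch lies in U+0900..U+097F."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


# Sample word (hypothetical test data): "Hindi" written in Devanagari,
# spelled out as six code points so the source stays ASCII-only.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# The UTF-8 byte string is what would have to be stored and searched.
encoded = word.encode("utf-8")

if __name__ == "__main__":
    print(all(is_devanagari(ch) for ch in word))
    print(len(word), len(encoded))
```

Every code point in this block takes three bytes in UTF-8, so the six code points of the sample word become 18 bytes.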

> > It does work without prompting any error, but even assuming my local
> > data is indexed too, it does not show any result for a Devanagari keyword.
>
> Assuming devnagari uses a non-latin character set, then the word
> tokeniser won't tokenise devnagari words correctly (or at all in fact).
>
> The plan for Xapian 1.0 is to fix Omega to convert everything to utf-8
> and use unicode definitions of what is a word character, etc.  Then this
> should all work.
>
> Meanwhile, if you're prepared to write your own indexer (or at least
> your own word tokeniser), then there's a patch to make the QueryParser
> utf-8 aware (which is what the gmane search uses).
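
As a rough idea of what a Unicode-aware word tokeniser of this kind could look like, here is a short Python sketch. It is only an assumption about the general approach: it is not the gmane patch, not Xapian's QueryParser, and the function names are made up for illustration. The main point is that combining marks (Unicode categories M*) have to count as word characters, otherwise Devanagari vowel signs would split every word.

```python
import unicodedata


def is_word_char(ch):
    """Treat letters (L*), combining marks (M*) and numbers (N*) as word characters."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def tokenise(text):
    """Split a Unicode string into words at any non-word character."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            # A non-word character ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def tokenise_utf8(raw):
    """Decode UTF-8 bytes, tokenise, and return each word as UTF-8 bytes again."""
    return [w.encode("utf-8") for w in tokenise(raw.decode("utf-8"))]


if __name__ == "__main__":
    sample = "hello \u0939\u093f\u0928\u094d\u0926\u0940, world"
    print(tokenise(sample))
```

A tokeniser that only accepted letters would cut the Devanagari word in this sample at each vowel sign and at the virama, which is why the mark categories are included.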


How can I get help from them with that parser?



> > I have added a meta tag for the encoding type in the head of the query
> > template, but it still doesn't find those keywords.
>
> Well, that only tells the browser what character set the output is in so
> it's not going to affect the searching.
>
> Incidentally, a slightly better approach than a meta tag is to set the
> charset in the Content-Type: header of the response by adding something
> like this to the top of the query template:
>
> $httpheader{Content-Type,text/html; charset=utf-8}
>
> Cheers,
>     Olly
>

