[Xapian-discuss] xapian supports Chinese language

Olly Betts olly at survex.com
Thu Apr 9 01:06:16 BST 2009


On Wed, Apr 08, 2009 at 09:19:55PM +0800, LiYong wrote:
> First I want to modify the simple examples. As I understand it,
> simpleindex.cc uses the index_text function to add the input. However, I do
> not know how the input is split into words. For example, if the input is
> "This is a test", the API should add the words this, is, a and test. Does it
> use spaces to split them?

Yes.

> Chinese sentences do not contain spaces, so I want to use the cjk lib to
> split mixed Chinese and English input. If I use the add_posting function, I
> have to split the input using the cjk lib and then pass the split words to
> the add_posting function.

Yes.

> However, if the input is in HTML format, I have to parse the HTML tags;
> can I modify some functions in the Omega application?

Yes - change the code which currently uses Xapian::TermGenerator (for
indexing), and the code which currently uses Xapian::QueryParser (for
searching).

Cheers,
    Olly


