[Xapian-discuss] xapian supports Chinese language
Olly Betts
olly at survex.com
Thu Apr 9 01:06:16 BST 2009
On Wed, Apr 08, 2009 at 09:19:55PM +0800, LiYong wrote:
> First I want to modify the simple examples. Based on my understanding,
> simpleindex.cc uses the index_text function to add the input. However, I do not
> know how it recognizes the input word by word. For example, if the input is
> "This is a test", the API should add the words this, is, a and test. Is it
> using spaces to split them?
Yes.
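To make the splitting concrete, here is a minimal sketch of whitespace splitting with ASCII lowercasing. It is not the actual Xapian::TermGenerator code, which does more than this (for example, it also handles punctuation and optional stemming); split_on_spaces is a made-up helper name used only for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): split text on whitespace and
// lowercase ASCII letters, so "This is a test" yields this, is, a, test.
std::vector<std::string> split_on_spaces(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    // operator>> skips any run of whitespace between words.
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        words.push_back(word);
    }
    return words;
}
```

This approach returns a Chinese sentence without spaces as a single "word", which is why a separate segmentation step is needed for such input.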
> Chinese sentences do not contain spaces; I
> want to use the cjk lib to split the mixed Chinese and English input. If I use
> the add_posting function, I have to split the input using the cjk lib and
> then pass the split words to the add_posting function.
Yes.
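A rough sketch of that flow, under some loud assumptions: the input is valid UTF-8, each non-ASCII character is emitted as its own term (a crude stand-in for whatever the cjk lib would return), and runs of ASCII letters and digits become one lowercased term. segment_mixed and utf8_len are hypothetical names, not Xapian or cjk lib functions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length in bytes of the UTF-8 sequence starting with lead byte b.
// Assumes valid UTF-8; a stray continuation byte is treated as length 1.
static std::size_t utf8_len(unsigned char b) {
    if (b >= 0xF0) return 4;
    if (b >= 0xE0) return 3;
    if (b >= 0xC0) return 2;
    return 1;
}

// Hypothetical segmenter: returns (term, position) pairs, positions from 1.
// A real word segmenter would replace the one-character-per-term branch.
std::vector<std::pair<std::string, unsigned>>
segment_mixed(const std::string& text) {
    std::vector<std::pair<std::string, unsigned>> out;
    unsigned pos = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        if (b >= 0x80) {
            // Non-ASCII: emit one whole UTF-8 character as a term.
            std::size_t n = utf8_len(b);
            if (i + n > text.size()) n = text.size() - i;  // truncated tail
            out.emplace_back(text.substr(i, n), ++pos);
            i += n;
        } else if (std::isalnum(b)) {
            // ASCII word: collect the run of letters/digits, lowercased.
            std::string word;
            while (i < text.size()) {
                unsigned char c = static_cast<unsigned char>(text[i]);
                if (c >= 0x80 || !std::isalnum(c)) break;
                word += static_cast<char>(std::tolower(c));
                ++i;
            }
            out.emplace_back(word, ++pos);
        } else {
            ++i;  // ASCII space or punctuation: separator only
        }
    }
    return out;
}
```

Each resulting pair could then be handed to Xapian::Document::add_posting(term, position). Note that this sketch also emits non-ASCII punctuation as terms, which a real segmenter would filter out.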
> However, if the input is in HTML format, I have to parse the HTML tags;
> can I modify some functions in the Omega application?
Yes - change the code which currently uses Xapian::TermGenerator (for
indexing), and the code which currently uses Xapian::QueryParser (for
searching).
Cheers,
Olly