[Xapian-discuss] bigrams and co-occurrence matrix

Ying Liu liux0395 at umn.edu
Tue Oct 27 14:50:43 GMT 2009


Hi Yung-chung,

Thanks for your reply. I downloaded the cjk-tokenizer from CPAN at 
http://search.cpan.org/~xern/Lingua-CJK-Tokenizer-0.01/lib/Lingua/CJK/Tokenizer.pm. 
It lists libunicode by Tom Tromey as a prerequisite, but I can't find that 
module on CPAN. What should I install to make the cjk-tokenizer module work?

Thanks,
Ying

☼ 林永忠 ☼ (Yung-chung Lin) wrote:
> Hi Ying,
>
> You may want to look at http://code.google.com/p/cjk-tokenizer/
> A Perl binding is also included.
>
> Best,
> Yung-chung Lin
>
>
> 2009/10/26 Ying Liu <liux0395 at umn.edu>
>
>     Hello all,
>
>     I want to work out a way to count bigrams and build a
>     co-occurrence matrix with the Xapian Perl modules. Checking the
>     archived emails, I found some discussions about CJK tokens, but I
>     am only working with English documents. My immediate questions are
>     how Xapian handles bigrams and whether it can do so with windowing,
>     as NSP does with its --window option. Has anyone worked on this
>     before? Do you have any suggestions?
>
>     Thank you,
>     Ying
>
>
>     _______________________________________________
>     Xapian-discuss mailing list
>     Xapian-discuss at lists.xapian.org
>     http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
>
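
For reference, the windowed bigram counting asked about in the quoted
message can be sketched in plain Python. This is only an illustration
under stated assumptions: it does not use Xapian or NSP, the function
names (tokenize, windowed_bigrams, cooccurrence_matrix) are hypothetical,
and the window semantics (ordered pairs of tokens that fall within a
window of N tokens, so N = 2 gives adjacent bigrams) are an assumption
modeled on how NSP's --window option is usually described.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/digits (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def windowed_bigrams(tokens, window=2):
    """Count ordered pairs (left, right) where right follows left
    within a window of `window` tokens. window=2 means adjacent tokens only."""
    if window < 2:
        raise ValueError("window must be at least 2")
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            counts[(left, right)] += 1
    return counts


def cooccurrence_matrix(counts):
    """Turn pair counts into a dense matrix indexed by a sorted vocabulary.
    Rows are the left token, columns are the right token."""
    vocab = sorted({word for pair in counts for word in pair})
    index = {word: k for k, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (left, right), n in counts.items():
        matrix[index[left]][index[right]] = n
    return vocab, matrix


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat")
    counts = windowed_bigrams(tokens, window=3)
    vocab, matrix = cooccurrence_matrix(counts)
    print(vocab)
    for word, row in zip(vocab, matrix):
        print(word, row)
```

A dense list-of-lists matrix is only practical for a small vocabulary;
for a real collection the Counter of pairs is already a sparse
representation and is probably the better thing to keep.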
