[Xapian-discuss] bigrams and co-occurrence matrix
Ying Liu
liux0395 at umn.edu
Tue Oct 27 14:50:43 GMT 2009
Hi Yung-chung,
Thanks for your reply. I downloaded the cjk-tokenizer from CPAN at
http://search.cpan.org/~xern/Lingua-CJK-Tokenizer-0.01/lib/Lingua/CJK/Tokenizer.pm.
It lists libunicode by Tom Tromey as a prerequisite, but I can't find that
module on CPAN. What should I install to make the cjk-tokenizer module work?
Thanks,
Ying
☼ 林永忠 ☼ (Yung-chung Lin) wrote:
> Hi Ying,
>
> You may want to check http://code.google.com/p/cjk-tokenizer/
> A Perl binding is also included.
>
> Best,
> Yung-chung Lin
>
>
> 2009/10/26 Ying Liu <liux0395 at umn.edu>
>
> Hello all,
>
> I want to work out a way to count bigrams and build a
> co-occurrence matrix with the Xapian Perl modules. Checking the
> archived emails, I found some discussions about CJK tokens, but I am
> only working on English documents. My immediate questions are how
> Xapian handles bigrams and whether it can do so with windowing, as
> NSP does with the --window option. Has anyone worked on this before?
> Do you have any suggestions?
>
> Thank you,
> Ying
>
>
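For reference, the windowed bigram counting asked about in the quoted message can be sketched in plain Python. This is only an illustration and makes several assumptions: it does not use Xapian or its Perl bindings at all, the function names (tokenize, windowed_bigrams, cooccurrence_matrix) are made up for this example, and the window semantics (each token is paired with the tokens that follow it inside a window of the given size, so a window of 2 yields ordinary adjacent bigrams) reflect one reading of NSP's --window option rather than a verified match for its behaviour.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude English tokenizer (an assumption for this sketch):
    # lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def windowed_bigrams(tokens, window=2):
    """Count ordered pairs (left, right) where `right` occurs after
    `left` within a window of `window` tokens. window=2 gives plain
    adjacent bigrams."""
    if window < 2:
        raise ValueError("window must be at least 2")
    counts = Counter()
    for i, left in enumerate(tokens):
        # The window starts at `left` and spans `window` tokens in total.
        for right in tokens[i + 1:i + window]:
            counts[(left, right)] += 1
    return counts


def cooccurrence_matrix(counts):
    """Turn pair counts into a dense matrix indexed by a sorted vocabulary.
    Rows are the left word, columns the right word."""
    vocab = sorted({word for pair in counts for word in pair})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (left, right), n in counts.items():
        matrix[index[left]][index[right]] = n
    return vocab, matrix


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat")
    counts = windowed_bigrams(tokens, window=3)
    vocab, matrix = cooccurrence_matrix(counts)
    print(vocab)
    for word, row in zip(vocab, matrix):
        print(word, row)
```

The pairs are kept ordered, so the matrix is not symmetric; if an unordered co-occurrence count is wanted, sorting each pair before counting would be one way to get it. A dense list-of-lists is fine for a small vocabulary, but a real corpus would probably call for a sparse structure such as the Counter itself.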