[Xapian-devel] My Introduction and Ideas
Chu Bingxiang
chu849238686 at foxmail.com
Thu Feb 27 01:03:27 GMT 2014
First, thank you for your patient reply.
>Yes, in 2011 Dai Youli worked on implementing a segmentation algorithm
>for Chinese. We did a fairly thorough search first, and failed to find
>anything existing in C or C++ which had a suitable licence, so writing
>one seemed the only option. It's quite a challenging project for the
>GSoC timescale - it basically worked at the end, but there was quite a
>bit more to do.
>But shortly after that, someone submitted a patch for integration of
>the SCWS Chinese segmentation library - this had been relicensed
>under a more liberal licence since we'd looked for something suitable,
>so we hadn't considered it before:
>http://www.xunsearch.com/scws/
>This is a working segmentation algorithm (or at least I'm told it is - I
>don't understand Chinese well enough to tell for myself) and it's
>actively maintained, which certainly beats having to maintain our own.
>We've not managed to get the patch merged yet. I did start to work
>on cleaning it up, but then the author of the patch sent an updated
>version of the patch, but not based on my cleaned up version of the
>original, which rather put me off working on it for a while. Sadly
>I've not yet got back to it.
>The original patch and my cleaned up version are in this ticket,
>and I've just tracked down the newer patch and added that too:
>http://trac.xapian.org/ticket/594
Before I sent my first e-mail to the list, I already knew about the "SCWS" project, but it seems a little unreliable. The most recent commit was a month ago, and it was a "web module" commit; most of the other commits are almost a year old. I also ran some tests on the demo page, and there were some errors in the results. I hope we can build our own Chinese segmentation algorithm in the future, based either on Dai Youli's work or on a new design. If the work is too big for one summer, maybe we can continue it in another GSoC season, as GSoC recommends.
>The current status is far from ideal, and it would be good to move
>things forwards.
>We left off the "Improve Chinese Support" idea from the list this
>year because there didn't seem to be enough to occupy a student for
>3 months. We need to merge the changes from the newer patch and
>my cleaned up version of the older one, integrate everything nicely,
>and make sure there's test coverage and documentation for it.
>But if you'd like to work on that, you could combine that with something
>else unrelated to make a suitable sized project - there's no reason the
>project has to be all one thing.
>Documentation in languages other than English would be great to have,
>but translating documentation doesn't really fit with GSoC's rules:
>http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work
>But if you (or anyone else) wants to work on translations outside of
>GSoC, I'd suggest the newer "Getting Started with Xapian" guide would be
>the best document to work on:
>http://getting-started-with-xapian.readthedocs.org/en/latest/
Yes, and actually I didn't plan to do it during the GSoC season. I have already been working on this for some time, but my English is poor, so there may be a long way to go. I also noticed that this year there are many students from China: maybe two from Peking University, one from Fudan University, and one who is now in Canada. In China, almost 80% of students studying Computer Science, and 99% of all students, don't know what Open Source is. I hope we can work on promoting Open Source development in China. These words also apply to students from countries whose situation is similar to China's. I also hope Jiarong Wei and the other Chinese students can help with the translations.
Best Regards,
Chu Bingxiang