[Xapian-devel] My Introduction and Ideas

Chu Bingxiang chu849238686 at foxmail.com
Thu Feb 27 01:03:27 GMT 2014


First, thank you for your patient reply.

>Yes, in 2011 Dai Youli worked on implementing a segmentation algorithm
>for Chinese.  We did a fairly thorough search first, and failed to find
>anything existing in C or C++ which had a suitable licence, so writing
>one seemed the only option.  It's quite a challenging project for the
>GSoC timescale - it basically worked at the end, but there was quite a
>bit more to do.

>But shortly after that, someone submitted a patch for integration of
>the SCWS Chinese segmentation library - this had been relicensed
>under a more liberal licence since we'd looked for something suitable,
>so we hadn't considered it before:

>http://www.xunsearch.com/scws/

>This is a working segmentation algorithm (or at least I'm told it is - I
>don't understand Chinese well enough to tell for myself) and it's
>actively maintained, which certainly beats having to maintain our own.

>We've not managed to get the patch merged yet.  I did start to work
>on cleaning it up, but then the author of the patch sent an updated
>version of the patch, but not based on my cleaned up version of the
>original, which rather put me off working on it for a while.  Sadly
>I've not yet got back to it.

>The original patch and my cleaned up version are in this ticket,
>and I've just tracked down the newer patch and added that too:

>http://trac.xapian.org/ticket/594

Before I sent my first e-mail to the list, I already knew about the "SCWS" project. But it seems a little unreliable. The most recent commit was a month ago, and that was a "web module" commit; most of the other commits are almost a year old. I also ran some tests on the demo page, and there were some errors in the results. I hope that in the future we can write our own Chinese segmentation algorithm, based either on Dai Youli's work or on a new design. And if the work is too big, maybe we can continue it in another GSoC season, which is something GSoC recommends.


>The current status is far from ideal, and it would be good to move
>things forwards.

>We left off the "Improve Chinese Support" idea from the list this
>year because there didn't seem to be enough to occupy a student for
>3 months.  We need to merge the changes from the newer patch and
>my cleaned up version of the older one, integrate everything nicely,
>and make sure there's test coverage and documentation for it.

>But if you'd like to work on that, you could combine that with something
>else unrelated to make a suitable sized project - there's no reason the
>project has to be all one thing.

>Documentation in languages other than English would be great to have,
>but translating documentation doesn't really fit with GSoC's rules:

>http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work

>But if you (or anyone else) wants to work on translations outside of
>GSoC, I'd suggest the newer "Getting Started with Xapian" guide would be
>the best document to work on:

>http://getting-started-with-xapian.readthedocs.org/en/latest/

Yes, and actually I didn't want to do it in the GSoC season. I have already been working on this for some time. But my English is poor, so there may be a long way to go. I also noticed that this year there are many students from China: maybe two from Peking University, one from Fudan University, and one who is now in Canada. In China, almost 80% of the students who are studying Computer Science, and 99% of all students, don't know what Open Source is. I hope we can work on promoting Open Source development in China. These words are also for students whose countries are in a situation like China's. I also hope Jiarong Wei and the other Chinese students can help with the translations.

Best Regards,
Chu Bingxiang

