GSoC / utf-8 / 9P2000 / ndb(6)-like / Eric Lindblad
James Aylett
james-xapian at tartarus.org
Mon Feb 15 10:27:11 GMT 2016
Hi Eric!
On 15 Feb 2016, at 09:20, Eric Lindblad <geirfuglaps at yahoo.com> wrote:
> 2. Human Language Support
>
> Here is another [webpage Character Encoding (EUC-JP)] link regarding software using mecab.
> http://www.asahi-net.or.jp/~yw3t-trns/namazu/index.htm
It doesn’t seem very active, so I don’t know how useful pointing at a different search engine is actually going to be. (Also, it is licensed only under the GPL, so we’d want to be careful not to allow any code to migrate across; although it looks like the code is in Perl, so that’s probably not a concern.) But if there’s something helpful in there then please do add it to the resources list for that project. A better link seems to be <http://www.namazu.org/>.
I also noticed that the mecab link was outdated, and have switched it to GitHub, where the project does seem moderately active.
> Here is an article by Rob Pike and Ken Thompson (Plan 9).
> http://doc.cat-v.org/plan_9/4th_edition/papers/utf
Is that the one you mean? It seems to be a paper justifying choosing Unicode over ISO 10646, which AIUI since Unicode 3.0 / 2000 is a moot distinction. The paper also covers using a UTF (UTF-8 in particular, although I guess at the time there wasn’t another one) over a UCS (which would have been UCS-2 at the time, now long abandoned by most people).
I agree though that we could do with something general about Unicode / UTF-8 usage in the documentation (which we can then link to from projects where this is relevant). We have per-language notes (such as https://getting-started-with-xapian.readthedocs.org/en/latest/language_specific.html#unicode), but nothing that I can recall that talks more generally about character sets, serialisation and so forth.
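As a rough illustration of the UTF-8 versus UCS-2 distinction above, here is a minimal Python sketch (an illustration only, not taken from the paper or from Xapian). It assumes Python 3; the helper name `encoded_lengths` is made up for this example, and because Python has no UCS-2 codec it uses `utf-16-be`, which matches UCS-2 for characters in the Basic Multilingual Plane.

```python
# Illustrative sketch: compare the size of characters in UTF-8 and in a
# 16-bit encoding. "utf-16-be" stands in for UCS-2 (an assumption of this
# example): the two agree for characters up to U+FFFF.

def encoded_lengths(text):
    """Return (utf8_byte_count, utf16_byte_count) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

# One ASCII letter, one Latin-1 letter, one CJK character, one character
# outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# ASCII text is byte-for-byte unchanged in UTF-8, which is one of the
# compatibility properties the Plan 9 paper cares about.
ascii_is_unchanged = "plan9".encode("utf-8") == b"plan9"
```

The last sample needs four bytes in UTF-16 (a surrogate pair), which a strict UCS-2 implementation cannot represent at all.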
> 7. Applications
>
> It might be interesting to use Xapian with 9P2000.
Do you want to add a project to the wiki for this? It sounds like you have an idea of what it would look like, which at the moment I don’t (and I suspect a number of potential students wouldn’t either).
J
--
James Aylett, occasional trouble-maker
xapian.org