GSoC / utf-8 / 9P2000 / ndb(6)-like / Eric Lindblad

James Aylett james-xapian at tartarus.org
Mon Feb 15 10:27:11 GMT 2016


Hi Eric!

On 15 Feb 2016, at 09:20, Eric Lindblad <geirfuglaps at yahoo.com> wrote:

> 2. Human Language Support
> 
> Here is another [webpage Character Encoding (EUC-JP)] link regarding software using mecab.
> http://www.asahi-net.or.jp/~yw3t-trns/namazu/index.htm

It doesn’t seem very active, so I don’t know how useful pointing at a different search engine is actually going to be. (Also, it is licensed only under the GPL, so we’d want to be careful not to allow any code to migrate across; although it looks like the code is in Perl, so that’s probably not a concern.) But if there’s something helpful in there then please do add it to the resources list for that project. A better link seems to be <http://www.namazu.org/>.

I also noticed that the mecab link was outdated, and have switched it to GitHub, where the project does seem moderately active.

> Here is an article by Rob Pike and Ken Thompson (Plan 9).
> http://doc.cat-v.org/plan_9/4th_edition/papers/utf

Is that the one you mean? It seems to be a paper justifying choosing Unicode over ISO 10646, which AIUI has been a moot distinction since Unicode 3.0 in 2000. The paper also covers using a UTF (UTF-8 in particular, although I guess at the time there wasn’t another one) over a UCS (which would have been UCS-2 at the time, now long abandoned by most people).
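
As a rough illustration of the UTF versus UCS distinction (a sketch of my own, not something taken from the paper or from Xapian): the sample strings and the helper names encoded_sizes and fits_in_ucs2 are made up for this example, and since Python has no UCS-2 codec, UTF-16-BE is used as a stand-in, which matches UCS-2 only for characters up to U+FFFF.

```python
# Illustrative sketch: compare UTF-8 with a two-byte fixed-width form.
# Assumption: UTF-16-BE stands in for UCS-2, which Python does not provide.
# The two agree byte-for-byte only for code points up to U+FFFF.

def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16-BE byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

def fits_in_ucs2(text):
    """True if every character is at or below U+FFFF, the UCS-2 limit."""
    return all(ord(ch) <= 0xFFFF for ch in text)

# Hypothetical samples: plain ASCII, Japanese, and one character above U+FFFF.
samples = ["Xapian", "検索", "😀"]

for s in samples:
    utf8_len, utf16_len = encoded_sizes(s)
    print(f"{s!r}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes, "
          f"representable in UCS-2: {fits_in_ucs2(s)}")

# ASCII text is byte-for-byte unchanged in UTF-8.
print("Xapian".encode("utf-8") == b"Xapian")
```

The ASCII string keeps its one-byte-per-character form in UTF-8 but doubles in size in the two-byte encoding, while the last sample cannot be represented in UCS-2 at all and needs a surrogate pair in UTF-16.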

I agree though that we could do with something general about Unicode / UTF-8 usage in the documentation (which we can then link to from projects where this is relevant). We have per-language notes (such as https://getting-started-with-xapian.readthedocs.org/en/latest/language_specific.html#unicode), but nothing that I can recall that talks more generally about character sets, serialisation and so forth.

> 7. Applications
> 
> It might be interesting to use Xapian with 9P2000.

Do you want to add a project to the wiki for this? It sounds like you have an idea of what it would look like, which at the moment I don’t (and I suspect a number of potential students wouldn’t either).

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org




More information about the Xapian-devel mailing list