Pull requests: CJK words and Snippet generator

James Aylett james-xapian at tartarus.org
Fri Jul 29 12:45:06 BST 2016


On Fri, Jul 29, 2016 at 12:12:25PM +0200, rsto at paranoia.at wrote:

> On Thu, Jul 28, 2016, at 00:22, James Aylett wrote:
> > This sounds great! I know sufficiently little about CJK that I won't
> > try to comment on that at all :)
> 
> I've just opened a pull request for the CJK tokenizer:
> https://github.com/xapian/xapian/pull/114

That's great, thanks.

> > I wonder if we can arrange suitable defaults to use your
> > implementation with the older API, and come up with a newer API that
> > allows a SnippetGenerator class to be used from the MSet.
> 
> The FastMail snippet generator was written when MSet didn't create
> snippets. I'll first compare both implementations to see if there is a
> good reason for them to coexist, or whether I might just as well merge
> any additional features into MSet.

Terrific, thank you.

> Unfortunately, Travis breaks since pkg-config can't find libicu on the
> machine [1]. I could make the libicu dependency optional, and that might
> be useful for Xapian installations that don't bother with CJK text, but
> for Travis tests it would make sense to enable ICU.

You should be able to install it; if you add libicu-dev to the
packages stanza in .travis.yml, Travis will install it on the build
machine.

However, you seem to be using pkg-config, and the libicu package in
Ubuntu 12.04 LTS (which Travis currently uses) doesn't provide
pkg-config files. 14.04 LTS does, and it's possible to use that as a
beta with Travis, I think by changing:

sudo: false

to:

sudo: required
dist: trusty

That will run about a minute slower than the current builds, but
that's not a huge problem for the volume we're dealing with. I'd hope
that they move 14.04 out of beta soon (at which point it should be
possible to use with container builds, which are faster), since 12.04
only has support until next year.
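
Putting the two suggestions above together, the relevant part of
.travis.yml might look roughly like the sketch below. This is an
assumption-laden fragment, not the project's actual file: it assumes
the "packages stanza" is the addons.apt.packages list, and it leaves
out everything else the existing configuration contains.

```yaml
# Sketch only: the rest of the existing .travis.yml is omitted.

# Switch from the 12.04 container builds to the 14.04 (trusty) beta.
sudo: required
dist: trusty

addons:
  apt:
    packages:
      # Assumption: this list is the "packages stanza" mentioned above.
      # On 14.04 the libicu-dev package should supply the pkg-config
      # files that 12.04 lacks.
      - libicu-dev
```

Whether pkg-config then finds ICU would still need checking against a
real Travis run.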

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org


