Pull requests: CJK words and Snippet generator
James Aylett
james-xapian at tartarus.org
Wed Jul 27 23:22:55 BST 2016
On Tue, Jul 26, 2016 at 03:06:07PM +0200, rsto at paranoia.at wrote:
> The Cyrus IMAP mail server uses Xapian as its search engine. Recently,
> FastMail has sponsored the implementation of two Xapian features: CJK word
> splitting and a generator for search snippets. I've been working on both
> features and we would be happy to get them merged into Xapian master.
>
> Would you be interested in these features? Just let us know what would
> be required to get them merged. As a minimum I'd rebase the current
> forks against latest master. I'll be happy to answer any questions or
> change requests.
This sounds great! I know sufficiently little about CJK that I won't
try to comment on that at all :)
I think I'm right in saying that your snippet generator:
a. needs driving separately (so it's not integrated in the way
Xapian::MSet::snippet() is)? Is the intention that it replace the
current snippet system with something more sophisticated?
I wonder if we can arrange suitable defaults to use your
implementation with the older API, and come up with a newer API that
allows a SnippetGenerator class to be used from the MSet.
(That might allow us to refactor the existing implementation and
provide both, if they have different strengths. I can't remember much
detail of the current one, offhand.)
b. only works with UTF-8 (I assume that the pre_match and post_match
strings, and inter_snippet, should also be in UTF-8?)
This probably just needs noting in the docstrings.
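To make the idea in (a) a bit more concrete, here is a rough C++ sketch of how a pluggable generator might hang off an MSet-like object, with a default used when nothing is set, so that the old-style snippet() call keeps working. Every name here (SnippetGenerator, SimpleHighlighter, ResultSet, set_snippet_generator) is hypothetical and does not exist in Xapian. The highlighter is a deliberately naive stand-in, not your generator or the current MSet::snippet() code, and it assumes all strings are UTF-8 as in (b).

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL interface, not part of Xapian. All strings (text, query
// terms, markers) are assumed to be valid UTF-8.
class SnippetGenerator {
  public:
    virtual ~SnippetGenerator() = default;
    virtual std::string snippet(const std::string& text,
                                const std::vector<std::string>& terms) const = 0;
};

// HYPOTHETICAL naive default: wraps each occurrence of a query term in
// pre_match/post_match. Matching is byte-wise with no case folding or
// stemming; because UTF-8 is self-synchronising, a valid UTF-8 term can
// only match on character boundaries of valid UTF-8 text.
class SimpleHighlighter : public SnippetGenerator {
  public:
    SimpleHighlighter(std::string pre = "<b>", std::string post = "</b>")
        : pre_match(std::move(pre)), post_match(std::move(post)) {}

    std::string snippet(const std::string& text,
                        const std::vector<std::string>& terms) const override {
        std::string out;
        std::size_t pos = 0;
        while (pos < text.size()) {
            // Find the earliest match; prefer the longest term on a tie.
            std::size_t best = std::string::npos, len = 0;
            for (const auto& t : terms) {
                if (t.empty()) continue;
                std::size_t f = text.find(t, pos);
                if (f == std::string::npos) continue;
                if (f < best || (f == best && t.size() > len)) {
                    best = f;
                    len = t.size();
                }
            }
            if (best == std::string::npos) {
                out.append(text, pos, std::string::npos);  // no more matches
                break;
            }
            out.append(text, pos, best - pos);  // text before the match
            out += pre_match;
            out.append(text, best, len);        // the matched term
            out += post_match;
            pos = best + len;
        }
        return out;
    }

  private:
    std::string pre_match, post_match;
};

// HYPOTHETICAL stand-in for an MSet: uses a caller-supplied generator if
// one is set, otherwise falls back to the default highlighter.
class ResultSet {
  public:
    void set_snippet_generator(std::shared_ptr<const SnippetGenerator> g) {
        gen = std::move(g);
    }

    std::string snippet(const std::string& text,
                        const std::vector<std::string>& terms) const {
        if (gen) return gen->snippet(text, terms);
        static const SimpleHighlighter fallback;
        return fallback.snippet(text, terms);
    }

  private:
    std::shared_ptr<const SnippetGenerator> gen;
};
```

If both implementations turn out to have different strengths, each could simply be another SnippetGenerator subclass behind the same call.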
A good start would certainly be rebasing against master and opening a
pull request for each on GitHub (this will trigger Travis CI builds,
which are a helpful first pass at making sure everything is good; they
run against both g++ and Clang, which can expose some weirdnesses).
J
--
James Aylett, occasional trouble-maker
xapian.org
More information about the Xapian-devel
mailing list