Adding Czech stemmer to Xapian?

Olly Betts olly at survex.com
Wed Oct 21 23:22:47 BST 2020


Hi Petr,

On Tue, Oct 20, 2020 at 10:18:32PM +0200, Petr Helebrant wrote:
> First of all, thank you for creating and maintaining Xapian. I would like to
> ask whether it would be possible to add a Czech stemmer created at our
> technical university as a project. It's released here:
> 
> https://www.fit.vut.cz/research/product/133/.cs
> 
> There are 3 links at the bottom, near the "soubory" ("files") string.

It seems it's licensed as GPL?

We're very close to relicensing xapian-core as MPL so we want to avoid
incorporating additional code we can't relicense.

If you are able to relicense it, I'd encourage submitting it to Snowball
as then it's available to even more people, and we can easily import it
to Xapian from there.

Note the current home of the Snowball project is https://snowballstem.org/
- the site you're currently linking to is Martin Porter's old site for
it but he retired from development back in 2014.

Snowball uses the 3-clause BSD licence.  If you're not happy to use
that but are happy with MPL then we could include it in Xapian directly.
It'd be a bit awkward if someone then contributed a different Czech
stemmer to Snowball though.

If you can't relicense at all, then people could still use it by
subclassing Xapian::StemImplementation and having the subclass call your
stemmer, but it's obviously a lot less convenient than having
Xapian::Stem("cs") just work.

Cheers,
    Olly
