[Xapian-discuss] n-gram / cjk serializer

Olly Betts olly at survex.com
Wed Aug 20 01:54:13 BST 2008


On Tue, Aug 19, 2008 at 07:29:46PM +0000, Joss Shaw wrote:
> I've been trawling through the archives and I found reference to an
> n-gram query parser plugin which some guy made.  I don't think it's
> been included into the main Xapian distro yet but I would be really
> interested in such a tokenizer if there were plans!  

It's certainly something I'd like to include, but I don't have firm
plans for working on it myself currently.

> His tokenizer apparently plugs into Xapian, but I'm not sure how you
> plug extra query parsing engines in - could someone possibly shed some
> light on this for me please?

I've not studied this code myself, so for a definitive answer to this
and any other questions about it, you'll have to read the source code
or ask the author.

But since Xapian::QueryParser pretty much just uses public API methods,
I'd guess he's just implemented a similar parser class of his own on top
of the same API, but with different handling for CJK characters.
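
To give a rough idea of what "different handling for CJK characters"
could mean, here's a minimal sketch of an n-gram tokenizer.  This is
only an illustration under my own assumptions and NOT the plugin's
actual code (which I haven't read): the names is_cjk() and tokenize()
are made up, the character ranges are a crude approximation, and it
doesn't touch the Xapian API at all.

```python
# Hypothetical sketch: split runs of CJK characters into overlapping
# n-grams (bigrams by default) and treat other alphanumeric runs as
# ordinary lowercased words.  Not taken from any existing plugin.

def is_cjk(ch):
    """Crude check: CJK ideographs, Hiragana/Katakana, Hangul syllables."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7A3)  # Hangul syllables


def tokenize(text, n=2):
    """Return a list of terms: n-grams for CJK runs, words otherwise."""
    terms = []
    run = ""             # characters collected for the current run
    run_is_cjk = False   # whether the current run consists of CJK characters

    def flush():
        nonlocal run
        if not run:
            return
        if run_is_cjk:
            if len(run) < n:
                # Run shorter than n: keep it as a single term.
                terms.append(run)
            else:
                terms.extend(run[i:i + n] for i in range(len(run) - n + 1))
        else:
            terms.append(run.lower())
        run = ""

    for ch in text:
        if is_cjk(ch):
            if run and not run_is_cjk:
                flush()  # a word ends where a CJK run begins
            run += ch
            run_is_cjk = True
        elif ch.isalnum():
            if run and run_is_cjk:
                flush()  # a CJK run ends where a word begins
            run += ch
            run_is_cjk = False
        else:
            flush()      # whitespace or punctuation ends any run
    flush()
    return terms


print(tokenize("Xapian 中文搜索 test"))
```

The important point is that the same splitting has to be applied to both
the documents and the queries.  Presumably the resulting terms would be
added with Xapian::Document::add_posting() at index time and combined
into a Xapian::Query at search time, but that part is a guess too.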

Cheers,
    Olly


