[Xapian-discuss] *wildcard* support?

Olly Betts olly at survex.com
Mon Oct 10 05:53:37 BST 2005


On Sun, Oct 09, 2005 at 01:46:07PM -0700, Eric Parusel wrote:
> Fb Fo Fb F@ Fm Fa Fr Fk Fe Ft F. Fc Fo Fm
> 
> [...]
> 
> pro) Doesn't use wildcards, but search uses phrases and positional
> indexes have to be used (do they have to be used in your above example?)

No, that was just using the n-grams to cull the lexicon down to a
manageable candidate set, which we then check against the pattern.
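Here is a minimal in-memory sketch of that culling step. It is written in plain Python rather than against Xapian's own API. Each lexicon term is broken into trigrams, padded with '^'/'$' markers so that prefixes and suffixes count. The literal parts of a wildcard pattern give the trigrams that every match must contain, and only the terms holding all of them are checked against the pattern with fnmatch. The names NgramLexicon and expand, the choice of n=3, the padding markers, and support for only '*' and '?' are all assumptions made for illustration.

```python
import fnmatch
import re
from collections import defaultdict

N = 3  # n-gram length; an arbitrary choice for this sketch


def term_ngrams(term):
    """Trigrams of a lexicon term, padded so its start and end are marked."""
    padded = "^" + term + "$"
    return {padded[i:i + N] for i in range(len(padded) - N + 1)}


def pattern_ngrams(pattern):
    """Trigrams that any term matching `pattern` must contain.

    Only the literal runs between wildcards ('*' and '?') contribute;
    runs touching the pattern's start or end keep the '^'/'$' markers.
    """
    grams = set()
    for frag in re.split(r"[*?]", "^" + pattern + "$"):
        grams.update(frag[i:i + N] for i in range(len(frag) - N + 1))
    return grams


class NgramLexicon:
    def __init__(self, terms):
        self.terms = set(terms)
        self.index = defaultdict(set)  # trigram -> terms containing it
        for term in self.terms:
            for gram in term_ngrams(term):
                self.index[gram].add(term)

    def expand(self, pattern):
        """Return the lexicon terms matching a '*'/'?' wildcard pattern."""
        grams = pattern_ngrams(pattern)
        if grams:
            # Cull: keep only terms containing every required trigram.
            candidates = set.intersection(
                *(self.index.get(g, set()) for g in grams))
        else:
            # Literal runs too short to yield a trigram: no culling possible.
            candidates = self.terms
        # The trigram test can let false positives through, so verify each.
        return sorted(t for t in candidates if fnmatch.fnmatchcase(t, pattern))
```

For example, `NgramLexicon(["market", "marker", "supermarket"]).expand("*arket")` gives `["market", "supermarket"]`. The surviving terms could then be OR'ed together into an ordinary query. If a pattern's literal parts are shorter than n, it gives no trigrams to cull with, so the sketch falls back to checking the whole lexicon.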

I'd not considered using positional information and n-grams together
before.  That might work well actually.  I certainly wouldn't dismiss it
as a crazy scheme without trying it.
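The following toy positional index illustrates how the two might combine. It is plain Python, not Xapian's API. It stores each character bigram of a field together with its offset, in the spirit of the per-character terms quoted above. A substring query then becomes a "phrase" of overlapping bigrams at consecutive offsets, which is roughly what a phrase search over such terms would check. Everything here is a hypothetical sketch: the PositionalIndex and substring_search names, n=2, and one field per document.

```python
from collections import defaultdict

N = 2  # n-gram length; arbitrary for this sketch


def positional_ngrams(text):
    """(n-gram, offset) pairs for every n-gram of text."""
    return [(text[i:i + N], i) for i in range(len(text) - N + 1)]


class PositionalIndex:
    def __init__(self):
        # n-gram -> docid -> list of offsets within that document's field
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, docid, field):
        for gram, pos in positional_ngrams(field):
            self.postings[gram][docid].append(pos)

    def _positions(self, gram, docid):
        # .get() avoids creating empty entries in the defaultdicts.
        return self.postings.get(gram, {}).get(docid, ())

    def substring_search(self, sub):
        """Docids whose field contains `sub`, via a phrase-style check."""
        grams = positional_ngrams(sub)
        if not grams:
            return set()  # shorter than N: would need a different strategy
        first, _ = grams[0]
        hits = set()
        for docid, starts in self.postings.get(first, {}).items():
            # Every later n-gram must sit at the same offset relative to start.
            if any(all(start + off in self._positions(g, docid)
                       for g, off in grams[1:])
                   for start in starts):
                hits.add(docid)
        return hits
```

Overlapping n-grams at consecutive offsets spell out the substring exactly. This means that, unlike the lexicon-culling approach, no separate verification pass is needed. The cost is storing a position for every n-gram in the index.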

Another thing you can do with n-grams is to index a hash of the n-gram
rather than the n-gram itself, which helps keep the number of terms
down as "n" increases (especially if you're using Unicode characters,
where the number of possible n-grams is enormous).  The hashing may add
a few extra false positives, but you've got to check for them anyway.
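The following is a sketch of the hashing idea, assuming a fixed number of buckets. Each padded n-gram is hashed and the bucket number becomes the indexed term. The sketch uses zlib.crc32 because Python's built-in hash() of a str varies between runs. The "XH" prefix, the bucket count and n=4 are all hypothetical choices. However large n or the alphabet gets, there can be at most BUCKETS distinct terms. When two n-grams share a bucket, the only effect is that an extra candidate reaches the final pattern check.

```python
import zlib

BUCKETS = 1 << 16  # hypothetical number of hash buckets
N = 4              # n-gram length; arbitrary for this sketch
PREFIX = "XH"      # hypothetical term prefix for hashed n-grams


def hashed_ngram_terms(word):
    """Index terms for `word`: one per distinct hash bucket of its n-grams."""
    padded = "^" + word + "$"
    terms = set()
    for i in range(len(padded) - N + 1):
        gram = padded[i:i + N]
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        bucket = zlib.crc32(gram.encode("utf-8")) % BUCKETS
        terms.add(f"{PREFIX}{bucket:04x}")
    return terms
```

On the query side, the pattern's literal fragments would be hashed the same way and all of the resulting terms required. The same final pattern check as before then weeds out collisions. Words too short to yield a full n-gram get no terms at all in this sketch.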

> Olly Betts wrote:
> > As for roadmaps, I'm afraid left truncation is low on my list of things
> > to work on.  Spelling correction overlaps somewhat and is higher but I'm
> > trying to concentrate on flint right now.  But if you want to work on it
> > yourself, I'm happy to give pointers on where to hook in to Xapian.  And
> > if done cleanly, I think it's something that's worth including in
> > Xapian.
> 
> If/when I am able to work on a spelling correction feature, I'll get in
> touch.

Just to clarify, by "it" I wasn't really meaning spelling correction,
but rather the whole area of indexing the lexicon using n-grams for
whatever purpose.

Cheers,
    Olly


