[Xapian-discuss] *wildcard* support?
Olly Betts
olly at survex.com
Mon Oct 10 05:53:37 BST 2005
On Sun, Oct 09, 2005 at 01:46:07PM -0700, Eric Parusel wrote:
> Fb Fo Fb F@ Fm Fa Fr Fk Fe Ft F. Fc Fo Fm
>
> [...]
>
> pro) Doesn't use wildcards, but search uses phrases and positional
> indexes have to be used (do they have to be used in your above example?)
No, that was just using the n-grams to cull the lexicon down to a
manageable candidate set, which we then check against the pattern.
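
A rough sketch of that culling step, in plain Python rather than against Xapian's API. The trigram length, the in-memory dict of sets, and the helper names (ngrams, build_index, wildcard_lookup) are all assumptions made for illustration; only "*" is treated as a wildcard here.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

N = 3  # n-gram length; an assumption, the thread does not fix a value


def ngrams(s, n=N):
    """Return the set of character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(lexicon, n=N):
    """Map each n-gram to the set of lexicon terms containing it."""
    index = defaultdict(set)
    for term in lexicon:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def wildcard_lookup(pattern, lexicon, index, n=N):
    """Cull the lexicon with n-grams, then check survivors against the pattern."""
    # Collect n-grams from the literal runs between '*' wildcards.
    grams = set()
    for chunk in pattern.split("*"):
        grams |= ngrams(chunk, n)
    if grams:
        # A matching term must contain every one of these n-grams.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # No literal run is long enough to yield an n-gram: fall back to a scan.
        candidates = set(lexicon)
    # The n-grams only narrow things down; the pattern check decides.
    return sorted(t for t in candidates if fnmatchcase(t, pattern))
```

With the lexicon ["market", "marker", "remark", "bob", "marketing"], looking up "*arke*" culls to the three terms containing both "ark" and "rke" before the pattern is ever applied, and "*ket" culls to two candidates of which only "market" survives the check.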
I'd not considered using positional information and n-grams together
before. That might work well actually. I certainly wouldn't dismiss it
as a crazy scheme without trying it.
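
To make the idea concrete, here is an untested-in-anger sketch of combining n-grams with positions, again in plain Python and not Xapian code. It assumes each document's character n-grams are indexed with their character offsets, so a substring search becomes a phrase-style check that the needle's n-grams occur at consecutive offsets. The names index_docs and substring_search are hypothetical.

```python
from collections import defaultdict


def index_docs(docs, n=3):
    """Build postings[gram][doc_id] -> list of character offsets of gram."""
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for i in range(len(text) - n + 1):
            postings[text[i:i + n]][doc_id].append(i)
    return postings


def substring_search(needle, postings, n=3):
    """Return ids of documents where the needle's n-grams sit at consecutive offsets."""
    if len(needle) < n:
        raise ValueError("needle must be at least n characters long")
    grams = [needle[i:i + n] for i in range(len(needle) - n + 1)]
    hits = set()
    # Start from every occurrence of the first n-gram and require the k-th
    # n-gram to appear exactly k characters later in the same document.
    for doc_id, starts in postings.get(grams[0], {}).items():
        for start in starts:
            if all(
                start + k in postings.get(gram, {}).get(doc_id, ())
                for k, gram in enumerate(grams)
            ):
                hits.add(doc_id)
                break
    return sorted(hits)
```

Because overlapping n-grams at consecutive offsets pin down every character of the needle, this positional check needs no separate verification pass, at the cost of storing a position for every n-gram occurrence.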
Another thing you can do with n-grams is to index a hash of the n-gram
rather than the n-gram itself, which helps keep the number of terms
down as "n" increases (especially if you're using Unicode characters).
The hashing may add a few extra false positives, but you've got to
check for them anyway.
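
A minimal sketch of the hashed variant, under stated assumptions: CRC-32 from Python's zlib as the hash, 65536 buckets, 4-grams, and hypothetical helper names (gram_key, build_hashed_index, lookup_substring). None of these choices come from Xapian; any reasonably uniform hash would do.

```python
import zlib
from collections import defaultdict

BUCKETS = 1 << 16  # upper bound on distinct index keys; an assumption


def gram_key(gram, buckets=BUCKETS):
    """Hash an n-gram (encoded as UTF-8) into one of a fixed number of buckets."""
    return zlib.crc32(gram.encode("utf-8")) % buckets


def build_hashed_index(lexicon, n=4, buckets=BUCKETS):
    """Map each bucket to the set of terms with at least one n-gram hashing there."""
    index = defaultdict(set)
    for term in lexicon:
        for i in range(len(term) - n + 1):
            index[gram_key(term[i:i + n], buckets)].add(term)
    return index


def lookup_substring(literal, lexicon, index, n=4, buckets=BUCKETS):
    """Find lexicon terms containing literal, using hashed n-grams to cull first."""
    keys = {gram_key(literal[i:i + n], buckets)
            for i in range(len(literal) - n + 1)}
    if keys:
        # Collisions can only add terms to a bucket, never hide a true match.
        candidates = set.intersection(*(index.get(k, set()) for k in keys))
    else:
        # Literal is shorter than n, so there is nothing to hash: scan instead.
        candidates = set(lexicon)
    # This check is needed with or without hashing; it also drops collisions.
    return sorted(t for t in candidates if literal in t)
```

The number of index keys is capped at the bucket count however large "n" or the alphabet gets. Shrinking the bucket count raises the collision rate and so the size of the candidate set, but the final check keeps the results exact.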
> Olly Betts wrote:
> > As for roadmaps, I'm afraid left truncation is low on my list of things
> > to work on. Spelling correction overlaps somewhat and is higher but I'm
> > trying to concentrate on flint right now. But if you want to work on it
> > yourself, I'm happy to give pointers on where to hook in to Xapian. And
> > if done cleanly, I think it's something that's worth including in
> > Xapian.
>
> If/when I am able to work on a spelling correction feature, I'll get in
> touch.
Just to clarify, by "it" I wasn't really meaning spelling correction,
but rather the whole area of indexing the lexicon using n-grams for
whatever purpose.
Cheers,
Olly