[Xapian-devel] [GSOC 2011] Improving Spell Checker
Olly Betts
olly at survex.com
Thu Mar 24 14:35:20 GMT 2011
On Wed, Mar 23, 2011 at 07:42:08PM -0400, Prasad Prabhu wrote:
> This is my list of ideas, along with an overview of the analysis I have done
> on some other ideas I felt should be discussed. Please give me comments and
> suggestions for improving it before the application process starts.
> Here is the link: Idea Log <http://goo.gl/GjCcA>
I think trying to actually parse queries as sentences isn't likely to
work well. People usually search for a few words without the proper
grammar, or for a sentence fragment. So for context sensitivity,
I think a statistical approach is more likely to work (e.g. something
like tracking how likely each word is to appear near another, and
then comparing that likelihood across the words within edit distance X
of the word we are considering for correction).
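
To make that concrete, here is a rough sketch of the idea in Python. It is
only an illustration, not how Xapian's spelling correction works: the toy
corpus, the window size, and the function names (build_cooccurrence,
suggest) are all made up for the example, and a real implementation would
want smoothed probabilities rather than raw counts.

```python
from collections import Counter


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def build_cooccurrence(docs, window=2):
    """Count how often each word appears within `window` words of another."""
    pairs = Counter()
    for doc in docs:
        words = doc.lower().split()
        for i, w in enumerate(words):
            nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            for other in nearby:
                pairs[(w, other)] += 1
    return pairs


def suggest(query, index, pairs, vocab, max_dist=2):
    """Pick the candidate within max_dist of query[index] that co-occurs
    most often with the other query words. Falls back to the original
    word if no candidate is close enough."""
    target = query[index]
    context = query[:index] + query[index + 1:]
    candidates = [w for w in sorted(vocab)
                  if edit_distance(target, w) <= max_dist]
    return max(candidates,
               key=lambda w: sum(pairs[(w, c)] for c in context),
               default=target)


# Hypothetical toy corpus, just for illustration.
docs = [
    "fresh green peas and carrots",
    "world peace treaty signed",
    "peace treaty talks resume",
    "a piece of cake",
]
pairs = build_cooccurrence(docs)
vocab = {w for d in docs for w in d.split()}

print(suggest(["peece", "treaty"], 0, pairs, vocab))      # peace
print(suggest(["peece", "of", "cake"], 0, pairs, vocab))  # piece
```

The same misspelling gets a different correction depending on its
neighbours, with no attempt to parse the query as a sentence.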
I'm not clear how stemming helps here - perhaps you could elaborate
on how it would be used?
And soundex is really a non-starter. It's only intended to be used
on surnames common in the USA, and it's not even much good for those.
Metaphone (and its successor, Double Metaphone) are better alternatives.
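
As a small illustration of why soundex is a poor fit for general spelling
correction, here is a minimal sketch of the common "American Soundex" rules
in Python (my own quick implementation for this example, not code from
Xapian). Because the first letter is always kept verbatim, words that sound
alike but start with a different letter get different codes.

```python
# Digit classes used by American Soundex; vowels, h, w and y have no code.
CODES = {c: d
         for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                            "4": "l", "5": "mn", "6": "r"}.items()
         for c in letters}


def soundex(word):
    """Return the 4-character Soundex code for a non-empty ASCII word."""
    word = word.lower()
    prev = CODES.get(word[0], "")
    digits = []
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (word[0].upper() + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("knight"), soundex("night"))   # K523 N230
```

So "knight" would never be offered as a correction for "night" (or the
other way round) on the basis of soundex codes alone.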
Cheers,
Olly