[Xapian-devel] GSoC 2011: Improve Spelling Correction

Olly Betts olly at survex.com
Mon Mar 21 15:05:16 GMT 2011


On Sun, Mar 20, 2011 at 07:53:56PM +0500, Nikita Smetanin wrote:
> Hello, I am Nikita Smetanin (ntz), a Russian student. I'm interested in
> fuzzy search algorithms (also known as similarity search and spelling
> correction), and I have written some articles and open-source implementations
> of related algorithms. I also have good experience in enterprise software
> development (Java/C++/C# and related stuff) and in small projects.
> 
> I want to work on your project "Improve spelling correction", but I
> want to suggest some additions to that project:

That's cool - I actually added a new sentence to the ideas page earlier
to make this clearer (http://trac.xapian.org/wiki/GSoCProjectIdeas):

    Note that these are ideas - some are more fully formed than others, but
    don't be afraid to take them and extend or adapt them in your proposal
    to produce something you're more interested in working on.

> - One or several phonetic matching algorithms to improve name and
> surname search.

How would you apply these?  Just as something which could be applied to
a field known to contain a name (e.g. author) or something more complex?
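For illustration only, here is a minimal sketch of one classic phonetic algorithm, American Soundex, as it might be applied to a name field. It is not Xapian code and uses no Xapian API. The function name `soundex` is hypothetical, and the sketch only handles ASCII letters, since Soundex was designed for English surnames.

```python
# Hypothetical helper: American Soundex, shown as one example of a phonetic key.
# Letters -> digit classes; vowels (and y) are uncoded, h/w are skipped entirely.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for an ASCII name ("" if none)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = _CODES.get(first, "")      # first letter's code suppresses an equal neighbour
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h/w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code                   # a vowel resets prev, so repeats after it count again
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros
```

Both "Robert" and "Rupert" map to "R163", so one way this might be used (an assumption, not an existing feature) is to index each name's code as an extra term next to the name itself and match query names by code.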

> - A faster alternative to the trigram algorithm for correction candidate search.
> - More complicated word distance metric to improve result set relevance.
> - Something about improving stemming quality.
> - Language detection for automatic selection of language-specific algorithms.
> 
> I'll be happy to participate in this project during Google Summer of
> Code 2011 program and implement most of these ideas.

Cool - I know you've discussed a lot of this on IRC already, but feel
free to ask/discuss further.
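To make the "trigram candidate search" and "word distance metric" items concrete, here is a generic sketch of the two-stage idea. It is not Xapian's actual spelling code. The names `trigrams`, `edit_distance` and `SpellingIndex` are hypothetical, and plain Levenshtein distance is used as the ranking metric. Trigrams only narrow the candidate set, and the distance function does the ranking.

```python
from collections import defaultdict


def trigrams(word):
    """Character trigrams, with ^/$ boundary markers so short words still yield some."""
    padded = f"^{word}$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def edit_distance(a, b):
    """Levenshtein distance using a single-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class SpellingIndex:
    """Hypothetical in-memory index: trigram -> words containing it."""

    def __init__(self, words):
        self.index = defaultdict(set)
        for w in words:
            for t in trigrams(w):
                self.index[t].add(w)

    def suggest(self, word, max_distance=2):
        # Stage 1: collect candidates sharing at least one trigram with the input.
        shared = defaultdict(int)
        for t in trigrams(word):
            for cand in self.index.get(t, ()):
                shared[cand] += 1
        # Stage 2: rank by edit distance, breaking ties by trigram overlap.
        scored = [(edit_distance(word, c), -n, c) for c, n in shared.items()]
        return [c for d, _, c in sorted(scored) if d <= max_distance]
```

For example, with the words "search", "spelling" and "stemming" indexed, `suggest("speling")` returns `["spelling"]`. A faster candidate search would replace stage 1, and a richer metric would replace `edit_distance` in stage 2.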

And if you get a chance to translate any of your papers into English,
I'd be interested to read them.

Cheers,
    Olly


