[Xapian-discuss] Re: Spelling based on frequency and not just distance

Philip Neustrom philipn at gmail.com
Tue Jan 15 12:57:18 GMT 2008


The patch attached to this email is better than the previous one.  Hopefully
somebody can come up with something better entirely, as I'm not totally
happy with what I have -- it tends to suggest things like "plant" for
"plants" and then "plan" for "plant" :)

--Philip
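To make the "plants" -> "plant" -> "plan" drift concrete, here is a minimal standalone sketch. It is plain Python, not the Xapian API and not the attached patch. It assumes a frequency-weighted rule: suggest the most frequent indexed term one edit away whenever that term is more frequent than the typed word, even if the word is itself indexed. The helper names (`suggest`, `follow_chain`), the `min_ratio` knob and the frequency counts are all hypothetical. `min_ratio` shows one possible guard: a candidate is suggested only if it is more than `min_ratio` times as frequent as the word.

```python
from typing import Dict, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word: str, freqs: Dict[str, int], min_ratio: float = 1.0) -> Optional[str]:
    """Most frequent term one edit away whose frequency exceeds
    min_ratio * freq(word). min_ratio is a hypothetical tuning knob."""
    threshold = freqs.get(word, 0) * min_ratio
    best, best_freq = None, threshold
    for term, freq in freqs.items():
        if term != word and freq > best_freq and edit_distance(word, term) <= 1:
            best, best_freq = term, freq
    return best


def follow_chain(word: str, freqs: Dict[str, int], min_ratio: float = 1.0) -> List[str]:
    """Feed each suggestion back in, to show how corrections can drift."""
    chain = [word]
    while (nxt := suggest(chain[-1], freqs, min_ratio)) is not None and nxt not in chain:
        chain.append(nxt)
    return chain


# Made-up frequencies, for illustration only.
freqs = {"plants": 40, "plant": 90, "plan": 300, "plamt": 2}
print(follow_chain("plants", freqs))               # ['plants', 'plant', 'plan']
print(follow_chain("plants", freqs, min_ratio=5))  # ['plants']
print(follow_chain("plamt", freqs, min_ratio=5))   # ['plamt', 'plant']
```

With the guard at 5, the genuine typo "plamt" is still corrected, while the drift from "plants" is suppressed. Whether a ratio like this helps on real data, and what value to use, would need testing.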

On Jan 15, 2008 1:24 AM, Philip Neustrom <philipn at gmail.com> wrote:

> Hey all,
>
> After implementing the new spelling functionality on http://wikispot.org I
> noticed that terms like "wikipeda" weren't yielding spelling suggestions.
> From a quick look at the code, it seems that if we find an exact match,
> we don't suggest anything, even if another match within the provided
> delta has a higher frequency.  This is probably fine for sites whose
> documents you can be sure are properly spelled -- but not suitable for
> something like a wiki or the web in general.
>
> I did something simple, attached as a patch.  Maybe someone has a better
> idea of how to weigh the different options, but my quick fix seemed to give
> much better results than the existing "give up on exact or
> edit-distance-closest match" code.
>
> --Philip Neustrom
>
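The two rules contrasted in the message above can be sketched as follows. This is a simplified model in plain Python, assuming term frequencies are available as a dictionary. It is not Xapian's code and not the scrubbed spelling_frequency.diff. `suggest_exact_wins` models the behaviour the post describes: an indexed word gets no suggestion, otherwise the closest term by edit distance wins. `suggest_by_frequency` treats the word itself as just one candidate, so a more frequent term within the edit-distance limit (the "provided delta") is suggested. The function names, the `max_edits` default and the counts are hypothetical.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest_exact_wins(word: str, freqs: Dict[str, int], max_edits: int = 2) -> Optional[str]:
    """Model of the described behaviour: an indexed word gets no suggestion;
    otherwise return the closest term within max_edits (ties: first seen)."""
    if word in freqs:
        return None
    best, best_dist = None, max_edits + 1
    for term in freqs:
        d = edit_distance(word, term)
        if d < best_dist:
            best, best_dist = term, d
    return best


def suggest_by_frequency(word: str, freqs: Dict[str, int], max_edits: int = 2) -> Optional[str]:
    """Frequency-aware variant: return the most frequent term within
    max_edits that is more frequent than the word itself, else None."""
    best, best_freq = None, freqs.get(word, 0)
    for term, freq in freqs.items():
        # Check frequency first so the distance is only computed when needed.
        if term != word and freq > best_freq and edit_distance(word, term) <= max_edits:
            best, best_freq = term, freq
    return best


# Made-up counts: the misspelling "wikipeda" occurs in a few wiki pages.
freqs = {"wikipedia": 120, "wikipeda": 3, "wiki": 400}
print(suggest_exact_wins("wikipeda", freqs))     # None
print(suggest_by_frequency("wikipeda", freqs))   # wikipedia
print(suggest_by_frequency("wikipedia", freqs))  # None
```

Note that "wiki" is never suggested despite its higher count, because it lies outside the edit-distance limit.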
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spelling_frequency.diff
Type: text/x-diff
Size: 2638 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20080115/a09ce691/spelling_frequency.bin

