[Xapian-tickets] [Xapian] #225: Spelling algorithm should consider frequency and not just edit-distance

Xapian nobody at xapian.org
Tue Dec 5 05:12:30 GMT 2023


#225: Spelling algorithm should consider frequency and not just edit-distance
-----------------------------+-------------------------------
 Reporter:  Philip Neustrom  |             Owner:  Olly Betts
     Type:  defect           |            Status:  assigned
 Priority:  normal           |         Milestone:  2.0.0
Component:  Library API      |           Version:  git master
 Severity:  normal           |        Resolution:
 Keywords:                   |        Blocked By:
 Blocking:                   |  Operating System:  All
-----------------------------+-------------------------------
Changes (by Olly Betts):

 * milestone:  1.5.0 => 2.0.0


Old description:

> As described here:
> http://thread.gmane.org/gmane.comp.search.xapian.general/5740/focus=5743
>
> If the spelling correction algorithm considered frequency and edit-distance
> (using some reasonable heuristic), we would see dramatically better
> results.
> ~~The current spelling algorithm will only correct words that never
> appear in the spelling index.~~ ''(Since 1.2.3, it will offer a correction
> for a word when the correction has a higher frequency than the word)''

New description:

 As described here:
 http://thread.gmane.org/gmane.comp.search.xapian.general/5740/focus=5743

 If the spelling correction algorithm considered frequency and edit-distance
 (using some reasonable heuristic), we would see ~~dramatically~~ better
 results.
 ~~The current spelling algorithm will only correct words that never appear
 in the spelling index.~~ ''(Since 1.2.3, it will offer a correction for a
 word when the correction has a higher frequency than the word)''
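 To illustrate the kind of "reasonable heuristic" the description has in
 mind, here is a minimal sketch in plain Python. It is not Xapian's
 implementation and does not use the Xapian API. The function name
 `suggest`, the parameters `max_distance` and `distance_penalty`, and the
 scoring formula (log of frequency minus a penalty per edit) are all
 hypothetical choices made for illustration. It also mirrors the 1.2.3
 behaviour by only offering a candidate that is more frequent than the word
 itself.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def suggest(word, freqs, max_distance=2, distance_penalty=3.0):
    """Hypothetical heuristic combining frequency and edit distance.

    freqs maps each spelling-index word to its frequency. Returns the best
    candidate, or None if nothing qualifies.
    """
    word_freq = freqs.get(word, 0)
    best, best_score = None, None
    for cand, freq in freqs.items():
        if cand == word:
            continue
        d = edit_distance(word, cand)
        if d > max_distance:
            continue
        # Like the 1.2.3 rule: only suggest something more frequent than the
        # word itself (this also guarantees freq >= 1, so log() is safe).
        if freq <= word_freq:
            continue
        # Hypothetical score: frequent words win, but each edit costs a
        # fixed penalty in log-frequency units.
        score = math.log(freq) - distance_penalty * d
        if best_score is None or score > best_score:
            best, best_score = cand, score
    return best


print(suggest("mispel", {"misspell": 1000, "mispell": 1}))
```

 With these assumed weights, `"mispel"` is corrected to `"misspell"`
 (distance 2, frequency 1000) rather than the closer but rare `"mispell"`
 (distance 1, frequency 1), which is the outcome an edit-distance-only
 algorithm would pick. The balance between the two factors depends entirely
 on the hypothetical `distance_penalty`.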

--
Comment:

 It'd be really good to address the remainder of this, but it no longer
 requires a database format change, so I'm postponing it.
-- 
Ticket URL: <https://trac.xapian.org/ticket/225#comment:17>
Xapian <https://xapian.org/>