[Xapian-tickets] [Xapian] #225: Spelling algorithm should consider frequency and not just edit-distance

Xapian nobody at xapian.org
Sun Aug 1 11:37:48 BST 2010


#225: Spelling algorithm should consider frequency and not just edit-distance
-------------------------+--------------------------------------------------
 Reporter:  philipn      |        Owner:  olly     
     Type:  defect       |       Status:  assigned 
 Priority:  high         |    Milestone:  1.2.x    
Component:  Library API  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by olly):

 It seems Trac mangles the UTF-8 characters when previewing that rst file
 - view the rst file itself for a better version.

 Thinking about it, p is going to vary by user, but we're probably talking
 about something in the range 0.001 to 0.01.  The value which works best
 may well differ from the true probability (since we make various
 simplifying assumptions), so it is probably better to tune p for best
 results than to try too hard to determine a "true" value.
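
 As a rough illustration of why tuning p matters, here is a minimal Python
 sketch.  It assumes (this is not stated in this comment) that p is the
 probability of a single edit and that a candidate at edit distance d is
 scored as frequency * p**d.  The names edit_distance and best_correction
 and the frequencies used are hypothetical, and plain Levenshtein distance
 stands in for whatever distance measure the library actually uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_correction(word, freqs, p, max_distance=2):
    """Return the candidate maximising freq * p**distance, or "" if none.

    Assumed scoring rule for illustration only: p is treated as the
    probability of one edit, so d edits cost a factor of p**d.
    """
    best_word, best_score = "", 0.0
    for candidate, freq in freqs.items():
        d = edit_distance(word, candidate)
        if d == 0 or d > max_distance:
            continue  # the word itself is not a correction; too far is ignored
        score = freq * p ** d
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word


# Hypothetical frequencies: "them" is common, "thy" is rare.
freqs = {"them": 1000, "thy": 2}
at_p_01 = best_correction("thw", freqs, p=0.01)
at_p_001 = best_correction("thw", freqs, p=0.001)
```

 With these made-up numbers, p = 0.01 prefers the frequent word "them" at
 distance 2, while p = 0.001 prefers the rare word "thy" at distance 1, so
 the choice of p directly changes which suggestion wins.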

 To implement this model efficiently, it would be useful to track an upper
 bound on the spelling frequency.  That is easy to do, but we don't do it
 currently, and adding it looks like it will need an incompatible database
 format change.
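
 One plausible way such a bound could help is sketched below in Python.  It
 assumes candidates are scored as frequency * p**distance (an assumption,
 not something stated in this comment), so no candidate at distance d can
 score more than max_freq * p**d.  Once the best score found so far reaches
 that ceiling, larger distances need not be examined.  The function name
 and the pre-grouped input are hypothetical.

```python
def best_with_bound(candidates_by_distance, max_freq, p):
    """Pick the best-scoring candidate, stopping early using max_freq.

    candidates_by_distance maps an edit distance to a list of
    (word, frequency) pairs; max_freq is an upper bound on any frequency.
    Returns (best_word, distances_examined).
    """
    best_word, best_score = "", 0.0
    examined = []
    for d in sorted(candidates_by_distance):
        # Ceiling on the score of any candidate at distance d or beyond.
        if best_score >= max_freq * p ** d:
            break
        examined.append(d)
        for word, freq in candidates_by_distance[d]:
            score = freq * p ** d
            if score > best_score:
                best_word, best_score = word, score
    return best_word, examined


# Hypothetical data: a frequent close match makes distance 2 unnecessary.
early = best_with_bound({1: [("the", 900)], 2: [("them", 1000)]},
                        max_freq=1000, p=0.01)
# Hypothetical data: a rare close match cannot rule out distance 2.
full = best_with_bound({1: [("thy", 2)], 2: [("them", 1000)]},
                       max_freq=1000, p=0.01)
```

 In the first call the search stops after distance 1; in the second it has
 to look at distance 2 as well, where the more frequent word wins.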

 But it's easy to address the specific point about not returning any
 correction if the word is in the spelling dictionary (as it may be if
 misspelled in the indexed documents) - I've addressed that in trunk
 r14859.

-- 
Ticket URL: <http://trac.xapian.org/ticket/225#comment:10>
Xapian <http://xapian.org/>