[Xapian-tickets] [Xapian] #507: Some little problems with the french stemmer

Xapian nobody at xapian.org
Tue Sep 21 15:38:28 BST 2010

#507: Some little problems with the french stemmer
 Reporter:  versmisse    |       Owner:  olly 
     Type:  enhancement  |      Status:  new  
 Priority:  normal       |   Milestone:       
Component:  Library API  |     Version:  1.2.3
 Severity:  normal       |    Keywords:       
Blockedby:               |    Platform:  All  
 Blocking:               |  
Changes (by olly):

  * version:  => 1.2.3
  * component:  Other => Library API


 I think what you're describing is a feature rather than a bug.

 The stems which are produced aren't necessarily actual words, but rather
 tokens which resemble the words associated with that stem.

 For example, in English ''early'' stems to ''earli'' which isn't a real
 word.  But this doesn't matter, as what is important is that ''earlier''
 also stems to ''earli''.

 Section 5 of http://snowball.tartarus.org/texts/introduction.html
 discusses this:

 > A question arises: if the user never sees the stemmed form, does its
 > appearance matter? The answer must be no, although the Porter stemmer
 > tries to make the unstemmed forms guessable from the stemmed forms. For
 > example, from appropri you can guess appropriate. At least, trying to
 > achieve this effect acts as a useful control. Similarly with the other
 > stemmers presented here, an attempt has been made to keep the appearance
 > of the stemmed forms as familiar as possible.

 If the stemmer is producing the same stem for words which should have
 different stems (or different stems for words which should share one)
 then it would be more effective to report this directly to the Snowball
 developers.  Snowball is the project which maintains these algorithms;
 see http://snowball.tartarus.org/
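
 To gather concrete cases for such a report, something like the following
 rough sketch may help.  It takes groups of words which you expect to share
 a stem and lists both kinds of problem described above.  Note that
 `toy_stem` and `check_groups` are hypothetical names invented for this
 illustration, and `toy_stem` is a deliberately crude stand-in, not the
 Snowball algorithm; the idea is that you would pass in whatever real
 stemmer callable you have available instead.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer (NOT Snowball): strips a trailing
    "er" from longer words, then turns a trailing "y" into "i"."""
    word = word.lower()
    if word.endswith("er") and len(word) > 4:
        word = word[:-2]
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word


def check_groups(stem, groups):
    """Compare a stemmer against expected groupings.

    groups is a list of word lists; the words in each list are expected
    to share one stem, and different lists are expected to get different
    stems.  Returns (under, over):
      under: one {word: stem} dict per group whose words got several stems
      over:  {stem: [group indexes]} for stems shared by several groups
    """
    under = []
    owners = defaultdict(set)
    for index, group in enumerate(groups):
        stems = {word: stem(word) for word in group}
        if len(set(stems.values())) > 1:
            under.append(stems)
        for s in stems.values():
            owners[s].add(index)
    over = {s: sorted(idx) for s, idx in owners.items() if len(idx) > 1}
    return under, over


groups = [["early", "earlier"], ["happy", "happiness"]]
under, over = check_groups(toy_stem, groups)
print("split groups:", under)
print("merged groups:", over)
```

 With the toy stemmer, ''early'' and ''earlier'' both give ''earli'', so
 that group is fine even though ''earli'' isn't a word, while ''happy'' and
 ''happiness'' end up apart and are listed as a split group.  Only entries
 of that second sort, produced by the real stemmer, are worth reporting.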

 There's test data for the stemmers in SVN under browser:trunk/xapian-data

Ticket URL: <http://trac.xapian.org/ticket/507#comment:1>
Xapian <http://xapian.org/>
