[Xapian-tickets] [Xapian] #507: Some little problems with the french stemmer
Xapian
nobody at xapian.org
Tue Sep 21 15:38:28 BST 2010
#507: Some little problems with the french stemmer
-------------------------+--------------------------------------------------
Reporter: versmisse | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Library API | Version: 1.2.3
Severity: normal | Keywords:
Blockedby: | Platform: All
Blocking: |
-------------------------+--------------------------------------------------
Changes (by olly):
* version: => 1.2.3
* component: Other => Library API
Comment:
I think what you're describing is a feature rather than a bug.
The stems which are produced aren't necessarily actual words, but rather
tokens which look rather like the words associated with that stem.
For example, in English ''early'' stems to ''earli'' which isn't a real
word. But this doesn't matter, as what is important is that ''earlier''
also stems to ''earli''.
Section 5 of http://snowball.tartarus.org/texts/introduction.html
discusses this:
> A question arises: if the user never sees the stemmed form, does its
> appearance matter? The answer must be no, although the Porter stemmer
> tries to make the unstemmed forms guessable from the stemmed forms. For
> example, from appropri you can guess appropriate. At least, trying to
> achieve this effect acts as a useful control. Similarly with the other
> stemmers presented here, an attempt has been made to keep the appearance
> of the stemmed forms as familiar as possible.
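To make the point about stems being index tokens rather than words concrete, here is a minimal Python sketch. Note that `toy_stem`, `build_index` and `search` are hypothetical names invented for this illustration, and the suffix rules are deliberately crude: this is not the Snowball algorithm, and a real stemmer may produce different output for the same words.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical toy stemmer (NOT Snowball): a few illustrative rules."""
    word = word.lower()
    # Map comparative/superlative endings onto a common "...i" token.
    for suffix in ("iest", "ier"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + "i"
    # Map a final "y" onto "i" so the base form lands on the same token.
    if word.endswith("y") and len(word) > 2:
        return word[:-1] + "i"
    return word


def build_index(docs):
    """Map each stem to the set of document ids containing a matching word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return index


def search(index, query_word):
    """Stem the query the same way as the documents, then look it up."""
    return index.get(toy_stem(query_word), set())


docs = {
    1: "the early train",
    2: "an earlier departure",
    3: "the late train",
}
index = build_index(docs)

# The user only ever types and reads real words; the token "earli"
# stays inside the index.
print(toy_stem("early"), sorted(search(index, "earliest")))
```

The only property the search relies on is that related words map to the same token at index time and at query time. Whether that token is a dictionary word makes no difference to the result.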
If the stemmer is producing the same stem for words which should have
different stems (or different stems for cases which should be the same)
then it would be more efficient to report this directly to the Snowball
developers. Snowball is the project which maintains these algorithms -
see http://snowball.tartarus.org/
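A report of that kind is easier to act on if it lists concrete words. The sketch below is one possible way to collect them, under some assumptions: `check_conflation` and `placeholder_stem` are hypothetical names, the French word lists are illustrative and not taken from this ticket, and the placeholder only strips a trailing "s". Any callable that maps a word to its stem can be passed in instead; a stemmer object from the Xapian Python bindings should be usable that way, though that is not tested here.

```python
def check_conflation(stem, same_groups, different_pairs):
    """Return (under, over).

    under: groups expected to share one stem but which do not.
    over:  pairs expected to differ but which share a stem.
    """
    under = []
    for group in same_groups:
        stems = {word: stem(word) for word in group}
        if len(set(stems.values())) > 1:
            under.append(stems)

    over = []
    for a, b in different_pairs:
        if stem(a) == stem(b):
            over.append((a, b, stem(a)))
    return under, over


def placeholder_stem(word):
    # Hypothetical stand-in, NOT a real French stemmer: strips a final "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


under, over = check_conflation(
    placeholder_stem,
    # Illustrative expectations only; replace with the cases being reported.
    same_groups=[["cheval", "chevaux"], ["maison", "maisons"]],
    different_pairs=[("mai", "mais"), ("bus", "but")],
)

for stems in under:
    print("different stems for related words:", stems)
for a, b, s in over:
    print("same stem for unrelated words:", a, b, "->", s)
```

Cases where the stem merely looks unfamiliar never show up in either list, which matches the distinction drawn above: only wrong groupings are worth reporting.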
There's test data for the stemmers in SVN under browser:trunk/xapian-data
--
Ticket URL: <http://trac.xapian.org/ticket/507#comment:1>
Xapian <http://xapian.org/>