[Xapian-devel] stemtest failing with romanian

Olly Betts olly at survex.com
Thu Mar 29 17:42:00 BST 2007


On Thu, Mar 29, 2007 at 05:18:11PM +0100, Richard Boulton wrote:
> Now, when I run make check, stemtest fails with the romanian stemmer on 
> the word "acela??I".  This should stem to "acel" according to the output 
> file generated by the stemwords utility from snowball, but xapian stems 
> it to "acela??i".  The only change I can see to the stemming algorithms 
> since then is a change to the snowball code generator made on Wednesday 
> morning by Olly, but reverting this change doesn't seem to fix the 
> problem.

I did run stemtest before committing that change, but it looks like
xapian-data wasn't updated then, so that run didn't test Romanian.  I do
now see the same problem as you.

However, I don't think the code generator change is to blame.  It looks
to me like the Romanian test data in xapian-data just isn't in step with
that from snowball for some reason (note how different the file sizes
are):

snowball:
-rw-r--r-- 1 olly olly 171550 2007-03-27 13:59 output.txt
-rw-r--r-- 1 olly olly 202066 2007-03-27 13:59 voc.txt

xapian-data:
-rw-r--r-- 1 olly olly 533334 2007-03-27 14:46 romanian.st
-rw-r--r-- 1 olly olly 690463 2007-03-27 14:46 romanian.voc
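
The size comparison above can also be done by content.  What follows is
only a minimal sketch, assuming that voc.txt corresponds to romanian.voc
and output.txt to romanian.st, and that the two copies are meant to be
byte-identical; neither assumption is confirmed here.  The function names
are made up for illustration and the paths are supplied on the command
line.

```python
import hashlib
import sys
from pathlib import Path


def file_fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) of the file at path."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def in_step(snowball_file, xapian_file):
    """Return True if the two files have identical size and contents."""
    return file_fingerprint(snowball_file) == file_fingerprint(xapian_file)


if __name__ == "__main__":
    # Usage (script name and paths are hypothetical):
    #   python compare_data.py <snowball file> <xapian-data file>
    snowball_file, xapian_file = sys.argv[1], sys.argv[2]
    for name in (snowball_file, xapian_file):
        size, digest = file_fingerprint(name)
        print(f"{size:>10}  {digest[:16]}  {name}")
    print("in step" if in_step(snowball_file, xapian_file) else "NOT in step")
```

A mismatch only says the copies differ, not which one is stale; a diff of
the two files would be the next step.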

Cheers,
    Olly


