[Xapian-devel] stemtest failing with romanian
Olly Betts
olly at survex.com
Thu Mar 29 17:42:00 BST 2007
On Thu, Mar 29, 2007 at 05:18:11PM +0100, Richard Boulton wrote:
> Now, when I run make check, stemtest fails with the romanian stemmer on
> the word "acelaşi". This should stem to "acel" according to the output
> file generated by the stemwords utility from snowball, but xapian stems
> it to "acelaşi". The only change I can see to the stemming algorithms
> since then is a change to the snowball code generator made on Wednesday
> morning by Olly, but reverting this change doesn't seem to fix the
> problem.
I did run stemtest before committing that change, but it looks like
xapian-data hadn't been updated at that point, so it didn't test Romanian.
I now see the same problem as you.
However, I don't think the code generator change is to blame. It looks
to me like the Romanian test data just isn't in step with snowball's for
some reason; the xapian-data files are more than three times the size of
the corresponding snowball files:
snowball:
-rw-r--r-- 1 olly olly 171550 2007-03-27 13:59 output.txt
-rw-r--r-- 1 olly olly 202066 2007-03-27 13:59 voc.txt
xapian-data:
-rw-r--r-- 1 olly olly 533334 2007-03-27 14:46 romanian.st
-rw-r--r-- 1 olly olly 690463 2007-03-27 14:46 romanian.voc
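File sizes only hint at the mismatch. A more direct check is to pair each vocabulary word with its expected stem and compare the two data sets word by word. The sketch below is hypothetical. It assumes each vocabulary file (voc.txt, romanian.voc) holds one word per line and that the matching output file (output.txt, romanian.st) holds the expected stem on the same line number. That layout has not been confirmed here for xapian-data, and the helper names are made up for illustration.

```python
import sys
from typing import Dict, List, Tuple


def read_lines(path: str) -> List[str]:
    # Hypothetical helper: read a UTF-8 word list, one entry per line.
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def load_pairs(voc_lines: List[str], out_lines: List[str]) -> Dict[str, str]:
    # Assumes line i of the output file is the expected stem of line i
    # of the vocabulary file; a length mismatch means they are out of step.
    if len(voc_lines) != len(out_lines):
        raise ValueError(
            f"length mismatch: {len(voc_lines)} words vs {len(out_lines)} stems"
        )
    return {w.strip(): s.strip() for w, s in zip(voc_lines, out_lines)}


def compare(
    a: Dict[str, str], b: Dict[str, str]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    # Words present in only one data set, plus words whose expected stems differ.
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    differ = sorted((w, a[w], b[w]) for w in a.keys() & b.keys() if a[w] != b[w])
    return only_a, only_b, differ


if __name__ == "__main__" and len(sys.argv) == 5:
    # Usage: compare.py voc.txt output.txt romanian.voc romanian.st
    snow = load_pairs(read_lines(sys.argv[1]), read_lines(sys.argv[2]))
    xap = load_pairs(read_lines(sys.argv[3]), read_lines(sys.argv[4]))
    only_snow, only_xap, differ = compare(snow, xap)
    print(f"{len(only_snow)} words only in snowball")
    print(f"{len(only_xap)} words only in xapian-data")
    for word, s_stem, x_stem in differ:
        print(f"{word}: snowball={s_stem} xapian-data={x_stem}")
```

Using a dict means duplicate vocabulary entries collapse to one, which is fine for spotting disagreements but would hide a duplicated line.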
Cheers,
Olly