[Xapian-discuss] UTF8 support plans (without stemming)

Craig Macdonald craigm at dcs.gla.ac.uk
Thu Apr 28 13:51:39 BST 2005


>Well, these two questions relate to each other: Xapian is strong in
>'probabilistic IR' and that approach kind of needs some sort of stemming.
I don't totally agree with that. We've had some success applying only
the first two steps of the English (Porter) stemmer to large English web
corpora. Many submissions to last year's TREC Terabyte track didn't use
stemming at all:
    http://www.google.co.uk/search?q=2004+trec+terabyte+stemming
This also appears to be similar to what Google is doing. The first two
steps only drop plurals and tense suffixes.
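To make "only the first two steps" concrete, here is a minimal Python sketch of that kind of light stemming. It assumes "first two steps" means Step 1a (plurals) and Step 1b (-eed/-ed/-ing) of Porter's 1980 algorithm; the function names (light_stem, step1a, step1b, etc.) are hypothetical, and this is not Xapian's own stemming code.

```python
# Minimal sketch of Porter (1980) Steps 1a and 1b only ("light" stemming).
# Assumption: "first two steps" = Step 1a (plurals) + Step 1b (-eed/-ed/-ing).
# All function names are hypothetical, not part of Xapian's API.

VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's test: 'y' is a consonant at the start of a word or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: the number of vowel-consonant sequences in the stem."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if prev_vowel:
                m += 1
            prev_vowel = False
        else:
            prev_vowel = True
    return m


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem: str) -> bool:
    return len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)


def ends_cvc(stem: str) -> bool:
    """Consonant-vowel-consonant ending whose final consonant is not w, x or y."""
    n = len(stem)
    return (
        n >= 3
        and is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step1a(word: str) -> str:
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def _tidy_after_1b(stem: str) -> str:
    """Clean-up applied after -ed or -ing has been removed."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"                     # conflat -> conflate
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]                      # hopp -> hop
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"                     # fil -> file
    return stem


def step1b(word: str) -> str:
    """(m>0) eed -> ee; (*v*) ed -> ''; (*v*) ing -> ''."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return _tidy_after_1b(stem) if contains_vowel(stem) else word
    return word


def light_stem(word: str) -> str:
    word = word.lower()
    if len(word) <= 2:  # common convention in Porter implementations
        return word
    return step1b(step1a(word))


if __name__ == "__main__":
    for w in ["caresses", "ponies", "cats", "agreed", "hopping", "filing", "generalization"]:
        print(w, "->", light_stem(w))
```

Derivational suffixes such as "-ization" pass through untouched, which is the main difference from running the full Porter algorithm.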


Craig Macdonald
craigm{at}dcs.gla.ac.uk


