capitalization vs stemming: missing quest results

James Aylett james at tartarus.org
Tue Apr 4 21:08:35 BST 2017


On 4 Apr 2017, at 12:54, Peter Marquardt <marquardt_p at molgen.mpg.de> wrote:

> TLDR: echo "krankenkassen" > i/input.txt; omindex i/ --stemmer=german; quest "krankenkassen" -> not found. see https://gist.github.com/anonymous/609f82a065f3d0ac6b1d077073be286f for full script & output
> 
> I'm a complete xapian noob, so what am I doing wrong ?

You aren't setting the stemmer in your quest invocation. The following works for me:

$ mkdir i
$ echo krankenkassen > i/input.txt
$ omindex --stemmer=german --db=testdb i/ 
$ quest --db=testdb --stemmer=german krankenkassen

Output of quest:

Parsed Query: Query(Zkrankenkass@1)
MSet:
1: [0.154151]
url=/input.txt
sample=krankenkassen 
type=text/plain
modtime=1491335879
size=14

The 'Z' at the start of the term is omega's marker for a stemmed term. (This is a common convention in Xapian, and is supported by the TermGenerator and QueryParser as a pair: the first for indexing, the second for searching.)

The problem is that if you don't specify German stemming, quest defaults to English, which produces a different stemmed term from the one omindex stored, so nothing matches:

$ quest --db=testdb krankenkassen
Parsed Query: Query(Zkrankenkassen@1)
MSet:

You can also turn stemming off in quest; the query then uses the unstemmed term, which is in the index as well:

$ quest --db=testdb --stemmer=none krankenkassen
Parsed Query: Query(krankenkassen@1)
MSet:
1: [0.154151]
url=/input.txt
sample=krankenkassen 
type=text/plain
modtime=1491335879
size=14
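
To tie the three results together, here is a toy model in Python of the term matching. It is only a sketch and not how Xapian is implemented: INDEXED_TERMS, STEMS, query_term and matches are names made up for illustration, and the "stemmers" are just a lookup table holding the stems that appear in the quest output above.

```python
# Toy model of the term matching seen in this thread; not Xapian's real code.
# The index holds the two terms that the quest runs above show to be present.
INDEXED_TERMS = {
    "krankenkassen",  # unstemmed form (matched by --stemmer=none)
    "Zkrankenkass",   # German-stemmed form; 'Z' marks a stemmed term
}

# Hypothetical stand-in for real stemmers: it only knows the one word from
# this thread and the stem each quest run produced for it.
STEMS = {
    ("german", "krankenkassen"): "krankenkass",
    ("english", "krankenkassen"): "krankenkassen",
}


def query_term(word, stemmer="english"):
    """Return the term a query for `word` would look up in the index."""
    if stemmer == "none":
        return word  # no stemming: search for the word as typed
    return "Z" + STEMS[(stemmer, word)]


def matches(word, stemmer="english"):
    """True if the query term for `word` exists in the toy index."""
    return query_term(word, stemmer) in INDEXED_TERMS


if __name__ == "__main__":
    for name in ("german", "english", "none"):
        term = query_term("krankenkassen", name)
        print(name, term, matches("krankenkassen", name))
```

The point is that the query-side stemmer has to produce the same term that was written at index time; the English default yields Zkrankenkassen, which was never indexed.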

J

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/



