capitalization vs stemming: missing quest results
James Aylett
james at tartarus.org
Tue Apr 4 21:08:35 BST 2017
On 4 Apr 2017, at 12:54, Peter Marquardt <marquardt_p at molgen.mpg.de> wrote:
> TLDR: echo "krankenkassen" > i/input.txt; omindex i/ --stemmer=german; quest "krankenkassen" -> not found. see https://gist.github.com/anonymous/609f82a065f3d0ac6b1d077073be286f for full script & output
>
> I'm a complete xapian noob, so what am I doing wrong ?
You aren't setting the stemmer in your quest invocation. The following works for me:
$ mkdir i
$ echo krankenkassen > i/input.txt
$ omindex --stemmer=german --db=testdb i/
$ quest --db=testdb --stemmer=german krankenkassen
Output of quest:
Parsed Query: Query(Zkrankenkass@1)
MSet:
1: [0.154151]
url=/input.txt
sample=krankenkassen
type=text/plain
modtime=1491335879
size=14
The 'Z' at the start of the term is omega's marker that this is a stemmed term. (This is a common thing to do in Xapian, and is supported by the TermGenerator and QueryParser as a pair — the first for indexing, the second for searching.)
The problem is that if you don't specify German stemming, quest defaults to the English stemmer, which leaves the word unchanged and so produces a stemmed term that doesn't match the one stored at index time:
$ quest --db=testdb krankenkassen
Parsed Query: Query(Zkrankenkassen@1)
MSet:
You can also turn stemming off in quest, and it'll work as well, because the unstemmed form of the word is indexed too:
$ quest --db=testdb --stemmer=none krankenkassen
Parsed Query: Query(krankenkassen@1)
MSet:
1: [0.154151]
url=/input.txt
sample=krankenkassen
type=text/plain
modtime=1491335879
size=14
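To make the mismatch concrete, here is a minimal Python sketch of the three quest runs above. It is a toy model, not Xapian code: the stemmed forms are copied from the "Parsed Query" lines in this mail rather than computed by a real stemmer, and the assumption that the index holds both the raw term and the Z-prefixed stem is inferred from the fact that the German and the unstemmed queries both match. The names STEMS, index_terms and query_term are made up for illustration.

```python
# Toy model of the quest runs above; no Xapian required.
# Stems are copied from the "Parsed Query" output in this thread,
# not produced by a real Snowball stemmer.
STEMS = {
    "german": "krankenkass",     # from Query(Zkrankenkass@1)
    "english": "krankenkassen",  # from Query(Zkrankenkassen@1)
}


def index_terms(word, stemmer):
    """Terms assumed to be stored for one word: raw form plus Z-prefixed stem."""
    terms = {word}
    if stemmer != "none":
        terms.add("Z" + STEMS[stemmer])
    return terms


def query_term(word, stemmer):
    """Term looked up at search time for one word."""
    if stemmer == "none":
        return word
    return "Z" + STEMS[stemmer]


# Index once with the German stemmer, as omindex --stemmer=german did.
indexed = index_terms("krankenkassen", "german")

# Search with each stemmer setting and record whether the term is in the index.
results = {}
for stemmer in ("german", "english", "none"):
    term = query_term("krankenkassen", stemmer)
    results[stemmer] = term in indexed
    print(stemmer, term, "match" if results[stemmer] else "no match")
```

The only point of the sketch is that a search hits when the term generated at query time is one of the terms generated at index time, so the stemmer setting has to agree on both sides (or be switched off for the query).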
J
--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/