[Xapian-discuss] understanding stemming and synonyms

ostmann at websuche.de ostmann at websuche.de
Fri Sep 23 16:36:59 BST 2011


I am working with Xapian 1.2.7 and want to use stemming and synonyms.
I use the Perl bindings and have run into some problems.

First of all: the Perl bindings don't allow a third argument when
calling parse_query on the QueryParser, so I cannot set a default prefix
(which is perhaps the solution to my problem, but more on that later).


I have a simple test case:
3 documents, each containing only one word:
bike
fahrrad (German for bike, singular)
fahrraeder (German for bike, plural, umlaut replaced)

I have built the database with one synonym: Zfahrrad = Zbik

When I inserted the documents, I printed the term list of each one:

INSERT DOKUMENT: bike
DOCUMENT: Document(Xapian::Document::Internal(data=`bike', terms[2]))
TERM: Zbik
TERM: bike
INSERT DOKUMENT: fahrrad
DOCUMENT: Document(Xapian::Document::Internal(data=`fahrrad', terms[2]))
TERM: Zfahrrad
TERM: fahrrad
INSERT DOKUMENT: fahrraeder
DOCUMENT: Document(Xapian::Document::Internal(data=`fahrraeder', terms[2]))
TERM: Zfahrrad
TERM: fahrraeder

That looks fine, but when I now use the QueryParser with a stemmer
(german2, STEM_ALL) and parse_query with FLAG_AUTO_SYNONYMS, I get these
queries:

ENTER QUERY: bike
[QUERY: Xapian::Query(bik:(pos=1))]
[RESULTS: 0]

ENTER QUERY: fahrrad
[QUERY: Xapian::Query((fahrrad:(pos=1) SYNONYM Zbik:(pos=1)))]
[RESULTS: 2]

ENTER QUERY: fahrraeder
[QUERY: Xapian::Query((fahrrad:(pos=1) SYNONYM Zbik:(pos=1)))]
[RESULTS: 2]

I think there is a Z missing before the first term: the parser searches
for the stemmed form of bike (bik/Zbik), but it does not add the prefix
to that term. No search ever finds bike and fahrraeder ...
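
To make that suspicion concrete, here is a minimal sketch in plain Python. It
is a toy model, not the Xapian API: the index is just the term lists printed
above, the "observed" queries are the terms the parser reported, and the
"with Z prefix" queries are a hypothetical variant in which the first term
carries the Z prefix. The function name `search` and both dictionaries are
made up for this illustration.

```python
# Toy model only: the terms per document are copied from the term lists above.
INDEX = {
    "bike": {"Zbik", "bike"},
    "fahrrad": {"Zfahrrad", "fahrrad"},
    "fahrraeder": {"Zfahrrad", "fahrraeder"},
}


def search(query_terms):
    """Return documents containing at least one query term.

    For matching purposes SYNONYM behaves like OR, so a set
    intersection is enough for this illustration.
    """
    wanted = set(query_terms)
    return sorted(doc for doc, terms in INDEX.items() if terms & wanted)


# Terms exactly as the parser reported them (first term has no Z prefix).
observed = {
    "bike": ["bik"],
    "fahrrad": ["fahrrad", "Zbik"],
    "fahrraeder": ["fahrrad", "Zbik"],
}

# Hypothetical: the same queries with the Z prefix on the first term.
with_prefix = {
    "bike": ["Zbik"],
    "fahrrad": ["Zfahrrad", "Zbik"],
    "fahrraeder": ["Zfahrrad", "Zbik"],
}

for label, queries in (("observed", observed), ("with Z prefix", with_prefix)):
    for text, terms in queries.items():
        print(label, "|", text, "|", terms, "|", search(terms))
```

In this model the observed terms reproduce the result counts above (0, 2, 2),
and the document fahrraeder is never matched because nothing queries
Zfahrrad. With the prefixed terms, bike would match one document and both
German queries would match all three. Whether the real QueryParser can be
made to produce those terms is exactly what I am asking.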

Once this is sorted out, I want to implement spelling correction too, but
my first tests with automatic spelling correction (feeding the spelling
data while indexing) were really bad. Perhaps it is better to add only a
complete dictionary to the database and not use the indexed text itself?




