[Xapian-discuss] Search queries with wildcards

rm at fabula.de
Wed Dec 15 10:04:42 GMT 2004


On Wed, Dec 15, 2004 at 08:01:56AM +0100, Timo Haberkern wrote:
> A wildcard search would be great. In Germany we have a lot of
> compound words, and a purely stemmer-based search doesn't find many matches.
> Think of the word "Fehlercode": if I use "Fehler" as a search query, I
> wouldn't find the documents with "Fehlercode" in them, right? But I need
> such a solution, and wildcards seem to be the only one.
> 
> How can the wildcard search be done? Do you have to develop something 
> for that?

Ah, so you really want to abuse wildcard search as a substitute for proper indexing ;-)
The proper way to do it: have your stemmer do all the hard work.
If both "Fehler" and "Fehlercode" stem to the same stem there's no real
problem (as long as this is not the only term in a query, but then, single
word queries are rather bad for statistical IR ...). Unfortunately this
does introduce some semantic problems: a "Fehlercode" (error code) isn't
a "Fehler" (error) but a specific "Code".
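A minimal sketch of this first approach, assuming a tiny hand-made lexicon (`LEXICON` and `leading_stem` are hypothetical names, not part of Xapian or of any real German stemmer): the "stem" of a compound is taken to be its longest leading dictionary word, so "Fehler" and "Fehlercode" end up as the same index term.

```python
# Toy lexicon of simple German words; a hypothetical stand-in for a real dictionary.
LEXICON = frozenset({"fehler", "code", "meldung"})


def leading_stem(word: str) -> str:
    """Conflate a compound with its longest leading lexicon word.

    Words with no lexicon prefix are returned lower-cased and otherwise unchanged.
    """
    w = word.lower()
    # Try the longest prefix first so simple words map to themselves.
    for end in range(len(w), 0, -1):
        if w[:end] in LEXICON:
            return w[:end]
    return w


if __name__ == "__main__":
    for term in ["Fehler", "Fehlercode", "Fehlermeldung"]:
        print(term, "->", leading_stem(term))
```

All three words map to `fehler`, so a query for "Fehler" finds "Fehlercode", but the index now also claims that an error code is an error, which is exactly the semantic problem mentioned above.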

Another possibility would be to have the stemmer emit several component
terms ("Fehler", "Code"). As tempting as this might seem at first (it _does_
look more correct than the first solution), it has semantic problems similar
to the first one: the "true" stem would be just "Code".
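The second approach (emitting component terms) can be sketched as dictionary-based decompounding. Again, `LEXICON` and `decompound` are hypothetical; real German decompounding also has to handle linking elements such as the "s" in "Arbeitszeit", which this sketch ignores.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system needs a large, curated dictionary.
LEXICON = frozenset({"fehler", "code", "meldung", "daten", "bank"})


def decompound(word: str) -> list[str]:
    """Split a compound into lexicon words, preferring the fewest parts.

    Returns [word.lower()] if no complete split exists.
    """
    w = word.lower()

    @lru_cache(maxsize=None)
    def best_split(i: int):
        # Fewest-parts segmentation of w[i:] as a tuple, or None if impossible.
        if i == len(w):
            return ()
        best = None
        for j in range(i + 1, len(w) + 1):
            if w[i:j] in LEXICON:
                rest = best_split(j)
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = (w[i:j],) + rest
        return best

    parts = best_split(0)
    return list(parts) if parts else [w]


if __name__ == "__main__":
    for term in ["Fehlercode", "Datenbankfehler", "Fehler"]:
        print(term, "->", decompound(term))
```

Indexing each part as its own term lets "Fehler" match "Fehlercode", but "Fehler" then carries the same standing in that document as "Code", even though the document is really about a kind of code.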

The Right Thing to do here is to introduce multiple ranked stems (for
"Fehlercode": "Code" first, "Fehler" second). Unfortunately there's no
free/open source stemmer for your language of choice :-/
A working stemmer for German needs to do some context analysis, and it needs
a lot of morphological knowledge and a good (!) dictionary. If (and only if)
you need this for a commercial product, I could point you in the right
direction (no, I'm not affiliated with these sources :-)
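One possible reading of "multiple ranked stems" is sketched below: decompose the compound and rank the head (the rightmost part, which in German determinative compounds carries the core meaning) above the modifiers. The lexicon, the function names and the weight of 0.5 for modifiers are all hypothetical choices for illustration, and how such ranks would feed into Xapian's weighting is left open.

```python
# Hypothetical toy lexicon and modifier weight, chosen for illustration only.
LEXICON = frozenset({"fehler", "code", "meldung", "daten", "bank"})
MODIFIER_WEIGHT = 0.5


def split_compound(w: str):
    """Return a list of lexicon words covering w, or None if there is none.

    Tries longer prefixes first and backtracks if the remainder cannot be split.
    """
    if not w:
        return []
    for end in range(len(w), 0, -1):
        if w[:end] in LEXICON:
            rest = split_compound(w[end:])
            if rest is not None:
                return [w[:end]] + rest
    return None


def ranked_stems(word: str) -> list[tuple[str, float]]:
    """Return (stem, weight) pairs: the head first with weight 1.0, then modifiers."""
    parts = split_compound(word.lower())
    if not parts:
        # Unknown word: keep it as a single full-weight term.
        return [(word.lower(), 1.0)]
    head, modifiers = parts[-1], parts[:-1]
    return [(head, 1.0)] + [(m, MODIFIER_WEIGHT) for m in modifiers]


if __name__ == "__main__":
    print(ranked_stems("Fehlercode"))
    print(ranked_stems("Datenbankfehler"))
```

With this ranking a query for "Fehler" would match "Datenbankfehler" (a kind of error) strongly and "Fehlercode" (a kind of code) only weakly, which is the distinction the two simpler approaches lose.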

 HTH Ralf Mattes

> regards
> 
> Timo
> 
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss



More information about the Xapian-discuss mailing list