[Xapian-discuss] Search queries with wildcards
Timo Haberkern
thaberkern at emedia-office.de
Wed Dec 15 10:21:57 GMT 2004
Hello,
rm at fabula.de wrote:
>On Wed, Dec 15, 2004 at 08:01:56AM +0100, Timo Haberkern wrote:
>
>
>>A wildcard search would be great. In Germany we have a lot of
>>compound words, and a purely stemmer-based search misses a lot of matches.
>>Think of the word "Fehlercode": if I use "Fehler" as a search query, I
>>won't find the documents containing "Fehlercode", right? But I need
>>such a solution, and wildcards seem to be the only way.
>>
>>How can the wildcard search be done? Do you have to develop something
>>for that?
>>
>>
>
>Ah, so you indeed want to abuse wildcard search for proper indexing ;-)
>
>
Ahmm, no, I only want to let users of the search
look for word fragments :-) So I don't mind matches that
lack the correct semantic context (as you mention below). Maybe
another example will shed some light on what I want:
The documents I want to index contain article numbers, for example:
A1590-789
A1590-555
A6719-9911
The first five characters are an article-group identifier. The user
should be able to search for all documents containing articles of one
article group, for example with the
search query "A1590*" or "A1590-*".
But I don't want to search only for article numbers: searching for
fragments should also work for ordinary word fragments (as described
in my last mail).
That's what I want. Is there a way to do this in Xapian?
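
To make the request concrete, here is a rough sketch of the behaviour I have in mind: a trailing "*" is expanded to every indexed term that starts with the given prefix, and the matching documents are combined. This is not Xapian's API. The names `expand_prefix` and `wildcard_search`, the toy `index` dictionary, and the document ids are all made up for illustration, and the matching is case-sensitive.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return every term in the sorted list that starts with `prefix`."""
    # Binary search for the first term that is >= prefix, then scan forward
    # while the terms still share the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def wildcard_search(index, query):
    """Look up `query` in `index` (term -> set of document ids).

    A trailing '*' means "any term beginning with this prefix".
    """
    if query.endswith("*"):
        terms = expand_prefix(sorted(index), query[:-1])
    else:
        terms = [query] if query in index else []
    docs = set()
    for term in terms:
        docs |= index[term]
    return docs


# Hypothetical toy index: term -> ids of the documents containing it.
index = {
    "A1590-789": {1},
    "A1590-555": {2},
    "A6719-9911": {3},
    "Fehler": {4},
    "Fehlercode": {2, 5},
}

print(wildcard_search(index, "A1590*"))
print(wildcard_search(index, "Fehler*"))
```

With this toy data, "A1590*" and "A1590-*" both select documents 1 and 2, and "Fehler*" also picks up the documents that only contain "Fehlercode".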
Timo
>The proper way to do it: have your stemmer do all the hard work.
>If both "Fehler" and "Fehlercode" stem to the same stem there's no real
>problem (as long as this is not the only term in a query, but then, single
>word queries are rather bad for statistical IR ...). Unfortunately this
>does introduce some semantic problems: a "Fehlercode" (error code) isn't
>a "Fehler" (error) but a specific "Code".
>
>Another possibility would be to have the stemmer emit several component
>terms ("Fehler" "Code") - as tempting as this might first seem (it _does_
>look more correct than the first solution) it bears similar semantic problems
>as the first solution. The "true" stem would be just "Code".
>
>The Right Thing to do here is to introduce multiple ranked stems. Unfortunately
>there's no free/open source stemmer for your language of choice :-/
>A working stemmer for German needs to do some context analysis, a lot of
>morphological knowledge and a good (!) dictionary. Iff you need this for
>a commercial product i could point you in the right direction (no, i'm not
>affiliated with these sources :-)
>
> HTH Ralf Mattes
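
The second suggestion quoted above, emitting component terms for a compound, might look roughly like the following at indexing time. This is only a sketch under strong assumptions: `split_compound`, `index_terms` and the three-word `lexicon` are hypothetical, a real system would need a proper German dictionary, and linking elements such as the "s" in "Arbeitszeit" are not handled. It also does nothing about the semantic problem raised in the quote; every component gets the same weight.

```python
def split_compound(word, lexicon, min_len=3):
    """Split `word` into lexicon entries, preferring the longest head.

    Returns [word] (lowercased) when no complete split is found.
    """
    word = word.lower()

    def helper(rest):
        if not rest:
            return []
        # Try the longest possible head first, then shorter ones.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no way to cover `rest` with lexicon entries

    parts = helper(word)
    return parts if parts else [word]


def index_terms(word, lexicon):
    """Terms to index for `word`: the full word plus its components."""
    terms = [word.lower()]
    for part in split_compound(word, lexicon):
        if part not in terms:
            terms.append(part)
    return terms


# Hypothetical miniature lexicon, for illustration only.
lexicon = {"fehler", "fehl", "code"}

print(index_terms("Fehlercode", lexicon))
print(index_terms("Xapian", lexicon))
```

Indexing "Fehlercode" under "fehlercode", "fehler" and "code" would let a plain query for "Fehler" match without any wildcard, at the cost of the ranking issues described in the quote.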