[Xapian-discuss] Search queries with wildcards

Timo Haberkern thaberkern at emedia-office.de
Wed Dec 15 10:06:45 GMT 2004


James Aylett wrote:

>On Wed, Dec 15, 2004 at 08:01:56AM +0100, Timo Haberkern wrote:
>
>  
>
>>A wildcard search would be great. In Germany we have a lot of 
>>compound words, and a purely stemmer-based search misses a lot of matches. 
>>Think of the word "Fehlercode": if I use "Fehler" as a search query, I 
>>wouldn't find the documents with "Fehlercode" in them, right? But I need 
>>such a solution, and wildcards seem to be the only one.
>>    
>>
>
>A thought: this is perhaps impractical because of dictionary sourcing
>issues (and management, too, come to think of it), but you could look
>for compound prefixes while turning longer words into terms, and split
>(and stem) them based on possible compound constructions. So on
>indexing "Fehlercode" you first detect that "Fehler" is an acceptable
>fragment within a compound word, and store s("Fehler"), s("code"),
>s("Fehlercode") (where s() stems).
>  
>
You are right about that, but the problem is how to detect the acceptable 
fragments of a compound word. The application has to index technical 
documents, and there are many, many, many (...) compound words that never 
occur in any dictionary. At the moment I don't see a practical way to 
solve this problem as you described. Or do you have an idea how to do it?
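
To make the indexing idea concrete, here is a minimal sketch of it in Python. It assumes a list of known fragments is available, which is exactly the part that is hard to source for technical documents. The names KNOWN_FRAGMENTS, stem(), split_compound() and index_terms() are hypothetical, and stem() is only a placeholder for a real stemmer.

```python
# Hypothetical fragment list; sourcing this is the difficult part in practice.
KNOWN_FRAGMENTS = {"fehler", "code", "meldung"}


def stem(word):
    # Placeholder for s(): a real stemmer would be plugged in here.
    return word.lower()


def split_compound(word, fragments=KNOWN_FRAGMENTS, min_len=3):
    """Return [head, tail] if the word splits into two known fragments, else []."""
    lower = word.lower()
    # Try every split point that leaves at least min_len characters on each side.
    for i in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:i], lower[i:]
        if head in fragments and tail in fragments:
            return [head, tail]
    return []


def index_terms(word):
    """Terms to store for one word: the whole word plus any detected fragments."""
    terms = [stem(word)]
    terms += [stem(part) for part in split_compound(word)]
    return terms


print(index_terms("Fehlercode"))
```

A word whose parts are not in the fragment list is indexed as a single term only, so the quality of the result depends entirely on that list.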

>This would reduce your index size over more flexible wildcards, where
>I think you'd have to store all possible substrings you want to search
>for. If you wanted to be able to search for "code" and have it match
>"Fehlercode", you'd end up generating lots of substrings (erm ... n^2
>- (n-1)! or something?). Although perhaps easier to index; also, it
>may make constructing effective searches easier (because you just use
>the words from the query straight off).
>  
>
I'm not sure I understood that correctly. Is the only possible way to 
implement wildcards to store all possible substrings in the 
index database? So if I have the word "car", I need to store in the 
database:

- "c"
- "ca"
- "car"

for doing a simple "c*" wildcard search?
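
To get a feeling for the sizes involved, here is a small Python sketch that generates both variants. The function names prefixes() and substrings() are made up for illustration. A word of n characters has n prefixes and at most n(n+1)/2 distinct contiguous substrings, so a 10-letter word like "Fehlercode" yields up to 55 entries.

```python
def prefixes(word):
    """All leading fragments, enough for a trailing wildcard such as "c*"."""
    return [word[:i] for i in range(1, len(word) + 1)]


def substrings(word):
    """All distinct contiguous fragments, needed to match anywhere in the word."""
    n = len(word)
    return {word[i:j] for i in range(n) for j in range(i + 1, n + 1)}


print(prefixes("car"))
print(len(substrings("car")))
print("code" in substrings("Fehlercode"))
```

Storing only prefixes keeps growth linear in the word length, while storing every substring grows quadratically.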

Isn't it possible to extend the search module to do a 
wildcard search over the index database?
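
What I have in mind is roughly the following Python sketch. It assumes the index can hand out its terms in sorted order, which is an assumption on my side and not something confirmed in this thread. The function wildcard_prefix() and the sample term list are hypothetical.

```python
import bisect


def wildcard_prefix(sorted_terms, prefix):
    """Expand "prefix*" against a sorted term list without storing any prefixes."""
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    # All terms sharing the prefix are adjacent in sorted order.
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


terms = sorted(["car", "card", "code", "fehler", "fehlercode"])
print(wildcard_prefix(terms, "c"))
print(wildcard_prefix(terms, "fehler"))
```

The matching terms could then be combined into one OR query. This only covers a trailing wildcard; a leading one such as "*code" would still need something else.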

Timo

>J
>
>  
>