[Xapian-discuss] different stemming

Simon Roe simon.roe at talusdesign.co.uk
Sat May 9 07:05:38 BST 2009


On Sat, May 9, 2009 at 12:12 AM, james cauwelier
<james.cauwelier at gmail.com> wrote:
> Hmm, the problem is that the site itself is in one language.  But it's
> perfectly possible that somebody Belgian wants to buy a Spanish book, and so
> this person will search with Spanish keywords.

1) Is this possible in the UI?  A submit button for each language (if
there are only a few)?  Not nice, I know.

2) Might there be a way of adding the language to each document (as a
value or prefixed term), performing the search unstemmed, finding the
most frequent language among the top results (the first 10, say), and
re-running the search with that language's stemmer?

3) You could do this in the UI too -- remove the stemming for any
search that doesn't have a language selected, then add links on the
results page to language-specific stemmed searches.

4) Get the term frequency for each stemmed version of the query before
searching, and use the language whose stemmed terms match the most
documents?
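
As a rough illustration of 4, here is a minimal sketch in plain Python.
The function name and its parameters are made up for this example. With
the Xapian Python bindings, I'd expect `stemmers` to hold
xapian.Stem("spanish") and friends, and `termfreq` to wrap
Database.get_termfreq(); depending on how the terms were indexed, a
prefix such as "Z" may need adding before the lookup. Treat it as a
starting point, not a tested recipe.

```python
from typing import Callable, Dict, Iterable


def guess_language_by_termfreq(
    query_words: Iterable[str],
    stemmers: Dict[str, Callable[[str], str]],
    termfreq: Callable[[str], int],
) -> str:
    """Return the language whose stemmed query terms match the most documents.

    `stemmers` maps a language name to a stemming function, and `termfreq`
    returns how many documents contain a given term. Both are hypothetical
    hooks that the caller supplies.
    """
    words = [w.lower() for w in query_words]
    scores = {}
    for lang, stem in stemmers.items():
        # Sum the document frequency of every stemmed query word.
        scores[lang] = sum(termfreq(stem(w)) for w in words)
    # max() keeps the first language on a tie, so list the preferred
    # (for example, the site's own) language first in `stemmers`.
    return max(scores, key=scores.get)
```

This costs one term lookup per query word per language, which should be
cheap next to running a full search for each language.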


None of these are great ideas.  3 would be the fastest and simplest I guess.
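
For what it's worth, 2 might be sketched like this. It is plain Python
with hypothetical hooks: `search(query, language)` stands in for
whatever runs the Xapian query (with language=None meaning no stemming),
and `language_of(doc)` stands in for reading back the language stored on
the document, for example from a value set with Document.add_value().
None of these names come from Xapian itself.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def dominant_language(languages: Sequence[str], top_n: int = 10) -> Optional[str]:
    """Return the most common language among the first top_n entries."""
    top = [lang for lang in languages[:top_n] if lang]
    if not top:
        return None
    # On a tie, Counter keeps the language that appeared first in the results.
    return Counter(top).most_common(1)[0][0]


def two_pass_search(
    query: str,
    search: Callable[[str, Optional[str]], List],
    language_of: Callable[[object], str],
    top_n: int = 10,
) -> List:
    """Search unstemmed, pick the dominant language, then search again stemmed."""
    first_pass = search(query, None)
    lang = dominant_language([language_of(doc) for doc in first_pass], top_n)
    if lang is None:
        # Nothing matched, so there is no language to stem for.
        return first_pass
    return search(query, lang)
```

The obvious cost is two searches per query, and the guess is only as
good as the unstemmed first pass.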



> 2009/5/8 James Aylett <james-xapian at tartarus.org>
>
>> On Fri, May 08, 2009 at 11:43:16AM +0200, james cauwelier wrote:
>>
>> > The site I am working on has products in different languages (Dutch,
>> > English, French, Italian, Spanish).  I want to search these products, but
>> > while indexing I should use the correct stemmer.  No problem, because I
>> > know the language of a product description.
>> >
>> > But when somebody queries the database I have no information about the
>> > language.  Thus, I am not able to select the correct stemmer for queries.
>> > How should I solve this?  Skip stemming altogether?  That's what I am
>> > doing now.
>>
>> I know this isn't the most helpful answer, but "it depends". You could
>> disable stemming, but this may have unhelpful effects on the quality
>> of your results. That's almost certainly the simplest thing to do,
>> though.
>>
>> If you can figure out what language they care about most, you can stem
>> to that language and restrict the search to documents (products) that
>> were in that language in the first place. You may be able to infer
>> this from the same source you use to choose the site localisation.
>>
>> J
>>
>> --
>>  James Aylett
>>
>>  talktorex.co.uk - xapian.org - uncertaintydivision.org
>>
>> _______________________________________________
>> Xapian-discuss mailing list
>> Xapian-discuss at lists.xapian.org
>> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>





