[Xapian-discuss] ------ stemming

James Aylett james-xapian at tartarus.org
Mon Aug 7 16:19:32 BST 2006


On Mon, Aug 07, 2006 at 05:06:54PM +0200, Reini Urban wrote:

> Inspecting a real-life index, I find a lot of R terms like
> Rbla--------------------------------------------
> and a lot of ------------------- terms.
> 
> Shouldn't "----", "====" or "****" rather be stemmed down to, say, 3
> chars, i.e. "---"?
> 
> For all languages of course.

R-terms are non-stemmed; they are the raw terms, in case you need them
for direct searching. If you are getting a lot of strange characters
in your raw terms, you might be better off trimming them out of your
data before throwing the whole lot at scriptindex (in practice, that
is easier than trying to do the same filtering inside omindex).
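As an illustration of the "trim before scriptindex" approach, here is a minimal pre-processing filter in Python. It is a sketch under stated assumptions, not part of Xapian: it assumes the scriptindex input is the usual "field=value" line format (lines starting with "=" being continuations), and it treats any run of four or more identical "-", "=", "*", "_", "~" or "#" characters as a separator rule rather than indexable text. The names strip_rules, filter_line and main are hypothetical.

```python
import re
import sys

# Hypothetical rule: 4+ repeats of the same separator character are a
# "rule" line (e.g. "-----", "====", "****"), not real content.
_RULE_RE = re.compile(r"([-=*_~#])\1{3,}")


def strip_rules(text: str, keep: int = 0) -> str:
    """Remove or shorten runs of repeated separator characters.

    keep=0 replaces each run with a single space, so that neighbouring
    words stay separate; keep=3 truncates each run to three characters,
    as suggested in the question above.
    """
    if keep <= 0:
        return _RULE_RE.sub(" ", text)
    return _RULE_RE.sub(lambda m: m.group(1) * keep, text)


def filter_line(line: str, keep: int = 0) -> str:
    """Clean only the value part of an assumed "field=value" line.

    The field name and the first "=" are left untouched so the line
    structure that scriptindex parses is preserved.
    """
    field, sep, value = line.partition("=")
    if not sep:
        # No "=" at all (e.g. a blank record separator): pass through.
        return line
    return field + sep + strip_rules(value, keep)


def main() -> None:
    # Read the dump on stdin and write the cleaned dump to stdout.
    for line in sys.stdin:
        sys.stdout.write(filter_line(line))
```

You would pipe your dump file through main() and hand the cleaned output to scriptindex. Dropping the runs entirely (keep=0) is usually what you want if nobody will ever search for them; keep=3 only makes sense if you still want a single, normalised "---" term.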

James

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


