[Xapian-discuss] ------ stemming
James Aylett
james-xapian at tartarus.org
Mon Aug 7 16:19:32 BST 2006
On Mon, Aug 07, 2006 at 05:06:54PM +0200, Reini Urban wrote:
> Inspecting a real-life index gives me a lot of R strings of
> Rbla--------------------------------------------
> and a lot of ------------------- terms.
>
> Should not "----", "====" or "****" rather be stemmed to, let's say, 3
> chars?
> "---"
>
> For all languages of course.
R-terms are unstemmed; they are the raw terms, kept in case you need
them for direct searching. If you are getting a lot of strange
characters in your raw terms, you might be better off trimming them
out of the text before handing it to scriptindex, since in practice
that is easier to do with scriptindex than with omindex.
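
As a rough illustration of that kind of trimming, here is a minimal
Python sketch. It is not part of Xapian: the function name
strip_decoration, the set of characters treated as decoration, and the
threshold of four repeats are all assumptions made for this example.

```python
import re

# Assumption: a "decoration run" is the same character from this set
# repeated four or more times in a row (e.g. "-----", "====", "****").
_DECORATION_RUN = re.compile(r"([-=*_~#])\1{3,}")


def strip_decoration(text: str) -> str:
    """Replace every decoration run in `text` with a single space.

    Shorter runs and single characters (as in "well-known") are left
    untouched, so ordinary hyphenated words survive.
    """
    return _DECORATION_RUN.sub(" ", text)


if __name__ == "__main__":
    print(strip_decoration("bla--------------------------------------------"))
    print(strip_decoration("well-known terms"))
```

You would apply this to each field value before writing it into the
file you give to scriptindex, rather than to the finished file, so
that the "=" separating a field name from its value is never touched.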
James
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org