[Xapian-devel] Case sensitive search

Olly Betts olly at survex.com
Thu Nov 24 14:44:14 GMT 2005


On Thu, Nov 24, 2005 at 02:26:12PM +0000, James Aylett wrote:
> On Thu, Nov 24, 2005 at 05:44:16AM -0800, arjan holscher wrote:
> 
> > I've been developing a search application using Xapian and
> > Omega. When our visitors search on specific keywords it's
> > noticeable that Omega is case sensitive. It will find results on
> > keyword 'Asus' and it will NOT find results on keyword
> > 'asus'.
> 
> The way omega works is for all words to be normalised to lower case,
> but also to add a 'raw' term for every word that starts with an upper
> case letter. Raw terms are prefixed with 'R'.

That's right, and searching for a capitalised word searches for the raw
form by default, which is the behaviour Arjan is describing.

You can tell Omega to ignore the R terms and always use the stemmed form
by adding "$set{stem_all,true}" to the top of your omegascript query
template (or templates if you're using more than one).

If you want to eliminate the R terms from the database, you'll have to
delete the code in indextext.cc where add_term or add_posting is called
with "rprefix + term".  That code is used by both omindex and
scriptindex.

> > NOTE: I'm passing in the documents as they are in the
> > database. Meaning that the documents contain words with capital
> > letters. I don't know if this is causing the problem? Do I need to
> > make the input to scriptindex lowercase?

Actually, you could lowercase in the scriptindex script instead of
changing indextext.cc.  Just add "lower" before "index" (or
"indexnopos").  You probably want the "lower" after any "field" though,
unless you want the value stored in the field to be lowercased too (you
probably don't want to lowercase the document title or sample used in
the result display...)

The raw term mechanism is intended to allow searching for proper names,
which stemming sometimes conflates with common words.  But it isn't
perfect.  This is something which stemming at search time would help
with.
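
As a rough illustration of the scheme described above, here is a toy
model in Python.  It is only a sketch, not Omega's code: toy_stem,
index_terms and query_term are made-up names, the "stemmer" just strips
a trailing "s", and lowercasing the raw term after the R prefix is an
assumption.

```python
import re

RAW_PREFIX = "R"  # prefix for raw terms, as described earlier in this mail


def toy_stem(word):
    """Stand-in for a real stemmer (assumption): strip one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def index_terms(text):
    """Return the set of terms a document would be indexed under."""
    terms = set()
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        terms.add(toy_stem(lowered))  # every word gets a stemmed term
        if word[0].isupper():
            # Capitalised in the source text: also add an unstemmed raw term.
            terms.add(RAW_PREFIX + lowered)
    return terms


def query_term(word, stem_all=False):
    """Map one query word to the term that would be looked up."""
    if word[0].isupper() and not stem_all:
        return RAW_PREFIX + word.lower()  # capitalised query: raw form
    return toy_stem(word.lower())  # otherwise: stemmed form


doc_a = index_terms("Mr Banks retired")  # includes the raw term "Rbanks"
doc_b = index_terms("the bank closed")   # no capitalised words, no raw terms

# "Banks" only matches the document which had the capitalised word...
print(query_term("Banks") in doc_a, query_term("Banks") in doc_b)
# ...while "banks" is stemmed to "bank" and matches both documents.
print(query_term("banks") in doc_a, query_term("banks") in doc_b)
```

In this model, passing stem_all=True makes "Banks" behave like "banks",
and lowercasing the text before calling index_terms means no R terms are
generated at all.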

Cheers,
    Olly



