[Xapian-discuss] Stemming and Query Parsing

Olly Betts olly at survex.com
Mon Oct 18 02:26:37 BST 2004


On Sun, Oct 17, 2004 at 01:40:28PM -0400, Mike Boone wrote:
> We're currently running Xapian 0.8.1 via PHP. I am trying to search on the
> term 'C#' in our keyword list. If I run the stemmer (English) independently,
> 'c#' is stemmed to 'c#', but it appears that when I parse the term using the
> QueryParser, it is truncated to plain 'c'. For a similar search, 'C++' stems
> properly in both the stemmer and the QueryParser.
> 
> Is there a list of which characters are thrown out by the QueryParser, and
> is there any way to use the QueryParser, yet keep the desired characters?

Are you indexing "c#" as a term?  Our indexers (omindex and scriptindex)
currently don't (which ought to be fixed next time we make indexing
changes), and the QueryParser is set up in line with this - there's no
point in it generating search terms which aren't in the index.

If you do have "c#" as a term, you'll have to modify the queryparser
source for now, as this isn't currently configurable.  Look for the call
to C_isnotsign - currently this keeps a trailing + or - in the term (e.g.
C++, Cl-, Mg2+).  You'll also want to allow "#" there.
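
To illustrate the idea, here is a minimal standalone sketch of a term reader
which keeps trailing sign characters.  This is not the actual queryparser
source: is_trailing_sign() and read_term() are hypothetical names made up for
this example, and the lowercasing and the "only keep signs at the end of a
word" rule are assumptions about what you'd want.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a Xapian function): characters which may be
// kept when they trail a term.  '+' and '-' cover C++, Cl-, Mg2+;
// '#' is the addition needed for C#.
static bool is_trailing_sign(char ch) {
    return ch == '+' || ch == '-' || ch == '#';
}

static bool is_word_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

// Hypothetical helper: read one term from `input` starting at `pos`.
// Returns an empty string if no word character is found at `pos`.
std::string read_term(const std::string& input, std::size_t pos = 0) {
    // Consume the alphanumeric body of the term.
    std::size_t body_end = pos;
    while (body_end < input.size() && is_word_char(input[body_end])) {
        ++body_end;
    }
    if (body_end == pos) return std::string();

    // Consume any run of trailing sign characters.
    std::size_t term_end = body_end;
    while (term_end < input.size() && is_trailing_sign(input[term_end])) {
        ++term_end;
    }
    // Assumption: signs only count if they really end the word, so in
    // "a-b" the '-' is treated as a separator and dropped.
    if (term_end < input.size() && is_word_char(input[term_end])) {
        term_end = body_end;
    }

    // Assumption: terms are lowercased before being looked up.
    std::string term = input.substr(pos, term_end - pos);
    for (char& ch : term) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return term;
}
```

With '#' in the allowed set, read_term("C#") gives "c#" rather than "c".
Remember this only helps if the indexer generates the same terms.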

Cheers,
    Olly


