[Xapian-discuss] Stemming and Query Parsing

Olly Betts olly at survex.com
Tue Oct 19 14:37:43 BST 2004


On Tue, Oct 19, 2004 at 09:27:33AM -0400, Mike Boone wrote:
> OK, I've fooled around with making some changes in queryparser.cc in the
> yylex2() function to get it to keep my # character, but it's not working, so
> I guess I don't yet understand the code well enough.
> 
> I copied the block for the + character and modified it:
> 
> case '#':
>   // Ignore # at end of query
>   if (qptr == q.end()) return 0;
>   if (isspace(*qptr) || *qptr == '#') {
>     /* Ignore ## or # followed by a space */
>     /* Note that nethack## and Cl# are handled above */
>     ++qptr;
>     return yylex();
>   }
>   /* '#' is NOT used in the grammar rules, but leaving code here as-is */
>   return c;
> 
> This code block isn't quite what I want since # is not really a grammar
> rule, and I don't want it to be.

That's the wrong block - it handles "+" being used in front of a term to
mark it as always required.  You want the code "above" that the comment
refers to.  In 0.8.3 it's probably a call to find() with a predicate
named something like p_notplusminus.

> I'm also not sure if I should add the # sign to the yytname array...it looks
> to me like those are only for grammar rules. I haven't tried that yet.

I doubt it - it's the lexing stage where this needs to be done - you
want "C#" to be a single token in the grammar.
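
As a rough illustration of the idea (this is NOT the actual queryparser.cc
code; tokenize(), not_word_char() and not_suffix_char() are made-up names
for this sketch), a lexer can scan to the end of a word with a predicate and
then absorb any trailing run of '+' or '#' when that run is followed by
whitespace or the end of the query:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical predicate: true for the first character that ends a word.
static bool not_word_char(char ch) {
    return !std::isalnum(static_cast<unsigned char>(ch));
}

// Hypothetical predicate: true for anything that is not a '+' or '#' suffix.
static bool not_suffix_char(char ch) {
    return ch != '+' && ch != '#';
}

// Split a query into terms, keeping suffixes such as "C#", "C++" or
// "nethack##" attached to the word.  Operators and everything else the
// real parser handles are simply skipped in this sketch.
std::vector<std::string> tokenize(const std::string& q) {
    std::vector<std::string> terms;
    auto p = q.begin();
    const auto end = q.end();
    while (p != end) {
        // Skip ahead to the start of the next word.
        p = std::find_if(p, end, [](char ch) {
            return std::isalnum(static_cast<unsigned char>(ch)) != 0;
        });
        if (p == end) break;

        // Find the end of the alphanumeric part of the word.
        auto word_end = std::find_if(p, end, not_word_char);

        // Look past any run of '+' or '#' directly after the word.
        auto suffix_end = std::find_if(word_end, end, not_suffix_char);

        // Keep the suffix only if it is followed by whitespace or the
        // end of the query, so "a#b" is still split into "a" and "b".
        if (suffix_end != word_end &&
            (suffix_end == end ||
             std::isspace(static_cast<unsigned char>(*suffix_end)))) {
            word_end = suffix_end;
        }

        terms.emplace_back(p, word_end);
        p = word_end;
    }
    return terms;
}
```

The point is only that the '#' is consumed while the term is being lexed, so
the grammar never sees it as a separate token.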

> (BTW, I'm doing this now with Xapian 0.8.3. The 0.8.1 xapian.so for PHP was
> 1.8MB, the same file for 0.8.3 is 10MB! This is on Red Hat Enterprise AS
> 2.1.)

We now build and link the library differently.  But I suspect the
difference is debug information.  What size is xapian.so if you install
it with "make install-strip"?

Cheers,
    Olly


