[Xapian-discuss] Stemming and Query Parsing
Olly Betts
olly at survex.com
Tue Oct 19 14:37:43 BST 2004
On Tue, Oct 19, 2004 at 09:27:33AM -0400, Mike Boone wrote:
> OK, I've fooled around with making some changes in queryparser.cc in the
> yylex2() function to get it to keep my # character, but it's not working, so
> I guess I don't yet understand the code well enough.
>
> I copied the block for the + character and modified it:
>
> case '#':
> // Ignore # at end of query
> if (qptr == q.end()) return 0;
> if (isspace(*qptr) || *qptr == '#') {
> /* Ignore ## or # followed by a space */
> /* Note that nethack## and Cl# are handled above */
> ++qptr;
> return yylex();
> }
> /* '#' is NOT used in the grammar rules, but leaving code here as-is */
> return c;
>
> This code block isn't quite what I want since # is not really a grammar
> rule, and I don't want it to be.
That's the wrong block - that takes care of "+" being used in front of a
term to mark it as always required. You want the code "above" which the
comment refers to. It's probably a call to find() with a predicate of
something like p_notplusminus in 0.8.3.
> I'm also not sure if I should add the # sign to the yytname array...it looks
> to me like those are only for grammar rules. I haven't tried that yet.
I doubt it - it's the lexing stage where this needs to be done - you
want "C#" to be a single token in the grammar.
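To illustrate the idea, here is a minimal standalone sketch of keeping a trailing run of '+' or '#' attached to a term at the lexing stage. It is not the actual queryparser.cc code: the function and predicate names (read_term_sketch, first_term_sketch, not_word_char, not_plus_or_hash) are made up for this example, and it assumes terms are plain ASCII alphanumerics. The suffix is only kept when it is followed by the end of the query or a non-word character, so "+" in front of a term is left alone for the grammar.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for any character that is NOT part of a word.
// Assumes ASCII alphanumeric terms; the real lexer may use different rules.
static bool not_word_char(char ch) {
    return !std::isalnum(static_cast<unsigned char>(ch));
}

// Hypothetical predicate: true for any character that is NOT '+' or '#'.
static bool not_plus_or_hash(char ch) {
    return ch != '+' && ch != '#';
}

// Read one term starting at qptr and advance qptr past it.
// Returns an empty string if qptr does not point at a word character.
std::string read_term_sketch(std::string::const_iterator& qptr,
                             std::string::const_iterator end) {
    std::string::const_iterator start = qptr;
    // Consume the word characters of the term.
    qptr = std::find_if(qptr, end, not_word_char);
    if (qptr == start) return std::string();
    // Find the end of any run of '+' / '#' directly after the term.
    std::string::const_iterator after =
        std::find_if(qptr, end, not_plus_or_hash);
    // Keep the run only if it really is a suffix: it must be followed by
    // the end of the query or by a non-word character (so "a#b" is not
    // lexed as "a#").
    if (after != qptr && (after == end || not_word_char(*after))) {
        qptr = after;
    }
    return std::string(start, qptr);
}

// Convenience wrapper: return the first term of a query string.
std::string first_term_sketch(const std::string& q) {
    std::string::const_iterator p = q.begin();
    return read_term_sketch(p, q.end());
}
```

With this sketch, first_term_sketch("C# rocks") yields "C#" as a single token, while first_term_sketch("cat +dog") yields just "cat", leaving the "+" for the block that handles required terms.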
> (BTW, I'm doing this now with Xapian 0.8.3. The 0.8.1 xapian.so for PHP was
> 1.8MB, the same file for 0.8.3 is 10MB! This is on Red Hat Enterprise AS
> 2.1.)
We now build and link the library differently. But I suspect the
difference is debug information. What size is xapian.so if you install
it with "make install-strip"?
Cheers,
Olly