[Xapian-discuss] Stemming and Query Parsing

Mike Boone mike at boonedocks.net
Wed Oct 20 16:37:40 BST 2004


OK, I have the # sign stuff working. When the comment said "Note that
nethack-- and Cl- are handled above", I thought it meant the 'if' statement
a couple of lines above, not way above in another function. It might be
clearer to change it to "Note that nethack-- and Cl- are handled above in
p_notplusminus".

I changed the function to look like this:

inline static bool p_notplusminus(unsigned int c)
{
  // MB adding # sign
  return c != '+' && c != '-' && c != '#';
}
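
For anyone following along, here is a rough sketch of how a predicate like this can keep "C#" together as one token. It is not the actual queryparser.cc code: read_term() and p_notalnum() are hypothetical helpers made up for illustration, and the rule of only keeping the suffix when it is followed by whitespace or the end of the query is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Same predicate as above: true for any character that is NOT one of the
// suffix characters '+', '-' or '#'.
inline static bool p_notplusminus(unsigned int c)
{
    return c != '+' && c != '-' && c != '#';
}

// Hypothetical helper (not from Xapian): true for characters that cannot
// be part of the alphanumeric body of a term.
static bool p_notalnum(unsigned int c)
{
    return !std::isalnum(static_cast<unsigned char>(c));
}

// Hypothetical helper (not from Xapian): read one term from q starting at
// offset pos. A trailing run of '+', '-' or '#' is kept only when it is
// followed by whitespace or the end of the query (an assumed rule), so
// "C#" stays whole while "foo-bar" still yields "foo".
std::string read_term(const std::string &q, std::string::size_type pos = 0)
{
    std::string::const_iterator begin = q.begin() + pos;
    // Find the end of the alphanumeric part of the term.
    std::string::const_iterator word_end =
        std::find_if(begin, q.end(), p_notalnum);
    // Find the end of any run of suffix characters after it.
    std::string::const_iterator suffix_end =
        std::find_if(word_end, q.end(), p_notplusminus);
    if (suffix_end == q.end() ||
        std::isspace(static_cast<unsigned char>(*suffix_end))) {
        return std::string(begin, suffix_end);
    }
    return std::string(begin, word_end);
}
```

With this sketch, read_term("C# tutorial") gives "C#" and read_term("nethack--") gives "nethack--", while read_term("foo-bar") stops at "foo".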

Regarding the large PHP4 xapian.so file size: I had just been copying the
PHP4 xapian.so from the .libs directory to the place I wanted to keep it.
When I ran "make install-strip", it cut the file down to 3.7MB, which is
still double the size of the 0.8.1 version, but better than 10MB. Is there
any way to make that strip happen without running "make install-strip" and
having the files put somewhere I didn't necessarily want them?

Thanks for the help!
Mike.

-----Original Message-----
From: Olly Betts [mailto:olly at survex.com]
Sent: Tuesday, October 19, 2004 8:38 AM
To: Mike Boone
Cc: xapian-discuss at lists.xapian.org
Subject: Re: [Xapian-discuss] Stemming and Query Parsing


On Tue, Oct 19, 2004 at 09:27:33AM -0400, Mike Boone wrote:
> OK, I've fooled around with making some changes in queryparser.cc in the
> yylex2() function to get it to keep my # character, but it's not working,
> so I guess I don't yet understand the code well enough.
>
> I copied the block for the + character and modified it:
>
> case '#':
>   // Ignore # at end of query
>   if (qptr == q.end()) return 0;
>   if (isspace(*qptr) || *qptr == '#') {
>     /* Ignore ## or # followed by a space */
>     /* Note that nethack## and Cl# are handled above */
>     ++qptr;
>     return yylex();
>   }
>   /* '#' is NOT used in the grammar rules, but leaving code here as-is */
>   return c;
>
> This code block isn't quite what I want since # is not really a grammar
> rule, and I don't want it to be.

That's the wrong block - that takes care of "+" being used in front of a
term to mark it as always required.  You want the code "above" which the
comment refers to.  It's probably a call to find() with a predicate of
something like p_notplusminus in 0.8.3.

> I'm also not sure if I should add the # sign to the yytname array...it
> looks to me like those are only for grammar rules. I haven't tried that yet.

I doubt it - it's the lexing stage where this needs to be done - you
want "C#" to be a single token in the grammar.

> (BTW, I'm doing this now with Xapian 0.8.3. The 0.8.1 xapian.so for PHP
> was 1.8MB, the same file for 0.8.3 is 10MB! This is on Red Hat Enterprise
> AS 2.1.)

We now build and link the library differently.  But I suspect the
difference is debug information.  What size is xapian.so if you install
it with "make install-strip"?

Cheers,
    Olly



