[Xapian-devel] Interested in GSOC projects

Olly Betts olly at survex.com
Thu Mar 31 04:08:41 BST 2011


On Thu, Mar 31, 2011 at 01:30:30AM +0530, saurabh kumar wrote:
> I have some doubts:
> 
> 1) Why is using tools like yacc or bison not a good approach? Can you
> illustrate with an example?

The parser needs to be forgiving, since the input is typed by (often
non-technical) humans.  The input isn't expected to be program code, and
"Syntax Error" is rarely an acceptable response.  It's better to correct
the query and say "Searched for 'XXX' instead", adding a "Did you mean
'YYY'?" if there's another plausible fix-up.
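To illustrate what "forgiving" can mean in practice, here's a minimal C++ sketch (not Xapian's actual QueryParser code) that repairs two common slips, an unterminated phrase quote and unbalanced parentheses, instead of rejecting the query.  The `fix_up_query` function and `CorrectedQuery` struct are hypothetical names; the `changed` flag is what would let a caller print "Searched for '...' instead".

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the query actually searched for, plus whether
// it differs from what the user typed.
struct CorrectedQuery {
    std::string text;
    bool changed;
};

// Repair common slips instead of reporting "Syntax Error":
//  - a ')' with no matching '(' is dropped;
//  - an unterminated '"' phrase is closed at the end;
//  - any '(' still open at the end is closed.
// Parentheses inside a quoted phrase are treated as plain text.
CorrectedQuery fix_up_query(const std::string& input) {
    std::string out;
    std::size_t open_parens = 0;
    bool in_phrase = false;
    for (char ch : input) {
        if (ch == '"') {
            in_phrase = !in_phrase;
        } else if (!in_phrase && ch == '(') {
            ++open_parens;
        } else if (!in_phrase && ch == ')') {
            if (open_parens == 0) continue;  // stray ')': skip it
            --open_parens;
        }
        out += ch;
    }
    if (in_phrase) out += '"';     // close the unterminated phrase
    out.append(open_parens, ')');  // close any unbalanced groups
    return {out, out != input};
}
```

For input `(a OR b` this returns `(a OR b)` with `changed` set, which is the cue to tell the user what was actually searched for.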

Error recovery in generated parsers is hard to do well, and usually
means adding extra rules to the parser description, which obscures what
we're actually trying to do.

We also can't always constrain the grammar in ways that suit the parser
generator.

For a formally specified grammar (like a language standard perhaps),
there's usually a BNF description of the grammar rules, so it's handy
to have the parser description mirror it.  That's not the case here.

Currently the lexer does things like tracking the "mode", which is
really an indication of where in the grammar we are.
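As a rough illustration of a lexer that tracks a "mode", here's a toy C++ lexer (hypothetical, and far simpler than Xapian's real one) with just two modes.  The same character, a leading '-', is lexed as an exclusion operator outside a quoted phrase but as ordinary term text inside one, so the lexer is effectively remembering part of the grammar itself.  The `Tok`, `Mode` and `lex` names are made up for this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a toy query lexer.
enum class Tok { Term, Quote, Exclude };

struct Token {
    Tok kind;
    std::string text;
};

// The lexer's "mode" records where in the grammar we are: outside a
// quoted phrase, or inside one.
enum class Mode { Default, InPhrase };

std::vector<Token> lex(const std::string& q) {
    std::vector<Token> toks;
    Mode mode = Mode::Default;
    std::size_t i = 0;
    while (i < q.size()) {
        char ch = q[i];
        if (std::isspace(static_cast<unsigned char>(ch))) { ++i; continue; }
        if (ch == '"') {
            // A quote flips between the two modes.
            toks.push_back({Tok::Quote, "\""});
            mode = (mode == Mode::Default) ? Mode::InPhrase : Mode::Default;
            ++i;
            continue;
        }
        // Outside a phrase, a leading '-' means "exclude this term";
        // inside a phrase it has no special meaning.
        if (ch == '-' && mode == Mode::Default) {
            toks.push_back({Tok::Exclude, "-"});
            ++i;
            continue;
        }
        // Otherwise read a term up to whitespace or a quote.
        std::size_t start = i;
        while (i < q.size() && q[i] != '"' &&
               !std::isspace(static_cast<unsigned char>(q[i]))) {
            ++i;
        }
        toks.push_back({Tok::Term, q.substr(start, i - start)});
    }
    return toks;
}
```

So `-spam eggs` lexes as Exclude, Term, Term, while `"-spam eggs"` lexes as Quote, Term `-spam`, Term `eggs`, Quote.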

> 2) In the proposed project are we NOT going to use any tools like YACC etc.?

Well, you're welcome to propose what you like, but you'll need to make
a stronger case for this one.

If you want to use a parser generator, we currently use lemon, which has
a clearer syntax than bison/yacc, and is structured such that the lexer
calls the parser (rather than the parser calling the lexer, as in
bison/yacc).  That allows the lexer to be simpler, since it doesn't need
to "keep its place" with explicit state.  So I suspect we don't want to
move back to bison (one reason we originally moved away was the lack of
reentrancy in bison-generated parsers, though that at least now seems to
have been addressed).
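A minimal sketch of the push-style structure, assuming a hand-written parser rather than lemon-generated code.  A lemon-generated parser is driven the same way: the lexer calls `Parse()` once per token and passes token code 0 to signal end of input.  Here `PushParser`, `feed()` and `parse_query()` are hypothetical names, and the toy grammar is just terms (implicitly ANDed) separated by OR.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy token codes.  End = 0 mirrors lemon's convention of passing token
// code 0 at end of input; the rest is hypothetical.
enum TokenCode { End = 0, Term, Or };

// A push-style parser: it is handed one token at a time and keeps its own
// state between calls, so nobody has to resume a lexer mid-scan.
class PushParser {
public:
    PushParser() : groups_(1) {}  // start with one empty AND-group

    void feed(TokenCode code, const std::string& value = "") {
        switch (code) {
        case Term:
            groups_.back().push_back(value);
            break;
        case Or:
            // Ignore a leading or repeated OR rather than erroring.
            if (!groups_.back().empty()) groups_.emplace_back();
            break;
        case End:
            // Drop the empty group left by a trailing OR.
            if (groups_.size() > 1 && groups_.back().empty()) groups_.pop_back();
            break;
        }
    }

    // Render the parsed query, e.g. "(a AND b) OR c".
    std::string result() const {
        std::string out;
        for (std::size_t g = 0; g < groups_.size(); ++g) {
            if (g) out += " OR ";
            const auto& terms = groups_[g];
            bool wrap = terms.size() > 1 && groups_.size() > 1;
            if (wrap) out += '(';
            for (std::size_t t = 0; t < terms.size(); ++t) {
                if (t) out += " AND ";
                out += terms[t];
            }
            if (wrap) out += ')';
        }
        return out;
    }

private:
    std::vector<std::vector<std::string>> groups_;
};

// The lexer drives: it scans the input and calls feed() for each token.
std::string parse_query(const std::string& input) {
    PushParser parser;
    std::istringstream words(input);
    std::string w;
    while (words >> w) {
        if (w == "OR") parser.feed(Or);
        else parser.feed(Term, w);
    }
    parser.feed(End);
    return parser.result();
}
```

For example, `parse_query("a b OR c")` yields `(a AND b) OR c`, and a dangling trailing `OR` is silently dropped rather than reported as a syntax error.  The lexer loop here is a plain `while` with no saved position, because all the grammar state lives in the parser object.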

> Should I mail my proposal to the mailing list or just submit it on the
> Google SoC site? I will certainly need your comments to improve on the
> first draft.

Just submit it to the site - we can comment there and you can revise it
up until the deadline (April 8th, 19:00 UTC).

Cheers,
    Olly
