[Xapian-devel] GSOC 2012 : QueryParser Reimplementation

Olly Betts olly at survex.com
Fri Mar 23 02:40:21 GMT 2012


On Thu, Mar 22, 2012 at 07:32:23PM +0530, Sehaj Singh Kalra wrote:
> Maintaining logs will improve parser as the present query can be matched
> against the recent queries. This way, suppose for example, if we find the
> exact query, the time taken by search engine
> can be reduced.

Caching of results is certainly useful, but is the QueryParser the right
place to do it?  In many cases, the query consists of more than just the
string the user types in - there are probably filters, options like
sorting and collapsing, and so on.  It's hard to handle these if you
try to do the caching in the QueryParser, because it knows nothing about
them.  You could pass all this data in, but having to pass in lots of
data is a warning sign that you've got the module boundaries wrong.

If you cache results at the application level, you can key the cache off
the parameters you feed to the search (for a web search, you could just
key off the query part of the URL, though you probably want to at
least normalise it).  Another benefit of caching here is you can cache
the rendered results (HTML for a web search, JSON or XML for a web API,
etc).
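
A minimal sketch of the application-level approach, assuming a web search
whose parameters all arrive in the URL's query string.  The names
normalise_query_string() and RenderedResultCache are hypothetical, and the
normalisation shown (dropping empty parameters and sorting the rest) is only
one possible choice; it would be wrong for an application where the order of
repeated parameters matters.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Normalise the query part of a URL (e.g. "q=foo&sort=date") so that
// equivalent requests map to the same cache key: empty parameters are
// dropped and the remaining "name=value" pairs are sorted.
std::string normalise_query_string(const std::string& qs) {
    std::vector<std::string> params;
    std::istringstream in(qs);
    std::string param;
    while (std::getline(in, param, '&')) {
        if (!param.empty()) params.push_back(param);
    }
    std::sort(params.begin(), params.end());

    std::string key;
    for (const auto& p : params) {
        if (!key.empty()) key += '&';
        key += p;
    }
    return key;
}

// Hypothetical in-memory cache of rendered output (HTML, JSON, ...),
// keyed on the normalised query string.  No eviction or expiry is shown.
class RenderedResultCache {
  public:
    // Return the cached output for this request, or call render() once
    // and remember what it produced.
    std::string get_or_render(const std::string& query_string,
                              const std::function<std::string()>& render) {
        const std::string key = normalise_query_string(query_string);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        std::string out = render();
        cache_.emplace(key, out);
        return out;
    }

    std::size_t size() const { return cache_.size(); }

  private:
    std::unordered_map<std::string, std::string> cache_;
};
```

With this, requests for "q=foo&sort=date" and "sort=date&q=foo" share one
cache entry, and the search (including the parse) only runs on a miss.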

If you cache inside Xapian, then it's probably better to cache after the
QueryParser and key the cache on the serialised form of the final Query
object plus any parameters you set on Enquire.
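
As a rough sketch of what such a key might look like: the serialised query
is taken here as an opaque string (for example, the result of
Xapian::Query::serialise()), and the Enquire settings are recorded in a
hypothetical MatchSettings struct, on the assumption that the caller has to
track them itself.  The fields shown are illustrative, not a complete list
of what Enquire allows you to set.

```cpp
#include <cstdint>
#include <string>

// Hypothetical record of the settings applied to Enquire for a search.
// Anything that can change the result set needs to be included here.
struct MatchSettings {
    std::uint32_t first = 0;          // offset of the first result wanted
    std::uint32_t maxitems = 10;      // number of results wanted
    std::string sort_spec;            // application-defined description of the sort
    std::int64_t collapse_slot = -1;  // value slot to collapse on, -1 for none
};

// Append a field with a length prefix, so that adjacent fields can't run
// together and make two different searches produce the same key.
static void append_field(std::string& key, const std::string& field) {
    key += std::to_string(field.size());
    key += ':';
    key += field;
}

// Build a cache key from the serialised Query plus the match settings.
std::string make_cache_key(const std::string& serialised_query,
                           const MatchSettings& s) {
    std::string key;
    append_field(key, serialised_query);
    append_field(key, std::to_string(s.first));
    append_field(key, std::to_string(s.maxitems));
    append_field(key, s.sort_spec);
    append_field(key, std::to_string(s.collapse_slot));
    return key;
}
```

Keying on the serialised Query rather than the query string means two
different strings which parse to the same Query share a cache entry.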

> Also even if the exact query can't be found,  this will
> help parser in making sane and better Query object trees by matching
> against some logs and using algorithms like longest common sub-sequence
> etc.

How would this help the parser do this?  It's easy to assert that "X
will help Y", but we're looking for supporting evidence in a proposal.

> This way query can be modified a  bit to make more sense from the free
> form text.

Again, how would this work?

> These were the plans suggested to improve parser functioning.
> Please guide me, about the other ways in which the parser can be modified
> for better outputs.

There are a lot of testcases in queryparsertest.cc, some artificial and
some examples of real world queries.  The parsed forms of quite a few of
the real world queries could be improved upon.  Some have comments
noting this, but not all do.

Currently some parse errors trigger a fall-back mode which turns off
various flags and reparses the query.  Overall this is beneficial, but
it can sometimes result in surprising parses.  Really, it is papering
over the real issue.
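
To illustrate the general shape of such a fall-back (this is not Xapian's
actual code), here is a sketch written against a hypothetical parser
callback.  ParseFn, FallbackParse and parse_with_fallback() are made-up
names, and the flags are just an opaque bitmask.  Unlike a silent
fall-back, it records whether the reduced flags were needed, so the caller
can at least tell when a parse may be surprising.

```cpp
#include <functional>
#include <string>

// Hypothetical parser callback: parses `text` using `flags`, stores the
// parsed form in `out` and returns true, or returns false on a parse error.
using ParseFn =
    std::function<bool(const std::string& text, unsigned flags, std::string& out)>;

// Result of a parse, recording whether the fall-back was needed.
struct FallbackParse {
    bool ok;             // false if both attempts failed
    bool used_fallback;  // true if the reduced flags were tried
    std::string parsed;  // parsed form, empty if !ok
};

// Try to parse with the requested flags; on a parse error, reparse with
// a reduced set of flags.
FallbackParse parse_with_fallback(const ParseFn& parse,
                                  const std::string& text,
                                  unsigned flags,
                                  unsigned fallback_flags) {
    std::string out;
    if (parse(text, flags, out)) {
        return FallbackParse{true, false, out};
    }
    out.clear();
    if (parse(text, fallback_flags, out)) {
        return FallbackParse{true, true, out};
    }
    return FallbackParse{false, true, std::string()};
}
```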

We have the "spelling suggestion" feature, which allows us to return
a parsed query, but suggest what the user might have meant.  It would
be cool to reuse this mechanism for cases where the query seems
malformed and there are two (or more) reasonable options.

Cheers,
    Olly


