[Xapian-devel] GSOC 2012 : QueryParser Reimplementation

Sehaj Singh Kalra sehaj.sk at gmail.com
Sun Mar 25 16:57:19 BST 2012


I understand your point about caching of results, and I will work on the
suggestions you gave for improving the parser's functionality.
As you mentioned "it's probably better to cache after the QueryParser and
key the cache on the serialised form of the final Query object plus any
parameters you set on Enquire."
I have a question: does Xapian currently cache results at any level?
If not, I can add caching to my proposal, placed after the Query has been
parsed, since as you explained that is probably the better place for it.
Currently I am going through the test cases in queryparsertest.cc and
working out the different ways in which the parsed forms of those queries
can be improved.
I will let you know if I have any questions about queryparsertest.cc.
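
To make the caching idea concrete, here is a rough sketch of keying a cache
on the serialised query plus the search parameters. It is only an
illustration using the Python standard library, not Xapian code: the names
make_cache_key and ResultCache are hypothetical, the bytes passed in are
assumed to be whatever the serialised form of the final Query object would
be, and the dict stands in for any parameters set on Enquire (sorting,
collapsing and so on).

```python
import hashlib
from collections import OrderedDict


def make_cache_key(serialised_query: bytes, enquire_params: dict) -> str:
    """Hash the serialised query together with the search parameters.

    Both arguments are assumptions for this sketch: the bytes stand in for
    the serialised form of the final Query, the dict for Enquire settings.
    """
    h = hashlib.sha256()
    # Length-prefix the query so it cannot run into the parameter data.
    h.update(len(serialised_query).to_bytes(8, "big"))
    h.update(serialised_query)
    # Sort parameter names so the key does not depend on dict ordering.
    for name in sorted(enquire_params):
        h.update(repr((name, enquire_params[name])).encode("utf-8"))
    return h.hexdigest()


class ResultCache:
    """Small least-recently-used cache from cache key to stored results."""

    def __init__(self, max_entries: int = 128):
        self._entries = OrderedDict()
        self._max_entries = max_entries

    def get(self, key):
        """Return the cached results, or None on a miss."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, results):
        """Store results, evicting the least recently used entry if full."""
        self._entries[key] = results
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

The point of hashing the parameters along with the query is that two
searches with the same query but different sorting or collapsing get
different keys, so one can never be served the other's cached results.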

Cheers,
Sehaj

On Fri, Mar 23, 2012 at 8:10 AM, Olly Betts <olly at survex.com> wrote:

> On Thu, Mar 22, 2012 at 07:32:23PM +0530, Sehaj Singh Kalra wrote:
> > Maintaining logs will improve the parser, as the present query can be
> > matched against recent queries. For example, if we find the exact
> > query in the logs, the time taken by the search engine can be reduced.
>
> Caching of results is certainly useful, but is the QueryParser the right
> place to do it?  In many cases, the query consists of more than just the
> string the user types in - there are probably filters, options like
> sorting and collapsing, and so on.  It's hard to handle these if you
> try to do the caching in the QueryParser, because it knows nothing about
> them.  You could pass all this data in, but having to pass in lots of
> data is a warning sign that you've got the module boundaries wrong.
>
> If you cache results at the application level, you can key the cache off
> the parameters you feed to the search (for a web search, you could just
> key off the query part of the URL, though you probably want to at
> least normalise it).  Another benefit of caching here is you can cache
> the rendered results (HTML for a web search, JSON or XML for a web API,
> etc).
>
> If you cache inside Xapian, then it's probably better to cache after the
> QueryParser and key the cache on the serialised form of the final Query
> object plus any parameters you set on Enquire.
>
> > Also, even if the exact query can't be found, this will help the
> > parser build saner and better Query object trees by matching against
> > the logs and using algorithms like longest common subsequence, etc.
>
> How would this help the parser do this?  It's easy to assert that "X
> will help Y", but we're looking for supporting evidence in a proposal.
>
> > This way the query can be modified a bit to make more sense of the
> > free-form text.
>
> Again, how would this work?
>
> > These were the plans suggested to improve parser functioning.
> > Please guide me, about the other ways in which the parser can be modified
> > for better outputs.
>
> There are a lot of testcases in queryparsertest.cc, some artificial and
> some examples of real world queries.  The parsed forms of quite a few of
> the real world queries could be improved upon.  Some have comments
> noting this, but not all do.
>
> Currently some parse errors trigger a fall-back mode which turns off
> various flags and reparses the query.  Overall this is beneficial, but
> it can result in sometimes surprising parses for some queries.  Really
> it is papering over the real issue.
>
> We have the "spelling suggestion" feature, which allows us to return
> a parsed query, but suggest what the user might have meant.  It would
> be cool to reuse this mechanism for cases where the query seems
> malformed and there are two (or more) reasonable options.
>
> Cheers,
>    Olly
>