[Xapian-devel] GSOC 2012 : QueryParser Reimplementation

Sehaj Singh Kalra sehaj.sk at gmail.com
Tue Mar 20 13:49:34 GMT 2012


Hello, I am Sehaj Singh Kalra, an undergraduate student from India studying
Computer Science and Engineering at the Indian Institute of
Technology Delhi (IIT Delhi). I want to work on the idea "QueryParser
Reimplementation".
With the background I have in this field, I am fully comfortable with this
project.

I have gone through the specification and the QueryParser
documentation (which I believe is not complete), and I am currently going
through the source code of the current parser implementation.
I have some questions:
1. How is multiple language support handled in Xapian? While going through
the source code I found that the parser invokes the term generator class to
convert the query to terms. So I assume the answer depends on the stage at
which other processing, such as stemming, is done.
2. As I understand it, the main motivation for reimplementing the parser by
hand, rather than using Flex and Bison or the Lemon parser generator, is to
make error recovery fast: mistakes are bound to happen in natural-language
input, and natural language processing (NLP) differs from processing a
computer language. If there are other aspects to this, please guide me.
3. To what extent does the Xapian QueryParser currently take part in
optimising the search?
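
To make point 2 concrete, here is a minimal sketch of the kind of error
recovery I have in mind for a hand-written parser. It is plain Python, not
Xapian code: the token pattern, the parse_lenient name and the
("TERM", ...) / ("PHRASE", ...) tuples are all hypothetical and only
illustrate the idea of repairing a malformed query instead of rejecting it.

```python
import re

# Hypothetical token pattern: a complete quoted phrase, a bracket,
# a bare word, or a lone (unmatched) double quote.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+|"')


def parse_lenient(query):
    """Parse a query into nested lists, recovering from unbalanced
    brackets and quotes instead of raising an error."""
    stack = [[]]  # stack[0] is the top-level group
    for tok in TOKEN.findall(query):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) > 1:
                group = stack.pop()
                stack[-1].append(group)
            # Recovery: a stray ")" with no open group is ignored.
        elif tok == '"':
            # Recovery: an unmatched quote is dropped, so the words
            # after it are treated as ordinary terms.
            continue
        elif tok.startswith('"'):
            stack[-1].append(("PHRASE", tok[1:-1].split()))
        else:
            stack[-1].append(("TERM", tok.lower()))
    # Recovery: close any groups the user left open.
    while len(stack) > 1:
        group = stack.pop()
        stack[-1].append(group)
    return stack[0]
```

For example, parse_lenient('hello "world') yields two plain terms, and
parse_lenient('(open group') yields a single closed group, with no error
raised in either case.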

Based on my understanding so far, I would also like to extend the
project by proposing a few more things:
1. Pre-analysing the query and rewriting it at the parser level, using
suitable algorithms, so as to make the search more efficient.
2. Aiding relevancy ranking.
3. Maintaining a log of searched queries, and processing and ranking them
using suitable algorithms. Using these logs would make the parser more
efficient.
The things proposed above would lead to some pre-search filtering, which is
best done at the parser level. Moreover, since the parser would be
hand-written rather than generated, these features could be integrated
into it directly, making the parser more efficient.

I will be happy to participate in this project during Google Summer of Code
2012 to implement these ideas.

Cheers,
Sehaj