[Xapian-devel] GSOC : Language Modelling for information retrieval with Diversified Search results

Gaurav Arora gauravarora.daiict at gmail.com
Thu Mar 22 01:10:46 GMT 2012


Hello,

I am an undergraduate student at DA-IICT, India, pursuing a BTech in
Information and Communication Technology. My main research areas are
Information Retrieval and Natural Language Processing. Xapian, being a
powerful information retrieval library, has attracted me as a project in
which to implement what I have learned in class. I have worked on
entity search on RDF data, SMS-based FAQ retrieval, and Question Answering
for competitions at evaluation forums such as CLEF and FIRE. I want to take
the GSoC opportunity and join the world of FOSS developers.

I would like to work on adding techniques such as Language Modelling
and Diversified Search for information retrieval.

Brief Summary of idea:

The Language Modelling approach to information retrieval builds a
probabilistic language model for each document and ranks documents by the
probability that their model generates the query. The technique is
computationally more expensive than traditional information retrieval
techniques, but it has been reported in the literature to perform better
than traditional methods.
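
To make the ranking idea concrete, here is a minimal sketch of
query-likelihood scoring with a unigram document model. It is only an
illustration, not Xapian code: the function names, the whitespace
tokenizer, and the choice of Jelinek-Mercer smoothing with lam=0.5 are
all assumptions made for this example.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_counts,
                         collection_len, lam=0.5):
    """Return log P(query | document model) for a unigram model.

    Jelinek-Mercer smoothing (an assumption of this sketch) mixes the
    document's term frequencies with collection-wide frequencies so that
    a query term missing from the document does not zero out the score.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # Term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank document ids by descending query log-likelihood."""
    # Naive lowercase whitespace tokenization, for illustration only.
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    collection_counts = Counter(
        term for terms in tokenized.values() for term in terms
    )
    collection_len = sum(collection_counts.values())
    query_terms = query.lower().split()
    scored = [
        (query_log_likelihood(query_terms, terms, collection_counts,
                              collection_len, lam), doc_id)
        for doc_id, terms in tokenized.items()
    ]
    # Highest score first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

Calling rank("query likelihood", docs) on a small dictionary of
id-to-text pairs puts the documents that contain the query terms first.
The extra cost mentioned above comes from estimating and smoothing a
model per document instead of using a single weighting formula.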

The language modelling approach performs better because it tries to capture
word and phrase associations, and through them the user's context.

Diversified search is a key way to satisfy users in the absence of explicit
knowledge of their intent. A diversified search algorithm tries to
estimate the different possible contexts of the user's query and to
pull in potentially relevant documents from all of those contexts rather
than explicitly assuming a single context.
Diversification can be done by generating a separate rank list for each
context, or by adding documents from different contexts to a single rank
list.
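
The second option, merging documents from several contexts into one
rank list, can be sketched as a simple round-robin interleave. This is
a hedged illustration only: the function name, the limit parameter, and
the example document ids are hypothetical, and round-robin is just one
simple merging strategy, not a method taken from the papers listed
below.

```python
from itertools import zip_longest


def interleave_rank_lists(rank_lists, limit=10):
    """Merge per-context rank lists into one diversified rank list.

    Takes the top document of every context first, then the second
    document of every context, and so on, skipping duplicates.
    Document ids must not be None, because None is used as padding
    for the shorter lists.
    """
    if limit <= 0:
        return []
    merged = []
    seen = set()
    for tier in zip_longest(*rank_lists):
        for doc_id in tier:
            if doc_id is None or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
            if len(merged) >= limit:
                return merged
    return merged


# Hypothetical rank lists for three contexts of the query "java".
per_context = [
    ["java_island", "java_travel"],             # geography
    ["java_lang", "java_jvm", "java_travel"],   # programming
    ["java_coffee"],                            # coffee
]
print(interleave_rank_lists(per_context, limit=5))
```

With this ordering, every context is represented near the top of the
list before any single context contributes a second document.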

Resources:

http://nlp.stanford.edu/IR-book/html/htmledition/ponte-and-crofts-experiments-1.html
http://dl.acm.org/citation.cfm?id=291008
http://goo.gl/klqYy
http://dl.acm.org/citation.cfm?id=1860709


I have compiled and installed Xapian and have been experimenting with it
for the past few days. I have a few questions about Xapian:

1. Xapian supports relevance feedback (query expansion) through the
"Xapian::Enquire::get_eset" function. Which algorithm does the Enquire
class use to expand the query?

In its naive form, search result diversification is performed by
expanding the query with different contexts and adding documents from each
context to the final rank list, thereby catering to all contexts of the
query.

I was wondering whether I could use the algorithm implemented for the
expand set for query expansion and implement a new algorithm for search
diversification; in this way the query expansion feature of Xapian would
also become more powerful.

2. I have read that Xapian supports passage retrieval, proximity-based
queries, and wildcard queries, but I could not find any documentation or
functions providing these facilities. I would be glad if you could point
me to any available documentation describing how to use them.


I would be glad if mentors from the Xapian community could comment on my
idea of implementing language modelling techniques and search result
diversification as a project for an open source search engine library such
as Xapian. Would implementing these techniques help Xapian as an open
source project?

I look forward to joining the Xapian community.


-- 
with regards
Gaurav A.