GSoC aspirant - guruprasad hegde

Guruprasad Hegde guruhegde1308 at gmail.com
Fri Mar 23 07:05:16 GMT 2018


Hi,

I plan to submit a proposal for the 'Math Aware search' project.

After reviewing the literature on the topic, I found that either the Tangent
or the MIaS system would be a good starting point, so I studied both systems
in detail.

I plan to pick Tangent because it performs better. It also has good
literature (a thesis report and a few papers) and reference code
available.

Below is a summary of both systems. I welcome any opinions on the choice.

Tangent:
Indexing stage:
Each document contains math formulae and text. Text is indexed in the
usual way.

Preprocessing: Math formula (Presentation MathML) => symbol layout tree =>
generate symbol pair tuples
Indexing: store the tuples in an inverted index
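
As a rough sketch of the indexing step and the OR-style candidate lookup
(not Tangent's actual code; the tuple layout and the names build_index and
or_candidates are assumptions made for illustration):

```python
from collections import defaultdict


def build_index(docs):
    """Map each symbol pair tuple to the set of document ids containing it.

    `docs` is a hypothetical mapping: doc id -> iterable of tuples.
    """
    index = defaultdict(set)
    for doc_id, tuples in docs.items():
        for t in tuples:
            index[t].add(doc_id)
    return index


def or_candidates(index, query_tuples):
    """Return ids of documents sharing at least one tuple with the query."""
    hits = set()
    for t in query_tuples:
        hits |= index.get(t, set())
    return hits


# Toy data: each tuple is (symbol, symbol, relation), an assumed layout.
docs = {
    1: {("x", "2", "above")},
    2: {("x", "+", "next")},
    3: {("a", "b", "next")},
}
index = build_index(docs)
candidates = or_candidates(index, {("x", "2", "above"), ("x", "+", "next")})
```

Here a document becomes a candidate as soon as it shares one tuple with the
query, which is what combining the query terms with OR amounts to.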
Searching stage:
Query (Presentation MathML) => symbol layout tree => generate symbol pair
tuples => form a query with the logical OR operator => select candidate
documents using the Dice coefficient => re-rank the documents using the
MSS metric.
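
To make the tuple generation and the candidate scoring concrete, here is a
minimal sketch. It assumes a simplified symbol layout tree (the Node class
is hypothetical) and a simplified tuple of (ancestor, descendant, relation
path); the tuples in the Tangent paper carry more detail, and the MSS
re-ranking step is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol in a toy symbol layout tree (hypothetical structure)."""
    symbol: str
    # Maps a spatial relation label (e.g. "next", "above") to a child node.
    children: dict = field(default_factory=dict)


def symbol_pairs(root):
    """Return the set of (ancestor, descendant, path) tuples in the tree."""
    pairs = set()

    def descendants(node, path):
        # Yield every node below `node` with the relation path leading to it.
        for rel, child in node.children.items():
            child_path = path + (rel,)
            yield child, child_path
            yield from descendants(child, child_path)

    def visit(node):
        for desc, path in descendants(node, ()):
            pairs.add((node.symbol, desc.symbol, path))
        for child in node.children.values():
            visit(child)

    visit(root)
    return pairs


def dice(a, b):
    """Dice coefficient of two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Document formula x^2 + y and query x^2, written as toy layout trees.
doc_tree = Node("x", {"above": Node("2"),
                      "next": Node("+", {"next": Node("y")})})
query_tree = Node("x", {"above": Node("2")})

doc_pairs = symbol_pairs(doc_tree)
query_pairs = symbol_pairs(query_tree)
score = dice(doc_pairs, query_pairs)
```

In this toy case the document yields four tuples and the query one, with one
tuple in common. Using sets means a pair that repeats within a formula counts
once, which is a simplification of this sketch.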

MIaS:
Indexing stage:

Preprocessing: Math formula (Presentation MathML) => tokenization => formula
(token) modification
Indexing: index each token with a proper weight (discussed in the paper)
Formula modification = ordering + unification of variables + unification of
constants
Searching stage:
Query (Presentation MathML) => formula modification => form a query with the
logical OR operator => retrieve using a text search engine
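
The three formula modification steps can be illustrated roughly as follows.
This is only a sketch on a toy expression form (nested tuples of
(operator, operand, ...)), not MIaS's MathML processing; the placeholder
names "id1", "id2" and "const" and all function names are assumptions.

```python
COMMUTATIVE = {"+", "*"}


def order(expr):
    """Sort operands of commutative operators so a+b and b+a look the same."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [order(a) for a in args]
    if op in COMMUTATIVE:
        # Any fixed total order works; repr() gives one for mixed operands.
        args.sort(key=repr)
    return (op, *args)


def unify_variables(expr, names=None):
    """Replace each distinct variable with id1, id2, ... in order of appearance."""
    if names is None:
        names = {}
    if isinstance(expr, str):
        if expr.isalpha():
            names.setdefault(expr, f"id{len(names) + 1}")
            return names[expr]
        return expr
    op, *args = expr
    return (op, *[unify_variables(a, names) for a in args])


def unify_constants(expr):
    """Replace every numeric constant with the placeholder 'const'."""
    if isinstance(expr, str):
        return "const" if expr.replace(".", "", 1).isdigit() else expr
    op, *args = expr
    return (op, *[unify_constants(a) for a in args])


# a^2 + b and b + a^2 become identical after ordering.
f1 = order(("+", ("^", "a", "2"), "b"))
f2 = order(("+", "b", ("^", "a", "2")))
generalised = unify_constants(unify_variables(f1))
```

Each modified form would be indexed as an extra variant of the original
formula, so that a query such as x^3 + y can still match a^2 + b at a lower
weight.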

I plan to send the draft proposal by the end of the day.

I have also put some thoughts on implementation here.
I believe the major work is in the preprocessing and searching stages (the
latter needs a new weight metric implementation). The existing indexing
technique can be used for the math part as well.
My plan is to implement formula-only retrieval first (documents containing
only math) and add keyword support (document = text + math) later.
After that, I would also add support for queries in LaTeX format.

Please let me know if you have any comments or questions on the points I
mentioned.
Sorry for the delay. I would also like to mention that I am actively
preparing by reading the Xapian codebase and the literature.
Thank you.

Regards,
Guruprasad

Link for Tangent paper:
https://www.cs.rit.edu/~rlaz/files/ntcir2016_tangent.pdf
Link for MIaS paper:
https://link.springer.com/chapter/10.1007/978-3-642-22673-1_16