GSoC aspirant - guruprasad hegde

Guruprasad Hegde guruhegde1308 at gmail.com
Mon Mar 26 15:05:03 BST 2018


Please find the draft proposal at this link:
https://github.com/guruhegde/xapian-gsoc-proposal
It is still a work in progress.

Question: if we index math terms (symbol pair tuples) in the same DB as the
text data, do you think implicitly adding a field prefix (a new one) to the
math terms would help performance for cases like searching only text terms
or only math terms?
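
To make the question concrete, here is a minimal sketch of the idea using a
toy in-memory index rather than Xapian's API. The "XM" prefix, the
"x|2|a" encoding of a symbol pair tuple, and the function names are all
assumptions made up for illustration. It shows only that a prefix keeps math
terms and text terms in separate namespaces within one index; it says nothing
about actual performance.

```python
from collections import defaultdict

# Hypothetical prefix for math terms; not taken from the proposal.
MATH_PREFIX = "XM"


def index_document(index, doc_id, text_terms, math_terms):
    """Add one document's text and math terms to a toy inverted index."""
    for term in text_terms:
        index[term.lower()].add(doc_id)
    for term in math_terms:
        # Math terms share the index but live under their own prefix.
        index[MATH_PREFIX + term].add(doc_id)


def search(index, term, math=False):
    """Return sorted doc ids for a text term or, with math=True, a math term."""
    key = MATH_PREFIX + term if math else term.lower()
    return sorted(index.get(key, set()))


index = defaultdict(set)
index_document(index, 1, ["sum", "of", "squares"], ["x|2|a"])
index_document(index, 2, ["the", "x", "axis"], ["x|y|n"])

print(search(index, "x"))                 # text-only lookup
print(search(index, "x|2|a", math=True))  # math-only lookup
```

Because text terms are lowercased and the assumed prefix is uppercase, a text
query for "x" can never match a math term here, and vice versa.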

Regards,
Guruprasad


On Mon, Mar 26, 2018 at 3:27 PM, Gaurav Arora <gauravarora.daiict at gmail.com>
wrote:

>
>> I thought that if I start with MathML as input and build the core, I can
>> then extend the system to support other query/document types by looking
>> for third-party tools available for C++. At the moment, I don't have any
>> idea about this. What do you think?
>>
>> We can look into the options during the bonding period too. For now, can I
>> make LaTeX to MathML the first step in the proposal and shuffle the steps
>> later?
>>
>
> The proposal needs to account for that, i.e. it should plan for search
> through LaTeX to be supported and merged before the end of GSoC. It can be
> done at any time. It's perfectly fine to build the core using the MathML
> representation initially.
>
>>
>> Generating the symbol layout tree requires implementing a parser. I guess
>> it involves a good amount of text processing. Since it's a standard
>> problem, I hope it should not be hard, but it requires handling many
>> scenarios. I plan to read about parsers and try implementing small examples
>> first in the coming days.
>>
> That would be great :)
>
>>
>> I feel generating symbol pairs will be easy once I build the tree.
>>
>> Do you think I should come up with some sort of pseudocode in the proposal?
>>
> Would definitely help.
>
>>
>>
>> With other weight metric implementations available and with the existing
>> indexing structure, I feel getting the stats and implementing this would
>> not be hard.
>>
> A basic check would help to estimate the time this would take, so you can
> plan the project timeline accordingly.
>
>
>> I have been working on the draft. I am really sorry about the delay in
>> the draft. I hope to make up for that with some good work :)
>>
> The sooner you show us the draft version, the better your chances of getting
> feedback from us and improving your proposal.
>
>
> - Gaurav Arora
>
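
On the symbol pair step discussed in the thread above, a rough sketch of how
pairs might be read off a symbol layout tree once the parser has built it.
The Node class, the relation codes ("n" for next, "a" for above), and the
(ancestor, descendant, path) tuple shape are assumptions for illustration, not
the proposal's final design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol in a layout tree; children are (relation, Node) pairs."""
    symbol: str
    children: list = field(default_factory=list)


def descendants(node, path=""):
    """Yield (descendant, relation path from node) for every node below."""
    for relation, child in node.children:
        yield child, path + relation
        yield from descendants(child, path + relation)


def symbol_pairs(root):
    """Collect (ancestor symbol, descendant symbol, path) for all pairs."""
    pairs = []
    stack = [root]
    while stack:
        node = stack.pop()
        for desc, path in descendants(node):
            pairs.append((node.symbol, desc.symbol, path))
        stack.extend(child for _, child in node.children)
    return pairs


# Layout tree for x^2 + y: "2" sits above "x"; "+" and "y" follow on the line.
y = Node("y")
plus = Node("+", [("n", y)])
two = Node("2")
root = Node("x", [("a", two), ("n", plus)])

for pair in symbol_pairs(root):
    print(pair)
```

Each tuple could then be serialised into a single string and indexed as one
term; how best to encode it is left open here.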