<div dir="ltr">Hi Parth,<br><br>I implemented the DFR algorithms in Xapian as part of GSoC last year under Olly's mentorship. This year, I want to analyze and optimize the performance of the DFR algorithms and compare them with BM25. I also want to profile the query expansion schemes and test the relevance (precision and recall) and speed (time taken) of the algorithms.<br>
However, for this I need a well-defined data set containing a considerable amount of textual data, query logs containing queries that can be run on it, and a set of relevant or expected documents that can be compared with the actual results to measure the relevance of the schemes. Could you please help me with this? Thank you so much for your time.<br>
<br>-Regards<br>-Aarsh</div>