<div dir="ltr">Hello,<div><br></div><div>I would like to settle on the dataset to use for the Letor stabilisation project.</div><div><br></div><div>I think the <a href="http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/software/inex/">2009 INEX Wikipedia Collection</a> should work fine. It contains 2,666,190 XML articles, <a href="http://inex.mmci.uni-saarland.de/protected/adhoc/2009-topics.zip">115 topics</a>, and <a href="http://inex.mmci.uni-saarland.de/protected/adhoc/2009-inex_eval.zip">50,275 qrel</a> labels, and has an uncompressed size of 50.75 GB (5.52 GB compressed).</div><div><br></div><div>A similar alternative is the <a href="http://inex-lod.mpi-inf.mpg.de/2013/">2013 INEX Wikipedia LOD Collection</a>. It contains 12,216,109 XML articles, <a href="http://inex.mmci.uni-saarland.de/protected/dc/2013-ld-adhoc-topics.xml">144 topics</a>, and <a href="http://inex.mmci.uni-saarland.de/protected/dc/2013-ld-adhoc-qrels.zip">14,400 qrel</a> labels, and has a compressed size of 11.12 GB. The INEX 2009 Collection is a subset of it.</div><div><br></div><div>If there are any more recent or better datasets that could be used, please let me know.</div><div><br></div><div>Thanks,</div><div>Ayush</div>
</div>