[Xapian-devel] Test Dataset for performance and accuracy analysis

Parth Gupta pargup8 at gmail.com
Wed Mar 5 11:13:52 GMT 2014


Hi Aarsh,

Yes, it's very important to test the implemented algorithms on benchmark
collections. Most of the evaluation forums (TREC, CLEF, INEX, FIRE, NTCIR)
release corresponding datasets. The most suitable one for you would be an
ad-hoc collection, which comprises a document collection, topics
(query-set) and qrels (relevance judgements).
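
To make the role of the qrels concrete, here is a minimal sketch of how a
ranked result list could be scored against them. It assumes the common
TREC-style qrels layout (one "topic iteration docno relevance" record per
line); the INEX files may be laid out differently, so check the format
before reusing it. The function names and the sample data are made up for
illustration and are not part of Xapian.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each topic id to the set of docnos judged relevant.

    Assumes TREC-style records: 'topic iteration docno relevance'.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[topic].add(docno)
    return relevant


def precision_recall(ranked_docnos, relevant_docnos, k):
    """Precision and recall over the top-k results for one topic."""
    top_k = ranked_docnos[:k]
    hits = sum(1 for d in top_k if d in relevant_docnos)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_docnos) if relevant_docnos else 0.0
    return precision, recall


def average_precision(ranked_docnos, relevant_docnos):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant_docnos:
            hits += 1
            total += hits / rank
    return total / len(relevant_docnos) if relevant_docnos else 0.0


# Hypothetical sample data: two topics, four judged documents.
sample_qrels = ["1 0 d1 1", "1 0 d2 0", "1 0 d3 1", "2 0 d4 1"]
qrels = parse_qrels(sample_qrels)
run_for_topic_1 = ["d1", "d2", "d3"]  # docnos in the order the engine ranked them
p_at_2, r_at_2 = precision_recall(run_for_topic_1, qrels["1"], 2)
ap = average_precision(run_for_topic_1, qrels["1"])
```

Averaging average_precision over all topics gives MAP, which is one way to
compare two weighting schemes on the same collection.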

As these evaluation forums put a lot of effort (and money) into preparing
them, they are not easily or freely available. Such datasets are mostly
free for research use if you are registered with the forum or participate
in its tracks.

I see that the INEX ad-hoc collections for 2009 and 2010 are available on
registering, so you can register with them, log in and download the dataset
along with the queries and qrels. The link is:

https://inex.mmci.uni-saarland.de/

Use the ad-hoc collection; it was also used for testing the Letor
implementation and BM25 in 2011 during GSoC (
http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme
).

Cheers,
Parth.


On Tue, Mar 4, 2014 at 4:46 PM, Aarsh Shah <aarshkshah1992 at gmail.com> wrote:

> Hi Parth,
>
> I implemented the DFR algorithms in Xapian as
> part of GSoC last year under the mentorship of Olly. This year, I want to
> work on analyzing and optimizing the performance of the DFR algorithms and
> comparing them with BM25. I also want to work on profiling the query
> expansion schemes and testing the relevance (precision and recall) and
> speed (time taken) of the algorithms.
> However, for this, I need a well-defined
> data set containing a considerable amount of textual data, query logs
> containing queries that can be run on it, and a set of relevant or expected
> documents which can be compared with the actual results to measure the
> relevance of the schemes. Could you please help me with this? Thank you so
> much for your time.
>
> -Regards
> -Aarsh
>