[Xapian-discuss] TREC parser and comparison

Emmanuel Eckard emmanuel.eckard at epfl.ch
Thu Feb 1 19:35:14 GMT 2007


Hello,

   Some time ago I asked whether there was an indexer for TREC-format
collections that could also produce output for TrecEval (yes, I am doing
my thesis). Today I decided to spend a few hours toying with Xapian, and
I came up with something very crude.
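
   For readers unfamiliar with the two formats involved, here is a
minimal sketch of the non-Xapian ends of such a pipeline. It is not the
programme described in this message. It assumes the usual TREC markup
(<DOC>, <DOCNO>, <TEXT>) and the usual six-column run file read by
trec_eval. The function names and the "xapian-bm25" run tag are
hypothetical, and the indexing and search step in between is left out.

```python
import re

# Assumed TREC markup: <DOC> ... <DOCNO>id</DOCNO> ... <TEXT>body</TEXT> ... </DOC>
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trec(data):
    """Yield (docno, text) pairs from a string of TREC-format documents."""
    for match in DOC_RE.finditer(data):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        if docno is None:
            continue  # skip documents without an identifier
        # A document may carry several <TEXT> sections; join them.
        text = " ".join(t.strip() for t in TEXT_RE.findall(body))
        yield docno.group(1), text


def trec_eval_lines(query_id, ranked, run_tag="xapian-bm25"):
    """Format (docno, score) pairs, best first, as run-file lines:
    query_id Q0 docno rank score run_tag
    """
    return [
        f"{query_id} Q0 {docno} {rank} {score:.4f} {run_tag}"
        for rank, (docno, score) in enumerate(ranked, start=1)
    ]
```

   Each (docno, text) pair would be handed to the indexer, and the
ranked list returned for each query would be written out with
trec_eval_lines() so that the run can be scored against the relevance
judgements.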

   The programme was tested on the SMART collections (the ones found
at ftp://ftp.cs.cornell.edu/pub/smart/ , converted to the TREC format),
with the default BM25 weighting scheme. The results were roughly on par
with other tools like Lemur (the competition, from
http://www.lemurproject.org/) and ad hoc tools, except for MED, which
gave noise (there might be an indexing bug with this one), and TIME,
for which the results were exceptionally good. This might indicate that
Xapian behaves better with "easy" texts: the other collections are more
or less difficult technical texts, whereas TIME is a collection of news
articles. Of course, this depends only on the weighting scheme.

   If this can be of some interest, the code is at the disposal of
whoever is brave enough to read it.

   Cheers!
     -- Emmanuel


