[Xapian-devel] Starting work on Perf Test Module

Aarsh Shah aarshkshah1992 at gmail.com
Wed May 14 05:38:45 BST 2014


I am beginning work on the perf test module. The initial steps I aim to
accomplish are:

-> Download the Wikipedia dumps for multiple languages.
-> Write Python scripts to tokenize the dumps (I will probably use something
like nltk, which has powerful built-in tokenizers).
-> Discuss and finalize the design of the search and query expansion perf
tests, as I want to complete them before working on the indexing perf test.
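
A rough sketch of what the tokenizing step could look like, using only the
Python standard library. The regex tokenizer is a simple stand-in for an nltk
tokenizer, and the names tokenize, iter_page_tokens and SAMPLE are
placeholders made up for illustration. It assumes the dump is MediaWiki-style
XML with <page>, <title> and <text> elements.

```python
import io
import re
import xml.etree.ElementTree as ET

# Unicode-aware word pattern; a simple stand-in for an nltk tokenizer.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def iter_page_tokens(xml_stream):
    """Yield (title, tokens) for each <page> in a MediaWiki-style XML dump."""
    title, text = "", ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text or ""
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, tokenize(text)
            title, text = "", ""
            elem.clear()  # release the page's children once it is processed


# Tiny illustrative sample (assumption: real dumps follow the same layout).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Xapian</title>
    <revision><text>Xapian is a search engine library.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_page_tokens(io.BytesIO(SAMPLE)))
for page_title, tokens in pages:
    print(page_title, tokens)
```

For a real dump, which is normally bz2-compressed, the stream passed to
iter_page_tokens could probably come from bz2.open(path, "rb") instead of
io.BytesIO. Wiki markup is not stripped here, so that would still need
handling before the tokens are useful for the perf tests.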

-> If anyone has experience with downloading Wikipedia dumps, please can
I get some advice on how to go about it and where the best place to get
them is?
-> For the search and query expansion perf tests, I need a query log based
on the test documents I'll be using (the INEX data set, as per the recent
discussion with Olly on IRC).
Please can I get some advice on how to go about using the INEX data sets
and the corresponding query logs?

