[Xapian-devel] Starting work on Perf Test Module
Aarsh Shah
aarshkshah1992 at gmail.com
Wed May 14 05:38:45 BST 2014
Hello,
I am beginning work on the perf test module. The initial steps that I aim
to accomplish are:
-> Download the Wikipedia dumps for multiple languages.
-> Write Python scripts to tokenize the dumps (I will probably use something
like NLTK, which has powerful built-in tokenizers).
-> Discuss and finalize the design of the search and query expansion perf
tests, as I want to complete them before working on the indexing perf test.
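As a rough illustration of the tokenizing step above, here is a minimal sketch that streams pages out of a MediaWiki XML export and splits each page's text into lowercase word tokens. It is only a sketch under assumptions: the function names (tokenize, iter_page_tokens) and the sample document are hypothetical, the regex tokenizer is a simple stand-in for an NLTK tokenizer, and wiki markup is not stripped.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simple stand-in for an NLTK tokenizer: runs of Unicode word characters.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def iter_page_tokens(stream):
    """Yield (title, tokens) for each <page> in a MediaWiki XML export.

    The stream is parsed incrementally so a large dump never has to be
    loaded into memory in one piece.
    """
    title, text = "", ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text or ""
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, tokenize(text)
            title, text = "", ""
            elem.clear()  # release the finished page's subtree


# Hypothetical one-page sample, shaped like a MediaWiki export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Xapian</title>
    <revision><text>Xapian is a search engine library.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_page_tokens(io.BytesIO(SAMPLE)))
```

For a real compressed dump, the file object returned by bz2.open(path, "rb") could be passed in place of the in-memory sample.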
*Questions*
-> If anyone has experience with downloading Wikipedia dumps, could I
please get some advice on how to go about it and where the best place
to get them is?
-> For the search and query expansion perf tests, I need a query log based
on the test documents I'll be using (the INEX data set, as per the recent
discussion with Olly on IRC).
Could I please get some advice on how to go about using the INEX data sets
and the corresponding query logs?
Regards
Aarsh