[Xapian-discuss] Tryout patches for faster chert search: http://trac.xapian.org/ticket/326

Chris chris at s-4-u.net
Thu Sep 8 11:46:01 BST 2011


On 09/08/2011 11:51 AM, Richard Boulton wrote:
> Sources of realistic
> query data are harder to come across - anyone got any good ideas for
> that?  
>

Reminds me of the AOL fuckup a few years ago (they released the
search queries of 650,000 users, by mistake).
Mirror: http://www.gregsadetsky.com/aol-data/

Combined with Wikipedia, Stack Overflow and product data from a few
hundred online shops (affili.net et al.), that could(?) provide a nice
and diverse dataset.

On the other hand, the database should probably be in-memory so that it
is not limited by disk I/O, and the online shop product data alone
already gives a 40GB index.
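
For the query side, here is a minimal sketch of pulling query strings
out of the AOL log so they can be replayed against an index. It assumes
the files are tab-separated with a header row and the columns AnonID,
Query, QueryTime, ItemRank, ClickURL, and that a Query of "-" is a
placeholder rather than a real search; the function name iter_queries
and its unique flag are made up for illustration.

```python
import csv
import io


def iter_queries(lines, unique=False):
    """Yield query strings from AOL-style tab-separated log lines.

    Assumed layout: AnonID, Query, QueryTime, ItemRank, ClickURL,
    with a header row first. With unique=False repeats are kept, so
    the stream preserves how often each query was actually issued.
    """
    seen = set()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if index == 0:
            continue  # skip the header row
        if len(row) < 2:
            continue  # skip malformed or empty lines
        query = row[1].strip()
        if not query or query == "-":
            continue  # assumed placeholder for a missing query
        if unique:
            if query in seen:
                continue
            seen.add(query)
        yield query


# Tiny in-memory sample in the assumed layout (not real log data).
sample = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tfoo bar\t2006-03-01 00:00:00\t\t\n"
    "1\tfoo bar\t2006-03-01 00:01:00\t1\thttp://example.com\n"
    "2\t-\t2006-03-02 00:00:00\t\t\n"
    "3\tbaz\t2006-03-03 00:00:00\t\t\n"
)
queries = list(iter_queries(io.StringIO(sample)))
```

Keeping the repeats by default is deliberate: a benchmark that only
runs each distinct query once would lose the skew towards popular
queries that makes the log realistic in the first place.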

Greets, Chris
