[Xapian-discuss] Xapian index size 475GB = 170 million documents (URLs)

Felix Antonius Wilhelm Ostmann ostmann at websuche.de
Mon Dec 20 10:31:24 GMT 2010


Can you tell us more? I would like to see details of the CPU/RAM/HDD setup,
query times (average/max), query counts (parallel/total), and anything else
you can share :)

On 19.12.2010 00:58, Kevin Duraj wrote:
> Xapians,
> 
> I am maintaining two indexes for my search engines, each of
> approximately the same size. I would like to share this
> knowledge with you, since many of you have never seen a Xapian index of
> this size. And of course you can search the indexes yourself at
> 
> - http://myhealthcare.com/
> - http://find1friend.com/
> 
> I need to add 2 x 100 million more documents to each index, and I hope it
> will fit on one 2TB hard disk. I will soon single-handedly beat
> the largest Xapian implementation, BrightStation's Webtop search engine
> (archive.org snapshot), which offered a sub-second search over around
> 500 million web pages (around 1.5 terabytes of database files).
> Reference: http://xapian.org/history
> 
> One sample index size:
> 
> total 475G
> -rw-r--r-- 1 kevin kevin   28 2010-12-18 15:25 iamchert
> -rw-r--r-- 1 kevin kevin   13 2010-12-18 12:19 position.baseA
> -rw-r--r-- 1 kevin kevin 3.8M 2010-12-18 15:25 position.baseB
> -rw-r--r-- 1 kevin kevin 240G 2010-12-18 15:25 position.DB
> -rw-r--r-- 1 kevin kevin   13 2010-12-18 04:31 postlist.baseA
> -rw-r--r-- 1 kevin kevin 923K 2010-12-18 11:36 postlist.baseB
> -rw-r--r-- 1 kevin kevin  58G 2010-12-18 11:36 postlist.DB
> -rw-r--r-- 1 kevin kevin   13 2010-12-18 11:36 record.baseA
> -rw-r--r-- 1 kevin kevin 1.6M 2010-12-18 12:03 record.baseB
> -rw-r--r-- 1 kevin kevin 102G 2010-12-18 12:02 record.DB
> -rw-r--r-- 1 kevin kevin   13 2010-12-18 12:03 termlist.baseA
> -rw-r--r-- 1 kevin kevin 1.2M 2010-12-18 12:19 termlist.baseB
> -rw-r--r-- 1 kevin kevin  76G 2010-12-18 12:18 termlist.DB
> 
> $ delve .
> number of documents = 169346678
> average document length = 230970
> document length lower bound = 1
> document length upper bound = 3585385
> highest document id ever used = 169346678
> 
> Kevin Duraj
> http://pacificair.com/
> 
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
> 
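
A rough back-of-the-envelope sketch of the sizing question in the quoted
message, in Python. It takes the table sizes from the listing (assumed to be
in GiB, as `ls -lh` prints them) and the document count from the delve output.
Two things are assumptions, not facts from the post: that the index grows
linearly with the number of documents, and that "2 x 100 million more" means
200 million additional documents. The function names are hypothetical.

```python
GIB = 1024 ** 3

# Table sizes in GiB, copied from the `ls -lh` listing in the quoted message.
TABLE_SIZES_GIB = {"position": 240, "postlist": 58, "record": 102, "termlist": 76}

# "number of documents" from the quoted delve output.
NUM_DOCS = 169_346_678


def bytes_per_document(sizes_gib, docs):
    """Average on-disk bytes per document across all tables."""
    return sum(sizes_gib.values()) * GIB / docs


def projected_size_gib(sizes_gib, docs, extra_docs):
    """Projected index size in GiB after adding extra_docs documents.

    ASSUMPTION: size scales linearly with document count. Real growth
    depends on the documents and on how well the B-trees are compacted.
    """
    return bytes_per_document(sizes_gib, docs) * (docs + extra_docs) / GIB


def fits_on_disk(size_gib, disk_bytes=2 * 10 ** 12):
    """True if size_gib fits on a disk of disk_bytes (default: a 2 TB disk)."""
    return size_gib * GIB <= disk_bytes


if __name__ == "__main__":
    # ASSUMPTION: "2 x 100 million more" is read as 200 million extra documents.
    extra = 200_000_000
    projected = projected_size_gib(TABLE_SIZES_GIB, NUM_DOCS, extra)
    share = TABLE_SIZES_GIB["position"] / sum(TABLE_SIZES_GIB.values())
    print(f"bytes per document: {bytes_per_document(TABLE_SIZES_GIB, NUM_DOCS):.0f}")
    print(f"projected size: {projected:.0f} GiB")
    print(f"fits on 2 TB disk: {fits_on_disk(projected)}")
    print(f"position table share: {share:.0%}")
```

The position table accounts for about half of the total, so whether
positional information is indexed at all is likely the biggest lever on size.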


-- 
Kind regards

Felix Antonius Wilhelm Ostmann
-----------------------------------------------------------
Websuche Search Technology GmbH & Co. KG
Martinistraße 3, D-49080 Osnabrück, Germany
-----------------------------------------------------------
Tel.: +49 541 40666-0, Fax: +49 541 40666-22
Email: info at websuche.de, Web: www.websuche.de
-----------------------------------------------------------
Osnabrück District Court - HRA 200252, VAT ID: DE814737310
-----------------------------------------------------------
General partner: Websuche Search Technology Verwaltungs GmbH
Osnabrück District Court - HRB 200359, Managing Director: Ansas Meyer
-----------------------------------------------------------


