[Xapian-discuss] Xapian index size 475GB = 170 million documents (URLs)
Kevin Duraj
kevin.softdev at gmail.com
Sat Dec 18 23:58:14 GMT 2010
Xapians,
I maintain two indexes for my search engines, each approximately the
same size. I would like to share these numbers with you, since many of
you have never seen a Xapian index of this size. And of course you can
search the indexes yourself at
- http://myhealthcare.com/
- http://find1friend.com/
I need to add 2 x 100 million more documents to each index, and I hope
each will fit on a single 2TB hard disk. If so, I will soon
single-handedly beat the largest Xapian implementation, BrightStation's
Webtop search engine (archive.org snapshot), which offered sub-second
search over around 500 million web pages (around 1.5 terabytes of
database files).
Reference: http://xapian.org/history
One sample index size:
total 475G
-rw-r--r-- 1 kevin kevin 28 2010-12-18 15:25 iamchert
-rw-r--r-- 1 kevin kevin 13 2010-12-18 12:19 position.baseA
-rw-r--r-- 1 kevin kevin 3.8M 2010-12-18 15:25 position.baseB
-rw-r--r-- 1 kevin kevin 240G 2010-12-18 15:25 position.DB
-rw-r--r-- 1 kevin kevin 13 2010-12-18 04:31 postlist.baseA
-rw-r--r-- 1 kevin kevin 923K 2010-12-18 11:36 postlist.baseB
-rw-r--r-- 1 kevin kevin 58G 2010-12-18 11:36 postlist.DB
-rw-r--r-- 1 kevin kevin 13 2010-12-18 11:36 record.baseA
-rw-r--r-- 1 kevin kevin 1.6M 2010-12-18 12:03 record.baseB
-rw-r--r-- 1 kevin kevin 102G 2010-12-18 12:02 record.DB
-rw-r--r-- 1 kevin kevin 13 2010-12-18 12:03 termlist.baseA
-rw-r--r-- 1 kevin kevin 1.2M 2010-12-18 12:19 termlist.baseB
-rw-r--r-- 1 kevin kevin 76G 2010-12-18 12:18 termlist.DB
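As a rough cross-check of the listing above, this small Python sketch
sums the four .DB files and prints each table's share of the total. It
uses the rounded sizes exactly as ls printed them, ignores the small
.base files, and the names TABLE_SIZES_GIB and table_shares are only
illustrative.

```python
# Rounded .DB sizes copied from the ls listing above (the "G" column).
TABLE_SIZES_GIB = {
    "position": 240,
    "postlist": 58,
    "record": 102,
    "termlist": 76,
}


def table_shares(sizes):
    """Return (total, {table: fraction of total}) for a dict of sizes."""
    total = sum(sizes.values())
    return total, {name: size / total for name, size in sizes.items()}


total_gib, shares = table_shares(TABLE_SIZES_GIB)
print(f"sum of .DB files: {total_gib}G")
# Largest table first.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:9s} {TABLE_SIZES_GIB[name]:4d}G  {share:6.1%}")
```

The sum comes to 476G against the 475G total that ls reports, a gap
that rounding of the individual sizes would explain. position.DB alone
is roughly half of the index.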
$ delve .
number of documents = 169346678
average document length = 230970
document length lower bound = 1
document length upper bound = 3585385
highest document id ever used = 169346678
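For a sense of the headroom on a 2TB disk, here is a back-of-the-envelope
projection in Python. It is only a sketch under several assumptions that
are not measured: that the index grows linearly with document count,
that the "G" printed by ls means GiB, that a "2TB" disk holds 2 x 10^12
bytes, and that "2 x 100 million" means 200 million extra documents in
this index.

```python
GIB = 2**30  # assumption: ls "G" means GiB


def bytes_per_document(index_bytes, doc_count):
    """Average on-disk footprint of one document, in bytes."""
    return index_bytes / doc_count


def projected_index_bytes(index_bytes, doc_count, extra_docs):
    """Linear extrapolation of index size after adding extra_docs."""
    return bytes_per_document(index_bytes, doc_count) * (doc_count + extra_docs)


index_bytes = 475 * GIB       # "total 475G" from the listing
doc_count = 169_346_678       # "number of documents" from delve
extra_docs = 200_000_000      # assumption: 2 x 100 million
disk_bytes = 2 * 10**12       # assumption: decimal 2TB disk

per_doc = bytes_per_document(index_bytes, doc_count)
projected = projected_index_bytes(index_bytes, doc_count, extra_docs)

print(f"bytes per document : {per_doc:.0f}")
print(f"projected size     : {projected / GIB:.0f} GiB")
print(f"fits on the disk   : {projected < disk_bytes}")
```

Under those assumptions the index costs about 3 KB per document and the
grown index would be a little over 1 TiB, so it would fit. Real growth
is not strictly linear, and any compaction or merge step would need
extra free space on top of that.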
Kevin Duraj
http://pacificair.com/