[Xapian-discuss] about index speed of xapian
superthread
superthread at 126.com
Wed Nov 21 09:46:26 GMT 2012
Hi,
I am using Xapian to index a text file of 268 MB. I treat each line as a document, and each line has two fields, like 13445511 | 111115151. The file contains 10,000,000 records. XAPIAN_FLUSH_THRESHOLD is set to 1000000. Indexing the file takes 1026544 ms, which is much slower than Lucene. Lucene indexes about 40000 records per second.
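To put the two figures in the same unit, the elapsed time can be converted to records per second. The following is a minimal sketch of that arithmetic only; the function name `records_per_second` is hypothetical and is not part of Xapian or Lucene.

```cpp
#include <cstdint>

// Hypothetical helper: convert a record count and an elapsed time in
// milliseconds into a throughput in records per second.
double records_per_second(std::uint64_t records, std::uint64_t elapsed_ms) {
    return static_cast<double>(records) * 1000.0 / static_cast<double>(elapsed_ms);
}
```

With the numbers quoted above, `records_per_second(10000000, 1026544)` gives roughly 9,741 records per second, which is about a quarter of the 40000 records per second reported for Lucene.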
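Code:
try
{
    // fin is the input stream for the text file (opened elsewhere).
    Xapian::WritableDatabase database("testindex", Xapian::DB_CREATE_OR_OPEN);
    mybase::Timeval now;
    std::string line;
    while (getline(fin, line))
    {
        // find() returns std::string::size_type; storing it in an int
        // makes the comparison with npos unreliable.
        std::string::size_type pos = line.find('|');
        if (pos != std::string::npos)
        {
            // Note: any spaces around '|' stay part of the terms.
            std::string imsi = line.substr(0, pos);
            std::string msisdn = line.substr(pos + 1);
            Xapian::Document doc;
            doc.add_term(imsi);
            doc.add_term(msisdn);
            database.add_document(doc);
        }
    }
    // Flush pending changes and close the database before stopping the timer.
    database.close();
    std::cout << now.elapsed() << std::endl;
}
catch (const Xapian::Error& error)
{
    std::cout << error.get_msg() << std::endl;
}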
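The sample line "13445511 | 111115151" has spaces on both sides of the separator, so splitting with substr alone leaves a trailing space on the first field and a leading space on the second. The sketch below isolates the line-splitting step and trims those spaces. It assumes trimmed values are what should be indexed, which the post does not state; the names `trim` and `split_record` are hypothetical, and only the standard library is used.

```cpp
#include <string>

// Hypothetical helper: strip leading and trailing whitespace from a field.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split "imsi | msisdn" at the first '|'.
// Returns false and leaves the outputs untouched if there is no separator.
bool split_record(const std::string& line, std::string& imsi, std::string& msisdn) {
    const std::string::size_type pos = line.find('|');
    if (pos == std::string::npos) return false;
    imsi = trim(line.substr(0, pos));
    msisdn = trim(line.substr(pos + 1));
    return true;
}
```

In the indexing loop, the two outputs of `split_record` would take the place of the two substr results before they are passed to `doc.add_term`.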
The following is the resulting index:
total 1.9G
-rw-rw-r-- 1 warren warren 0 11-21 17:07 flintlock
-rw-rw-r-- 1 warren warren 28 11-21 17:07 iamchert
-rw-rw-r-- 1 warren warren 22K 11-21 17:24 postlist.baseA
-rw-rw-r-- 1 warren warren 20K 11-21 17:22 postlist.baseB
-rw-rw-r-- 1 warren warren 1.4G 11-21 17:24 postlist.DB
-rw-rw-r-- 1 warren warren 2.0K 11-21 17:24 record.baseA
-rw-rw-r-- 1 warren warren 1.8K 11-21 17:22 record.baseB
-rw-rw-r-- 1 warren warren 121M 11-21 17:24 record.DB
-rw-rw-r-- 1 warren warren 6.7K 11-21 17:24 termlist.baseA
-rw-rw-r-- 1 warren warren 6.1K 11-21 17:22 termlist.baseB
-rw-rw-r-- 1 warren warren 428M 11-21 17:24 termlist.DB
This is too big!
Is there any problem with my code, and is there any way to improve the indexing speed?
Thank you.