[Xapian-discuss] performance on document.get_data()

Tong Liu nemoliu at gmail.com
Wed Oct 23 06:30:51 BST 2013


I have a performance problem with document.get_data() and
enquire.get_mset(). The call matches = enquire.get_mset(0, 200) takes
35 seconds, and iterating over all the documents in matches to call
get_data() takes another 3 seconds. Is that normal? My index contains
30 million documents. I use the Python bindings to work with Xapian.
Below is my index structure:
# value: 0:date, 1:site
# data: json message which contains: author, url, message(30 words)
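
For illustration, a document with this layout might be built roughly as follows. This is only a sketch under assumptions: the slot numbers and field names come from the description above, but the helper names (build_payload, make_document) are hypothetical, and the post does not say how the date and site values are encoded, so plain strings are assumed.

```python
import json

DATE_SLOT = 0  # value slot 0 holds the date (from the layout above)
SITE_SLOT = 1  # value slot 1 holds the site (from the layout above)


def build_payload(author, url, message):
    """Serialise the stored fields as the JSON string kept in the document data."""
    return json.dumps({"author": author, "url": url, "message": message})


def make_document(date, site, author, url, message):
    """Build a xapian.Document with the assumed layout (needs the Xapian bindings)."""
    import xapian  # imported lazily so build_payload works without the bindings

    doc = xapian.Document()
    doc.add_value(DATE_SLOT, date)  # assumption: date is stored as a string
    doc.add_value(SITE_SLOT, site)  # assumption: site is stored as a string
    doc.set_data(build_payload(author, url, message))
    return doc
```

With this layout, every get_data() call returns the whole JSON string, which then has to be decoded with json.loads() before any single field can be read.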


Do you have any ideas for improving search performance, especially
for doc.get_data()?

My code snippet:

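import json
import xapian

# DATA_PATH, parse(), keywords and start are not shown here.
database = xapian.Database("%s/athena" % DATA_PATH)
enquire = xapian.Enquire(database)
enquire.set_weighting_scheme(xapian.BM25Weight())
query = parse(keywords)
enquire.set_query(query)
# Ask for up to 200 matches, beginning at offset start.
matches = enquire.get_mset(start, 200)
# Hint that the documents for all matches are about to be read.
matches.fetch()
# Decode the JSON stored in each matching document's data.
result = [json.loads(match.document.get_data()) for match in matches]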
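
To see where the time goes, the steps in the snippet above could be timed separately. The following is a minimal sketch, not part of the original code: the helper name timed is hypothetical, and it relies on time.perf_counter(), which requires Python 3.3 or later.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the elapsed wall-clock time,
    and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print("%s: %.3f s" % (label, elapsed))
    return result, elapsed


# Hypothetical usage with the objects from the snippet above:
# matches, _ = timed("get_mset", enquire.get_mset, start, 200)
# timed("fetch", matches.fetch)
# raw, _ = timed("get_data", lambda: [m.document.get_data() for m in matches])
# result, _ = timed("json.loads", lambda: [json.loads(d) for d in raw])
```

Timing get_data() and json.loads() separately would show whether the 3 seconds is spent reading the document data or decoding the JSON.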

