[Xapian-discuss] performance on document.get_data()
Olly Betts
olly at survex.com
Wed Oct 30 00:02:08 GMT 2013
On Wed, Oct 23, 2013 at 01:30:51PM +0800, Tong Liu wrote:
> I have a performance issue with document.get_data() and
> enquire.get_mset(). It takes 35 seconds for matches =
> enquire.get_mset(0,200), and 3 seconds to iterate over all docs in matches
> calling get_data. Is that normal? My index contains 30 million documents. I
> use the Python bindings to operate Xapian. Below is my index structure:
> # value: 0:date, 1:site
> # data: json message which contains: author, url, message (30 words)
That sounds much slower than I'd expect. Is that the cold cache time?
If so, does rerunning the same query take much less time?
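One way to check the cold-cache question is to run the same search several times in one process. If the first run is much slower than the later ones, cold caches are the likely cause. Below is a minimal sketch that assumes the search is wrapped in a zero-argument callable. `time_repeated` is a hypothetical helper and is not part of the Xapian bindings.

```python
import time


def time_repeated(search, runs=3):
    """Call search() `runs` times and return each call's wall-clock duration.

    Hypothetical helper (not part of Xapian): a first duration much larger
    than the rest suggests a cold cache rather than slow query evaluation.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        search()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, with an `enquire` set up as in Tong's snippet, you could print `time_repeated(lambda: enquire.get_mset(0, 200))`.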
> Do you have any ideas for improving search performance, especially
> doc.get_data?
>
> My code snippet:
>
> database = xapian.Database("%s/athena" % DATA_PATH)
> enquire = xapian.Enquire(database)
> enquire.set_weighting_scheme(xapian.BM25Weight())
> query = parse(keywords)
What are you passing in for keywords here?
> enquire.set_query(query)
> matches = enquire.get_mset(start, 200)
Is start 0 here?
> matches.fetch()
With a local database, it probably won't help to call fetch().
> result = [json.loads(match.document.get_data()) for match in matches]
So your timing includes parsing the JSON. To focus on the time actually
taken by Xapian and its Python bindings, try changing that to:
result = [match.document.get_data() for match in matches]
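To see how the time divides between fetching document data and decoding JSON, you can time the two steps separately. This is a sketch only. `time_get_data_and_json` is a hypothetical helper, and it assumes `matches` is an MSet or any iterable whose items expose a `.document` with a `get_data()` method.

```python
import json
import time


def time_get_data_and_json(matches):
    """Time get_data() and JSON decoding separately.

    Hypothetical helper (not part of Xapian). Returns a tuple
    (get_data_seconds, json_seconds, decoded_results).
    """
    start = time.perf_counter()
    # Fetch only the stored document data: the Xapian/bindings cost.
    raw = [match.document.get_data() for match in matches]
    mid = time.perf_counter()
    # Decode afterwards so JSON parsing is measured on its own.
    # (json.loads accepts bytes as well as str on Python 3.6+.)
    results = [json.loads(data) for data in raw]
    end = time.perf_counter()
    return mid - start, end - mid, results
```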
Also, what Xapian version are you using?
Cheers,
Olly