[Xapian-discuss] performance on document.get_data()

Tong Liu nemoliu at gmail.com
Wed Oct 30 00:58:20 GMT 2013


1. > That sounds much slower than I'd expect.  Is that the cold cache time?
   > If so, does rerunning the same query take much less time?

Yes, that is the cold-cache time. The next query with the same words is faster.

2. > > query = parse(keywords)
   > What are you passing in for keywords here?

Things such as: BMW, iPhone5s...

3. > > matches = enquire.get_mset(start, 200)
   > Is start 0 here?

Yes, start=0.

4. > > result = [json.loads(match.document.get_data()) for match in matches]
   > So your time includes parsing the JSON - try changing that to this to
   > focus on the time actually taken by Xapian and its python bindings:
   >   result = [match.document.get_data() for match in matches]

I tried running it without decoding the JSON, but it made no difference.

Xapian version is 1.2.15.
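
To make it clearer where the time goes, the three stages discussed above (get_mset(), get_data() and JSON decoding) can be timed separately. The following is only a minimal sketch, not code from the original application: the helper name profile_search and the timings dictionary are hypothetical, and it assumes that enquire is a xapian.Enquire whose query has already been set, as in the quoted snippet below.

```python
import json
from timeit import default_timer as timer  # works on Python 2 and 3


def profile_search(enquire, start=0, count=200):
    """Time get_mset(), get_data() and JSON decoding as separate stages.

    `enquire` is assumed to behave like a xapian.Enquire with its query
    already set. Returns (timings_in_seconds, decoded_results).
    """
    timings = {}

    # Stage 1: run the match itself.
    t0 = timer()
    matches = enquire.get_mset(start, count)
    timings["get_mset"] = timer() - t0

    # Stage 2: fetch the stored document data for each match.
    t0 = timer()
    raw = [match.document.get_data() for match in matches]
    timings["get_data"] = timer() - t0

    # Stage 3: decode the JSON, kept apart from the Xapian work.
    t0 = timer()
    results = [json.loads(data) for data in raw]
    timings["json"] = timer() - t0

    return timings, results
```

Calling this helper twice in a row with the same query would give a rough cold-cache versus warm-cache comparison for each stage.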


2013/10/30 Olly Betts <olly at survex.com>

> On Wed, Oct 23, 2013 at 01:30:51PM +0800, Tong Liu wrote:
> > I have a performance issue with document.get_data() and
> > enquire.get_mset(). It takes 35 seconds for matches =
> > enquire.get_mset(0, 200), and 3 seconds to iterate over all the documents
> > in matches calling get_data(). Is that normal? My index contains
> > 30 million documents. I use the Python bindings to operate Xapian.
> > Below is my index structure:
> > # value: 0:date, 1:site
> > # data: json message which contains: author, url, message(30 words)
>
> That sounds much slower than I'd expect.  Is that the cold cache time?
> If so, does rerunning the same query take much less time?
>
> > Do you have any ideas for improving the search performance, especially
> > doc.get_data()?
> >
> > my code snippet
> >
> > database = xapian.Database("%s/athena" % DATA_PATH)
> > enquire = xapian.Enquire(database)
> > enquire.set_weighting_scheme(xapian.BM25Weight())
> > query = parse(keywords)
>
> What are you passing in for keywords here?
>
> > enquire.set_query(query)
> > matches = enquire.get_mset(start, 200)
>
> Is start 0 here?
>
> > matches.fetch()
>
> With a local database, it probably won't help to call fetch().
>
> > result = [json.loads(match.document.get_data()) for match in matches]
>
> So your time includes parsing the JSON - try changing that to this to
> focus on the time actually taken by Xapian and its python bindings:
>
>   result = [match.document.get_data() for match in matches]
>
> Also, what Xapian version are you using?
>
> Cheers,
>     Olly
>

