[Xapian-discuss] Search performance issues and
olly at survex.com
Tue Oct 23 23:08:38 BST 2007
On Tue, Oct 23, 2007 at 10:25:47PM +0100, Richard Boulton wrote:
> Ron Kass wrote:
> >* Estimates vary, although it's exactly the same search done right one
> >after the other with no changes to the DB (no data added). This is not
> >really a big issue.
> This is the issue which looks oddest to me, however (though there are
> other oddities).
It would be interesting to work out why this is happening. I suspect
doing so will also reveal why the timings vary.
> If I understood you correctly, each search is
> performed on the same Database object, without reopening the Database
> object between each search. This should result in exactly the same
> results (and estimates) for each repeated search, since a Database
> object accesses a snapshot of the underlying database.
I have a theory. We sort containers of postlists when building the
postlist tree from the query. If we don't always use a stable sort,
then the exact tree built may vary between runs, which I think could
easily affect both the estimated number of matches and also how much
work the query requires (and so how long it takes).
If this is happening, we should see if using a stable sort for such
cases imposes any overhead. I suspect it wouldn't since this is a
one-off cost per query, and we're sorting a small number of items
(at most the number of terms in the query).
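
To make the stable-sort idea concrete, here is a minimal sketch in C++. It is
not Xapian's actual code: SubQuery, termfreq_est, order_stable and
order_with_tiebreak are hypothetical names invented for illustration, and the
sort key (estimated term frequency) is an assumption. It shows two ways to
make the ordering of equal-keyed entries well defined: std::stable_sort, and
std::sort with an explicit tie-break.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a postlist; only the fields needed here.
struct SubQuery {
    std::string term;
    unsigned termfreq_est;  // assumed sort key: estimated match count
};

// Compare on the estimate only, so entries with equal estimates are ties.
static bool by_termfreq(const SubQuery& a, const SubQuery& b) {
    return a.termfreq_est < b.termfreq_est;
}

// std::stable_sort keeps tied entries in their original relative order,
// so the result depends only on the input sequence and the keys.
std::vector<SubQuery> order_stable(std::vector<SubQuery> subs) {
    std::stable_sort(subs.begin(), subs.end(), by_termfreq);
    assert(std::is_sorted(subs.begin(), subs.end(), by_termfreq));
    return subs;
}

// Alternative: keep std::sort but break ties on the term itself, which
// gives a total order that does not depend on the input order at all.
std::vector<SubQuery> order_with_tiebreak(std::vector<SubQuery> subs) {
    std::sort(subs.begin(), subs.end(),
              [](const SubQuery& a, const SubQuery& b) {
                  if (a.termfreq_est != b.termfreq_est)
                      return a.termfreq_est < b.termfreq_est;
                  return a.term < b.term;
              });
    return subs;
}
```

With the input {delta:5, alpha:5, charlie:2, bravo:5}, order_stable yields
charlie, delta, alpha, bravo, while order_with_tiebreak yields charlie,
alpha, bravo, delta. Either way the cost is one sort over at most a handful
of items per query.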
Another possibility is that there's some dependence on an undefined
value. The testsuite is clean under valgrind, but I'm sure there's
code which doesn't have testsuite coverage.
Ron: assuming this is on Linux, if you run a few of these varying cases
under valgrind, does it report any problems?