[Xapian-discuss] Very far out and static get_matches_estimated

Matthew Somerville matthew at mysociety.org
Thu Jun 11 00:29:06 BST 2009


Hi,

I'm getting quite odd results from get_matches_estimated() that I 
haven't seen before; we've just added a bunch of new data to the 
database. This is Xapian 1.0.7, with checkatleast set to 100.

New data is added to the database automatically at around 8.30am BST, 
so the results behind the links below may change. The figures I quote 
are what I see as I write.

http://www.theyworkforyou.com/search/?pop=1&s=statistics+19950101..19951231 
currently returns 1-20 of 14,678; page 18 gives 341-360 of 14,678:
http://www.theyworkforyou.com/search/?pop=1&s=statistics+19950101..19951231&p=18
But then page 19 gives 361-362 of 362, which is correct:
http://www.theyworkforyou.com/search/?s=statistics+19950101..19951231&p=19

So the estimate is wildly out on every page until we reach the page 
containing the last actual result. Changing the sort from reverse date 
to relevance gives a different but similarly far-out number, and the 
effect is the same. Without the date range restriction, the initial 
estimate is 43,612 (a good initial estimate!), and this changes 
gradually as I increase the page number until it reaches the correct 
result of 43,537, as I'd expect.

Results are also collapsed per debate by default, but turning that off 
makes no difference: it initially gives "1-20 of 30,249", continues up 
to "721-740 of 30,249", and then gives "741-746 of 746".
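
To make the page arithmetic concrete, here is a minimal sketch, assuming 
20 results per page and 1-based page numbers (as the ranges quoted above 
suggest). The function names are hypothetical and nothing here calls 
Xapian; it only shows that in both cases the count becomes correct on 
the first page whose window runs past the last real match.

```python
def page_window(page, per_page=20):
    """Return the 1-based (first, last) result positions requested for a page.

    Assumes 1-based page numbers and a fixed page size (an assumption
    inferred from the "341-360" style ranges, not from the site's code).
    """
    first = (page - 1) * per_page + 1
    return first, first + per_page - 1


def first_page_past_end(true_total, per_page=20):
    """Return the first page whose window extends beyond the last real match."""
    return true_total // per_page + 1


# The two true totals reported in this message: with and without collapsing.
for true_total in (362, 746):
    page = first_page_past_end(true_total)
    first, last = page_window(page)
    shown_last = min(last, true_total)  # the final page is only partly filled
    print(f"{true_total} matches: page {page} shows {first}-{shown_last}")
```

For the two totals above this gives page 19 (361-362) and page 38 
(741-746), which are exactly the pages where the displayed count 
switches to the correct figure.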

Any ideas?

ATB,
Matthew


