[Xapian-discuss] MatchSpy speed for value counts?

Andrei T andrei360-lists at yahoo.com
Mon Sep 17 22:21:52 BST 2007


I am currently using revision 9300 of Xapian core. I
have heard that version 1.0.3 has optimized support
for counting occurrences of values across a match
set.

My particular problem is to find, among say 10,000
matching documents, the *exact* number of values
containing the string "Location:Utah".
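
To make the requirement concrete, here is a minimal sketch of the
tally I am after, in plain Python with no Xapian calls at all. The
function name count_values, the dict-per-document layout, and the
slot numbers are hypothetical, chosen only for illustration; this
is not how Xapian stores or exposes values.

```python
from collections import Counter


def count_values(matches, slot):
    """Tally the value stored in `slot` across every matching document.

    `matches` is a hypothetical stand-in for a match set: an iterable of
    dicts mapping a value slot number to the string stored there.
    """
    counts = Counter()
    for doc in matches:
        value = doc.get(slot)
        if value is not None:  # skip documents with nothing in this slot
            counts[value] += 1
    return counts


# Made-up match set, for illustration only.
matches = [
    {0: "Location:Utah"},
    {0: "Location:Ohio"},
    {0: "Location:Utah"},
    {1: "Type:PDF"},  # no value in slot 0, so it is not counted
]

counts = count_values(matches, 0)

# Number of slot-0 values containing the string "Location:Utah".
utah = sum(n for value, n in counts.items() if "Location:Utah" in value)
print(utah)
```

The count is exact only if every matching document is visited,
which is why the cost grows with the size of the match set rather
than with the number of results returned.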

I tried passing my custom spy object as the 6th
argument of get_mset(). However, it appears to take
just as long as passing it as the 5th argument
(MatchDecider). I *am* setting checkatleast to 1
million, to make sure I get the exact counts.

My impression was that the new MatchSpy would look at
all matching documents to get the exact counts, and
that checkatleast wouldn't even be needed, except as
a hard cut-off.

Am I missing something?

As a side note, this has been done for a while in
commercial engines like Autonomy or Endeca, with
surprising efficiency. We are currently doing this
with MySQL, but the requirement is demanding and the
approach does not scale well, even after I spent
weeks optimizing it. My hope is that a "real" search
engine will do this better.




