[Xapian-discuss] Summing collapsed values

Richard Boulton richard at lemurconsulting.com
Wed Mar 25 18:36:30 GMT 2009


On Wed, Mar 25, 2009 at 06:26:10PM +0000, Simon Roe wrote:
> Hi,
> 
> Is there a way to sum a value of collapsed documents?  So, if I
> collapse on value 0 ('year') I want to be able to get the sum of value
> 1 ('amount') for all the collapsed documents.

To check I'm understanding you right:  suppose you had 3 documents in the
database:

doc1: year=2000, amount=5
doc2: year=2000, amount=10
doc3: year=2001, amount=20

You do a search which matches all 3, but collapse on year and get, say:

doc1: year=2000, amount=5
doc3: year=2001, amount=20

And you want, for each of those documents, to get the sum of the "amount"
values for all the items sharing its year (including the one which was
kept)?  i.e., "15" for year=2000 and "20" for year=2001.

There's no way to do this directly.  It's tricky even to get a precise
count of the number of documents collapsed for each year, because Xapian's
matcher may have been able to stop matching early, and so may never have
seen some documents which would have been collapsed away.

The best I can see with core Xapian is to run the search once to get the
list of years which have any entries, then run the search again for each
of those years with an additional restriction to that year, asking for all
the results, and add up the amounts outside Xapian.  You'd probably want
to add a term to each document holding the year, so you can restrict to a
particular year efficiently.
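
As a rough sketch of that two-pass approach (in Python, and deliberately
not calling Xapian itself): the search() function below is a hypothetical
stand-in for running your query, and DOCS just holds the three example
documents from above.  In a real setup the first pass would be the
collapsed search, and the second pass would combine your query with the
year term and ask for every match.

```python
# Stand-in for the index: (year, amount) pairs for doc1, doc2 and doc3.
DOCS = [(2000, 5), (2000, 10), (2001, 20)]


def search(year=None, collapse=False):
    """Hypothetical stand-in for a search; not a real Xapian call.

    year:     if given, only return documents for that year (this plays
              the role of the extra year-term restriction).
    collapse: if true, keep only the first document seen for each year.
    """
    hits = [doc for doc in DOCS if year is None or doc[0] == year]
    if collapse:
        seen = set()
        kept = []
        for doc in hits:
            if doc[0] not in seen:
                seen.add(doc[0])
                kept.append(doc)
        hits = kept
    return hits


def sum_amounts_by_year():
    # Pass 1: the collapsed search tells us which years have any entries.
    years = [year for year, _amount in search(collapse=True)]
    # Pass 2: one restricted search per year, asking for all the results,
    # with the adding up done outside the search engine.
    return {y: sum(amount for _year, amount in search(year=y)) for y in years}


totals = sum_amounts_by_year()
print(totals)  # {2000: 15, 2001: 20}
```

This costs one extra search per distinct year, so it is only attractive
when the number of collapse keys is small.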

Alternatively, you could write a matchspy which reads the values from each
document the matcher considers and adds them up as it goes.  There isn't a
"built-in" matchspy that you could use for this (either on trunk or on any
of the branches, as far as I can see).  If you take this approach, you'd
need to set the checkatleast parameter of get_mset() to the database size,
to ensure that no documents get missed.
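
To give an idea of what such a matchspy would have to do, here is a
minimal Python sketch of just the accumulation logic.  It is an
illustration under assumptions, not code from any Xapian branch: the class
name SumBySlotSpy and the FakeDoc helper are made up, the amounts are
assumed to be stored as plain decimal strings, and the class is duck-typed
(it only needs an object with a get_value(slot) method) so that it runs
without Xapian installed.  How you derive from the real matchspy base
class and register it with the enquire object depends on the Xapian
version you're using.

```python
from collections import defaultdict


class SumBySlotSpy:
    """Adds up one value slot, grouped by another value slot.

    Hypothetical sketch: a real matchspy would derive from Xapian's
    matchspy base class instead of being a plain Python class.
    """

    def __init__(self, key_slot, amount_slot):
        self.key_slot = key_slot        # e.g. slot 0 ('year')
        self.amount_slot = amount_slot  # e.g. slot 1 ('amount')
        self.totals = defaultdict(float)

    def __call__(self, doc, weight):
        # Called once for each document shown to the spy.
        key = doc.get_value(self.key_slot)
        amount = doc.get_value(self.amount_slot)
        if amount:  # skip documents with no value in the amount slot
            self.totals[key] += float(amount)


class FakeDoc:
    """Test double with the same get_value(slot) shape as a document."""

    def __init__(self, values):
        self._values = values

    def get_value(self, slot):
        return self._values.get(slot, "")


spy = SumBySlotSpy(key_slot=0, amount_slot=1)
for values in ({0: "2000", 1: "5"}, {0: "2000", 1: "10"}, {0: "2001", 1: "20"}):
    spy(FakeDoc(values), 0.0)

print(dict(spy.totals))  # {'2000': 15.0, '2001': 20.0}
```

The totals are only complete if the spy is shown every matching document,
which is why checkatleast has to cover the whole database.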

-- 
Richard


