[Xapian-discuss] Summing collapsed values

Simon Roe simon.roe at talusdesign.co.uk
Wed Mar 25 18:43:18 GMT 2009


On Wed, Mar 25, 2009 at 6:36 PM, Richard Boulton
<richard at lemurconsulting.com> wrote:
> To check I'm understanding you right:  suppose you had 3 documents in the
> database:
>
> doc1: year=2000, amount=5
> doc2: year=2000, amount=10
> doc3: year=2001, amount=20
>
> You do a search which matches all 3, but collapse on year and get, say:
>
> doc1: year=2000, amount=5
> doc3: year=2001, amount=20
>
> And you want, for each of those documents to get the sum of the "amount"
> for all the items which were collapsed?  ie, "15" for year=2000, "20" for
> year=2001.

Yes, that's correct, sorry for not being clear.
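
To make the desired result concrete, here is a minimal sketch in plain Python (no Xapian calls) that computes the per-year sums for the three example documents above. The function name sum_by_year and the (year, amount) tuple layout are assumptions chosen for illustration only.

```python
from collections import defaultdict

# The three example documents from the message, as (year, amount) pairs.
docs = [(2000, 5), (2000, 10), (2001, 20)]


def sum_by_year(pairs):
    """Add up the amounts of all documents sharing the same year."""
    totals = defaultdict(int)
    for year, amount in pairs:
        totals[year] += amount
    return dict(totals)


totals = sum_by_year(docs)
print(totals)  # {2000: 15, 2001: 20}
```

This is the aggregate that collapsing alone does not provide: the collapsed result keeps one document per year, but not the sum over the documents that were collapsed away.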

> There's no way to do this directly.  It's tricky even to get a precise
> count of the number of documents collapsed for each year (because xapian's
> matcher may have been able to stop matching early, and not even seen some
> documents which would have been collapsed away).
>
> Best I can see with core xapian is to run the search once to get the list
> of years which have any entries, and then to run the search again for each
> of those years with an additional restriction, asking for all the results,
> and then adding up the amounts outside xapian.  You'd probably want to add
> a term to each document holding the year, so you can restrict to a
> particular year efficiently.
>
> Alternatively, you could write a matchspy which accesses the values and
> adds them up as it goes.  There isn't a "built-in" matchspy that you could
> use for this (either on trunk or on any of the branches, as far as I can
> see).  If you take this approach, you'd need to set the checkatleast
> parameter (to get_mset()) to the database size, to ensure that no documents
> get missed.
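
The matchspy idea can be sketched as a callable that is invoked once per matching document and keeps a running total per key. The sketch below is plain Python with stand-ins, not the Xapian API: SumPerKeySpy, FakeDoc and the slot numbers 0 and 1 are hypothetical names and choices for illustration. In Xapian releases that ship match spies, a class like this would presumably derive from the bindings' MatchSpy base class and be registered on the Enquire object, and values would need decoding from whatever serialised form they were stored in.

```python
class SumPerKeySpy:
    """Hypothetical spy: sums one value slot, grouped by another slot."""

    def __init__(self, key_slot, amount_slot):
        self.key_slot = key_slot
        self.amount_slot = amount_slot
        self.totals = {}

    def __call__(self, doc, weight):
        # Called once for every document the matcher actually looks at.
        key = doc.get_value(self.key_slot)
        amount = int(doc.get_value(self.amount_slot))
        self.totals[key] = self.totals.get(key, 0) + amount


class FakeDoc:
    """Stand-in for a document with numbered value slots (assumption)."""

    def __init__(self, values):
        self.values = values

    def get_value(self, slot):
        return self.values.get(slot, "")


# Slot 0 holds the year, slot 1 the amount (arbitrary choice for this sketch).
docs = [
    FakeDoc({0: "2000", 1: "5"}),
    FakeDoc({0: "2000", 1: "10"}),
    FakeDoc({0: "2001", 1: "20"}),
]

# Every matching document is seen: the totals are complete.
full = SumPerKeySpy(key_slot=0, amount_slot=1)
for doc in docs:
    full(doc, 0.0)

# The matcher stopped early and only looked at two documents:
# the total for 2001 is missing entirely.
partial = SumPerKeySpy(key_slot=0, amount_slot=1)
for doc in docs[:2]:
    partial(doc, 0.0)

print(full.totals)
print(partial.totals)
```

The second run illustrates why checkatleast needs to cover the whole database: the spy can only add up documents the matcher actually visits.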


OK, thanks, I'll look into both options.  Do you have any idea which
approach would be faster?  Obviously it depends on database size and
various other things, but is one generally 'better' than the other?

-- 
Help save the economy:
http://seriouschange.org.uk/

E: simon.roe at talusdesign.co.uk
M: 07742079314


