> Can someone tell me what was Gaurav Arora's exact contribution in the
> Clustering Search Results part during GSoC 2014? I guess that will be
> more helpful in understanding his code.

Karthik, Gaurav is listed as mentor for that project, but as explained on our page with information for students & potential students, you shouldn’t read too much into that, as it’s mostly a Google administrative thing.

In any case, I’d start with the previous clustering branch (it’s called svn/clustering) in our git tree: begin with xapian-core/include/xapian/cluster.h, then the files in xapian-core/docsim, and work outwards from there. A good understanding of how Xapian works will be important for understanding what is going on.

Once you’ve got up to speed with that, I’d look at the code George wrote during his 2014 project to see if that approach still makes sense. The two are independent implementations, but neither is finished, and you may prefer to start again, learning from what they did, rather than building on either of them.


