Query regarding Xapian-Matcher Optimisation Project, GSOC-2016

James Aylett james-xapian at tartarus.org
Thu Mar 10 11:54:01 GMT 2016

On Thu, Mar 10, 2016 at 04:24:23PM +0530, Sanat Jain wrote:

> I beg to state that I was looking for a project In GSOC 2016 which matches
> my skill and passion, and I came across Matcher Optimisation, Xapian. This
> project has really captured my interest and I would love to contribute to
> this project.

Hi, Sanat -- welcome to Xapian!

> 1) There are 5 tickets on
> https://trac.xapian.org/wiki/GSoCProjectIdeas#Project:MatcherOptimisations
> And 11 tickets on
> https://trac.xapian.org/query?status=assigned&status=new&status=reopened&component=Matcher&group=component&col=id&col=summary&col=component&col=status&col=type&col=priority&col=milestone&report=16&order=priority

> So how many tickets are there and what is the difference between assigned
> and new ticket (as mentioned in the 2nd link)?

'assigned' generally means that someone has done some work on it. For
instance in the case of the settling pond ticket, Olly's written
various patches over the years; the last comment on the ticket
suggests that the most pressing work is to create some new benchmarks
with the various options against the latest code, to see if benefits
are still possible using any of the earlier approaches.

Not all of the tickets you've listed are optimisation issues (for
instance, more OP_VALUE_ comparison operators is mostly not about
optimisation), but there are some that aren't listed on the project
itself. However, as the project says:

> The idea for this project is to take several such optimisation ideas
> (either from the above, or ones you develop yourself)

So you can take any of the ones you think are likely to help, or any
other ideas you have while investigating the matcher.

> And how many tickets should I pick in my project?

I can't give you an exact number; you need to come up with a project
that will take up the 12 weeks of GSoC. A good way of doing this is to
have 'stretch goals', i.e. more work than you think you'll achieve, so
that if you do better than expected you still have work to do. The key
here is going to be putting the optimisations you'll tackle in a
sensible order: do the ones with the best combination of impact and
chance of working first (and get them merged into Xapian for a future
release), and leave the more speculative ones, or ones that help in
fewer cases or won't make as much of an impact for other reasons,
until later in the project.

> 2) What are the pre-requisites that I should fulfil before I can get into
> learning about these tickets (because it is difficult for me to understand
> a few things in them)?

You need to understand how the matcher works. This is a hard part of
the system, and there probably isn't as much documentation for it as
there should be. There are some notes
<https://xapian.org/docs/matcherdesign.html>, and beyond that some of
the code has comments around the core work it's doing, which should
help you start to map out how the different pieces fit together.

> 3) It will take me a little time to understand the Xapian’s code and
> optimisation techniques in depth, so may I do research in depth in the time
> between 25th of March and 22nd of April and submit my proposal in a brief
> manner?

We strongly recommend that you get a draft proposal together as early
as possible, so we can give you feedback on it. Generally speaking,
the most successful projects have been ones where that happens rather
than someone submitting a proposal that we haven't seen just before
the deadline.

Obviously you'll be getting familiar with how the matcher works at the
same time, so your early drafts won't contain as much detail. However,
a lot of the process of putting together the timeline for a proposal
is about managing the risk of things you don't know, so that shouldn't
be a huge problem. Hopefully, across the couple of weeks of the
application period, we can give you feedback that will help you
understand both the matcher and its optimisations, and the risks of
each potential optimisation.

> P.S. Sorry for sending my query so late, I had classes and to be honest, a
> little hesitation being a beginner which I just overcame. And I am sure
> that I will make a valuable contribution to the project this summer.

This isn't too late! The application period doesn't open until Monday,
and there are a couple of weeks after that to get your proposal in.


  James Aylett, occasional trouble-maker
