[Xapian-devel] Doubt about GSOC proposal

Olly Betts olly at survex.com
Tue Apr 2 02:33:10 BST 2013

On Mon, Apr 01, 2013 at 10:00:04PM +0530, aarsh shah wrote:
> Hello guys. I have begun work on writing my proposal as discussed on IRC and
> will submit a draft in a couple of days so that I can make it detailed and
> refine it after getting feedback.
> I wanted to know about the number of weeks a proposal should cover.

It should definitely cover the 14 weeks of the work period (2013-06-17
to 2013-09-23).  It's useful to also sketch out any preparation work
you're planning to do during the community bonding period.


The GSoC timeline doesn't fit with university holidays perfectly in some
countries, so if you have exams, courses, or other commitments which
will eat into the work period much, then it's good if you're able to get
started earlier during the community bonding period, and if you're
planning to do that, you should include the details in your timeline
(both the commitments you have during the work period, and any work
you're planning to do during the community bonding period).

> Also, is it okay if I set aside a buffer week somewhere in the middle of the
> summer for something like cleaning the code, working on the feedback and
> getting it merged (merging whatever code I've completed till then)?

A better approach is to split the project into a series of smaller
projects, each of which in turn can be implemented (along with
documentation and testcases), debugged, reviewed and merged.

There are lots of advantages to this, both for the student and for
the mentors, for instance:

 * You get more variation of things to work on over time, rather than
   having a large chunk of implementing then a large chunk of reviewing.

 * If things take longer than expected, you can use the experience of
   the early subprojects to adjust estimates for how long the later
   subprojects will take.  And if you don't complete the whole project
   you will at least have managed to merge some useful work.

 * We get to review smaller pieces of code - reviewing a patch twice
   the size is typically more than twice as hard, because it's harder
   to hold twice as much information in your head at once.  It's also
   easier to find time to review a small patch.

 * We don't end up having to try to review and merge all the student's
   work at the same time.

 * You get to actually finish things and get a real sense you're making
   progress, and we can also see your progress much better.  If your
   project plan starts with "write code - 6 weeks" it's difficult for
   us to see how well things are going before 6 weeks are up (and
   probably still hard then), and you'll likely feel you aren't making
   much progress during that time because there aren't any visible
   milestones to tick off.

You need to think about what the sub-projects could be, and arrange
them so that the dependencies work.  For example, suppose you plan to
implement a new weighting scheme which needs the backend to track some
new statistic (like the max wdf in each document), and then to evaluate
its speed and retrieval effectiveness compared to the existing default
of BM25.  You could split this into (I'm not saying you'd necessarily
want to tackle these particular things in your project, but a concrete
example is much easier to talk about):

 * Implement max wdf tracking
 * Implement new weighting scheme
 * Evaluate speed
 * Evaluate retrieval effectiveness

The last two could be swapped, but otherwise the order is determined by
dependencies - you need to have max wdf for the weighting scheme, and
you need the weighting scheme to be able to evaluate it.
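
If it helps to check an ordering like this mechanically, here is a
minimal sketch in Python using the standard library's graphlib module
(Python 3.9 or later).  The sub-project names mirror the example list
above; the depends_on table and the plan() helper are just illustrative
names, not part of any Xapian tooling.

```python
from graphlib import TopologicalSorter

# Each sub-project maps to the set of sub-projects it depends on.
# The entries mirror the example above; the layout is only illustrative.
depends_on = {
    "max wdf tracking": set(),
    "new weighting scheme": {"max wdf tracking"},
    "evaluate speed": {"new weighting scheme"},
    "evaluate retrieval effectiveness": {"new weighting scheme"},
}


def plan(deps):
    """Return one valid order in which the sub-projects can be tackled.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = plan(depends_on)
for position, name in enumerate(order, start=1):
    print(position, name)
```

The two evaluation steps have no dependency on each other, so either
may come out third; which one you actually put first is a judgement
about what is most useful to have finished if time runs short.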

The first two sub-projects don't seem like they could be split into
smaller usefully self-contained units, but possibly the evaluation steps
could be subdivided into implementing a harness for doing such tests and
using it to perform some tests.

If things don't go to plan and you only manage three of the four
subprojects, you at least have managed to implement a working weighting
scheme and get it merged, and provide some idea of how the speed
compares to BM25.  It's good to think about what's most useful to get
done in such situations when deciding on the exact order.

A good way to think about this is to explicitly make the later parts
"stretch goals", which you can work on if time allows.  That helps to
set expectations clearly.

