Letor stabilisation - project progress

James Aylett james-xapian at tartarus.org
Mon Jun 6 10:39:14 BST 2016


On Mon, Jun 06, 2016 at 01:13:14PM +0530, Ayush Tomar wrote:

> I have completed introducing some code from v-hasu's branch into mine,
> mainly for Features, FeatureVector and FeatureManager classes. I have
> pushed the changes to https://github.com/ayshtmr/xapian/tree/letor-update. I
> am now proceeding to write unit tests for feature modules.

Great!

> There are a few things that I wanted to clarify:
> 
> 1. I have introduced a lot of code in a single commit (
> https://github.com/ayshtmr/xapian/commit/2abaf6b8d64b1624699c3f490b6dc441e1c6cdf3
> )
> Is it fine or should I try to split things up?

A good rule of thumb is to aim for a maximum of 800 line
changes. (This is based on observations of how long it takes to review
code, and how long people can be effective at reviewing code.)
Obviously for complex things, you want to aim somewhat lower.

I think in this case we'll want to split the commit up in order to
review it, and then squash it all down again when we merge. (That
happens sometimes.)

For instance, there's a rename of getlabel -> get_label which you
could pull out into one commit (which is then a fairly straightforward
commit to check, since providing the end result of the PR compiles it
will have caught all the cases). Probably the easiest way of doing
that is to do something like the following:

 * interactive rebase and mark that commit for edit
 * when you're dropped into the shell, use git reset --soft HEAD^ to
   get the commit into the stage
 * use git reset -p to remove everything but the getlabel rename lines
   from the index
 * git commit (the rename changes)
 * git add everything else back in and git commit that (or, more
   likely, continue to split into further commits)
 * git rebase --continue when you're done

I think if you look carefully at that commit you'll also discover that
the letor_features.cc changes contain a lot that's unwanted whitespace
changes and should be dropped from the commit. (-b to git show and git
diff can be helpful here, as it ignores whitespace changes. If it
isn't there but shows up as an option when you're doing git add -p
then it's a sign something's up, probably with your editor
configuration.)

> 2. There is a lot of commented code in my branch right now that
> needs to be removed. Is it wise to take care of it with each commit
> I push or should it be done later, say, when some stability is
> reached (post mid-term?)

I'd do it sooner rather than later, because it'll make the commits
more manageable and also the files smaller, which is easier for you to
deal with.

What I'd do is to create fixup commits that remove the commented code,
tied to whichever commits you want to get that removal into. Then you
can squash them before we merge. (That's considerably easier for you,
because you can focus first on getting the commit splits right, and
then tidy up each one.)

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org



More information about the Xapian-devel mailing list