R bindings for Xapian - Project progress

James Aylett james-xapian at tartarus.org
Mon May 30 16:35:15 BST 2016


On Mon, May 30, 2016 at 10:47:45AM +0530, Amanda Jayanetti wrote:

> I developed the basic structure of xapian_index() function which enables
> the content of a data frame to be indexed with Xapian search engine
> library. I pushed it to a git repository at
> https://github.com/amandaJayanetti/RXapian.
> It'd be a great favour if Mr. Dirk Eddelbuettel as well as other interested
> developers could kindly review the function and give me some feedback on it.

Hi, Amanda -- I don't have R installed on the computer I have with me,
so I can't play with it and give any feedback on the API; hopefully
Dirk and others can jump in on that. But this feels like a good start
against your project plan.

There are a few little things I think you can tidy up on non-R issues.

1. You don't need to commit autom4te.cache, config.log or
config.status (these are all working files that are part of the
autoconf system). If they're in the repository, then (a) they'll
change whenever the underlying files change, and (b) people will have
to download them even though they'll be regenerated locally after that.

2. You almost certainly don't want to commit configure, since
configure.ac is committed. (I assume that a simple run of autoconf
will regenerate configure? It looks like it will.)

3. You still have the Read-and-delete-me file in the repo, which I
assume was created automatically by Rcpp or similar. If you think any
parts of it are helpful to end users, or to yourself while developing
the extension, I'd add them to README.md.

4. Rather than keeping lots of 'Delete <file>' commits, it's possible
to 'squash' multiple commits together. I think it'd be worth doing
this for most of the commits you currently have, so there's an initial
commit and then the later 'real' changes (830d3c2 onwards). You can do
this using git rebase -i; if you haven't used it before then there are
tutorials online, but if you get stuck then drop me an email with some
times you can be online and I'll walk you through it. (It can be
confusing the first time; git isn't always the most friendly of tools!)

5. Those later changes should have meaningful commit messages. 'Update
<filename>' is just restating something that should be clear from the
commit itself. I wrote something about commit messages in the
developer guide
(https://xapian-developer-guide.readthedocs.io/en/latest/contributing/workflow.html#good-commit-messages),
which has a further link out to an article I've found helpful on this.

6. In README.md, you can mark code by surrounding it with backticks
(`...`). With GitHub-flavoured Markdown (which is what GitHub uses to
render README.md) you can also put 'code fences' around it: lines
that just contain three backticks (```). (GFM understands R, so you
can use ```R as the opening fence, and it'll syntax highlight too.)

7. In configure.ac, you should default to xapian-config, not
xapian-config-1.3; anyone who's using a development version can point
XAPIAN_CONFIG to the relevant binary. It isn't a huge issue right now
(since you're not targeting Xapian 1.2), but 1.4 should be out during
the course of GSoC, and we'll want RXapian to support that without
people having to do anything special.
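
As a rough sketch of point 7: configure is ultimately a shell script,
so the default could be expressed along the following lines. This
assumes plain POSIX shell rather than any particular autoconf macro.
resolve_xapian_config is a made-up helper name, not something from
RXapian or Xapian, and the XAPIAN_CXXFLAGS and XAPIAN_LIBS variable
names are illustrative too. The --cxxflags and --libs options are ones
that xapian-config provides.

```shell
# Hypothetical helper: use $XAPIAN_CONFIG if the user set it,
# otherwise fall back to the unversioned xapian-config on PATH.
resolve_xapian_config() {
  printf '%s\n' "${XAPIAN_CONFIG:-xapian-config}"
}

XAPIAN_CONFIG=$(resolve_xapian_config)

# Only query the script if it can actually be found.
if command -v "$XAPIAN_CONFIG" >/dev/null 2>&1; then
  XAPIAN_CXXFLAGS=$("$XAPIAN_CONFIG" --cxxflags)
  XAPIAN_LIBS=$("$XAPIAN_CONFIG" --libs)
else
  echo "xapian-config not found; set XAPIAN_CONFIG to its location" >&2
fi
```

Someone using a development build would then point XAPIAN_CONFIG at
their versioned binary when running configure, and everyone else gets
the plain xapian-config without doing anything.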


Going forward, GitHub provides some helpful tools for discussing
changes if you work on a branch and then create a pull request (back
to your own repo for the time being). The discussion tools don't
work nearly as well without pull requests, unfortunately, but it
should be an easy habit to get into (and is how a lot of open source
projects work).
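
In terms of commands, the branch part of that could look something
like the sketch below. It runs against a throwaway bare repository
standing in for GitHub so that it is self-contained; the branch name,
file contents, identity and commit messages are all invented for
illustration. Opening the pull request itself happens on the GitHub
site after the push.

```shell
# Throwaway sandbox: a bare repo plays the role of the GitHub remote.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/RXapian" 2>/dev/null
repo="$work/RXapian"

# Identity for the sandbox only (invented values).
git -C "$repo" config user.name "Example Developer"
git -C "$repo" config user.email "dev@example.org"

# An initial commit on the default branch, pushed to the remote.
echo "RXapian" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" commit -q -m "Add README"
git -C "$repo" push -q origin HEAD

# Do the new work on its own branch (hypothetical name).
git -C "$repo" checkout -q -b readme-usage-example
echo "Usage: see xapian_index()" >> "$repo/README.md"
git -C "$repo" commit -q -a -m "Document xapian_index() usage in README"

# Publish the branch; the pull request is then opened from it on GitHub.
git -C "$repo" push -q -u origin readme-usage-example
```

In a real checkout only the last three commands matter: create the
branch, commit to it, and push it with -u so that later pushes from
that branch go to the same place.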

J 

-- 
  James Aylett, occasional trouble-maker
  xapian.org


