Jupyter Notebooks: Not for Development

[Image: the planet Jupiter, showing the Great Red Spot.]

Jupyter Notebooks are great! They make it really convenient to tinker with a new library and are excellent for documenting projects that include code. What Jupyter Notebooks are not great for, and what I find many people (including my lab) using them for, is development. There are three reasons for this:

1. Version Control

Putting notebooks under version control is a mess. In addition to code, they contain data and output, so every run of the notebook produces a large number of changes. Worse, even when the output is identical, the execution counts next to each cell change on every run, so the version control system flags the notebook as modified. And because a notebook is saved as a JSON file, which is already hard to diff, these superfluous changes make reviewing it harder still.
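One common mitigation is to strip outputs and execution counts before committing. The sketch below assumes the nbformat 4 JSON layout, in which code cells carry `outputs` and `execution_count` keys; `strip_notebook` and `strip_file` are hypothetical helper names, and dedicated tools such as nbstripout handle more edge cases (for example, cell metadata).

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of an nbformat-4 notebook with run-dependent fields cleared."""
    stripped = json.loads(json.dumps(nb))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop printed output, plots, tables
            cell["execution_count"] = None   # drop the [n] counter that changes every run
    return stripped


def strip_file(path: str) -> None:
    """Rewrite a .ipynb file in place with outputs and execution counts removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

After stripping, two runs that differ only in their execution counts serialize identically, so the diff shows only real code changes.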

2. Modularity

Code defined in a notebook cannot easily be called from other code. This leads to a lot of duplicated code, and it means that notebooks either have to sit at the end of a pipeline or write their results to disk to pass data on. This lack of modularity also makes it difficult to write unit tests that verify the correctness of notebook code.
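The usual remedy is to move logic out of cells into an ordinary module that scripts, notebooks, and tests can all import. The sketch below shows the idea with a hypothetical `normalize` function; the file names in the comments (`analysis_utils.py`, `test_analysis_utils.py`) are illustrative, and the test is written in the plain-`assert` style that pytest discovers.

```python
# analysis_utils.py (hypothetical): logic that might otherwise live in a notebook cell
def normalize(values: list[float]) -> list[float]:
    """Scale values linearly so they span [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# test_analysis_utils.py (hypothetical): a unit test for the same function
def test_normalize():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([3.0, 3.0]) == [0.0, 0.0]
```

A notebook can then do `from analysis_utils import normalize` and stay a thin layer of exploration and plots on top of tested code.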

3. Complex History

Notebooks have a complicated history: the kernel keeps the state left behind by every cell you have run, including any variables those cells set. Notebooks are so flexible that you will often add and delete cells as you work, leaving you with a history that is impossible to remember. This means that unless you have rerun the notebook from a fresh kernel, the results may depend on cells that have since been deleted.
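The hidden-state problem can be reproduced without Jupyter at all. This is a loose simulation, not how IPython is implemented: `run_cells` (a hypothetical helper) executes cell sources in one shared namespace, roughly the way a kernel keeps variables alive between cells. `depends_on_hidden_state` only detects names that are missing in a fresh run; stale values that are still defined would slip through.

```python
def run_cells(cells, namespace=None):
    """Execute cell sources in order in one shared namespace, loosely mimicking a kernel."""
    ns = {} if namespace is None else namespace
    for source in cells:
        exec(source, ns)  # variables persist in ns between cells, as in a live kernel
    return ns


def depends_on_hidden_state(cells):
    """Return True if the cells fail with a NameError when run in a fresh namespace."""
    try:
        run_cells(cells)
    except NameError:
        return True
    return False


# Original session: cell 1 sets a variable, cell 2 uses it.
kernel = run_cells(["threshold = 10",
                    "result = [x for x in range(20) if x > threshold]"])

# Cell 1 is later deleted, but rerunning what remains in the same kernel still works,
# because `threshold` is still held in the kernel's namespace.
remaining = ["result = [x for x in range(20) if x > threshold]"]
run_cells(remaining, kernel)
print(kernel["result"])

# Running the remaining cells from scratch exposes the dependency on the deleted cell.
print(depends_on_hidden_state(remaining))
```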

Good Uses

This is not to say that the use of Jupyter Notebooks should be considered harmful; they are great for tinkering with a new library and for documenting projects that include code.

So, in closing: I use Jupyter Notebooks where they excel, such as documenting analyses for this blog or tweaking algorithms for WhereTo.Photo, and try to stick to plain code in scripts and modules for everything else.