Jupyter Notebook Templates for Data Science: Plotting
I recently released my Jupyter Notebook Template Library. Its goal is to accelerate your data science projects, so you do not have to spend hours poring over old notebooks to find handy code snippets. In this post I dive into the plotting notebook to show you what it can do.
The Plotting Notebook
Visualizing your data is a critical step in understanding it, and so it is appropriate that the first notebook in the library helps with making beautiful plots.
The notebook begins with boilerplate code that defines metadata for the resulting files and also changes some defaults, such as the figure size and resolution, font size, and legend frame. After that there are a few helpful functions which I will discuss below.
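To give a sense of what that kind of boilerplate can look like, here is a minimal sketch using Matplotlib's rcParams. The specific values and the SVG_METADATA name are assumptions chosen for illustration, not the defaults the notebook actually uses.

```python
import matplotlib.pyplot as plt

# Illustrative values only (assumptions); the template's real defaults may differ.
plt.rcParams.update({
    "figure.figsize": (8, 5),   # width and height in inches
    "figure.dpi": 100,          # on-screen resolution
    "savefig.dpi": 300,         # resolution of saved raster files
    "font.size": 12,            # base font size in points
    "legend.frameon": False,    # draw legends without a surrounding box
})

# Hypothetical metadata to embed in saved SVG files.
SVG_METADATA = {"Creator": "Your Name", "Title": "Untitled plot"}
```

Metadata like this can be passed when saving, for example fig.savefig("plot.svg", metadata=SVG_METADATA). Setting defaults once at the top of a notebook keeps every later plot consistent without repeating the options.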
One of my favorite functions is draw_bands(). It draws a set of alternating colored bands on the background of the plot based on the axis tick locations.
When called with just the axis, like draw_bands(ax), it produces this:
But you can also customize the color using draw_bands(ax, color="orange", alpha=0.05), which produces:
These bands are a subtle way of indicating where on the X-axis a point lies, which is especially useful when plotting a time series. I use them often. Here are some examples:
- Discussing my sons’ language development to highlight each month.
- Plotting the progression of the cycling hour record to show each decade.
- Exploring when cyclists are involved in traffic accidents to highlight the seasonality.
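The following is a minimal sketch of how a function like draw_bands() could be written. It is an illustration under assumptions, not the code from the notebook: the name draw_bands_sketch, the default color, and the choice to shade every other interval between major X-axis ticks are all hypothetical.

```python
import matplotlib.pyplot as plt


def draw_bands_sketch(ax, color="grey", alpha=0.05):
    """Shade every other interval between major x-axis ticks.

    Hypothetical re-implementation for illustration only.
    """
    ticks = ax.get_xticks()
    # Remember the current limits so the shading does not rescale the plot.
    xmin, xmax = ax.get_xlim()
    spans = []
    # Pair ticks as (t0, t1), (t2, t3), ... so alternate intervals are shaded.
    for left, right in zip(ticks[::2], ticks[1::2]):
        span = ax.axvspan(left, right, color=color, alpha=alpha,
                          zorder=0, linewidth=0)
        spans.append(span)
    ax.set_xlim(xmin, xmax)
    return spans
```

Because the bands follow the tick locations, a sketch like this should be called after the data is plotted and the ticks are final; changing the ticks afterwards would leave the bands misaligned.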
I like minimal, but informative, legends. Color alone is often enough to differentiate lines or points, so I wrote a function, draw_colored_legend(), that changes the color of the legend text to match the line. It produces a legend like the one in this plot:
This function is a little brittle; it works well for scatter plots, but fails for some other Matplotlib objects. I plan to make it more robust in the future.
This legend style can be seen in these posts:
- Plotting my son’s language development to label each language.
- Plotting Tour de France Prize Money to label the winner’s prize compared to the total.
- Comparing Data Science Salaries by Gender to differentiate the points for men and women.
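One way such a legend could be built is sketched below. This is an assumption-laden illustration rather than the notebook's implementation: the name draw_colored_legend_sketch is hypothetical, and it only handles lines and scatter plots, leaving any other Matplotlib object with the default text color.

```python
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection
from matplotlib.colors import to_rgba
from matplotlib.lines import Line2D


def draw_colored_legend_sketch(ax):
    """Draw a frameless legend whose labels take the color of their artist.

    Hypothetical re-implementation for illustration only.
    """
    handles, labels = ax.get_legend_handles_labels()
    legend = ax.legend(handles, labels, frameon=False)
    for handle, text in zip(handles, legend.get_texts()):
        if isinstance(handle, Line2D):
            # Lines created by ax.plot() expose a single color.
            text.set_color(to_rgba(handle.get_color()))
        elif isinstance(handle, PathCollection):
            # Scatter plots store one RGBA row per color; use the first.
            facecolors = handle.get_facecolor()
            if len(facecolors):
                text.set_color(to_rgba(facecolors[0]))
        # Any other artist type keeps the default text color.
    return legend
```

The type checks show why a function like this can be brittle: each kind of Matplotlib artist stores its color differently, so every new plot type needs its own branch.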
Putting It Together
The plotting notebook enables you to make beautiful plots quickly and easily. For example, this plot:
It was produced by this short code snippet:
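# setup_plot, draw_colored_legend, draw_bands, and save_plot are defined in the plotting notebook.
import numpy as np

fig, ax = setup_plot(
    title="Title",
    xlabel="X-axis",
    ylabel="Y-axis",
)
# Two random point clouds, shifted horizontally so they partly overlap.
ax.scatter(np.random.rand(500) - 0.65, np.random.rand(500), label="First dataset")
ax.scatter(np.random.rand(500) - 0.35, np.random.rand(500), label="Second dataset")
draw_colored_legend(ax)
draw_bands(ax)
save_plot(fig, "/tmp/output.svg")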
If the notebook template library is useful to you, be sure to let me know on Twitter or GitHub. Your feedback helps make the project better for everyone!