Topics

I write on a variety of topics, which can be found below. Click a topic to see all my articles on it.

Individual Topics

california-traffic-data

SWITRS: Increase In Traffic Fatalities After COVID-19 Lock Down
California was put under a stay-at-home order in March, 2020. As expected, traffic volume decreased, but what happened to the rate of fatal accidents? It skyrocketed!
Where to Host Public Datasets?
When I released the SWITRS dataset, I had to find a place to host a 5 Gig dataset. Here is what I learned.
Introducing the SWITRS SQLite Hosted Dataset
California traffic collision data has been hard to get; that's why I am now curating and hosting it! Come take a look!
SWITRS: On What Days Do Cyclists Crash?
California crash data doesn't just cover cars, it covers bikes too! This time we look at when cyclists crash in California.
SWITRS: Car Crashes After Daylight Saving Time Ends
Daylight saving time leads to more traffic collisions, but what about when DST ends? Some researchers have found that it does lead to more crashes, so I take a look using California's SWITRS data.
SWITRS: Car Crashes After Daylight Saving Time
Daylight saving time leaves us drowsy and cranky at work, but it also leads to an increase in traffic collisions! Find out exactly how many more there are with this analysis!
SWITRS: On What Days Do Motorcycles Crash?
Motorcycle riders are a different breed, born to chase excitement! So when do they crash? Using California's SWITRS data I find out! I'll give you a hint: it is not on the way to their 9-5!
SWITRS: On What Days Do People Crash?
What day of the year has the most car crashes? The fewest? Find out as I look at California's crash data! Hint: they're both holidays!
Introducing 'SWITRS to SQLite'
The State of California stores information about all the traffic collisions in the state in the SWITRS database; this script lets you convert it to SQLite for easy querying!

career-advice

The Data Science Spectrum: From Analyst to Machine Learning
Data science has left the era of the Unicorn and entered the era of the team, but that means there is now a whole spectrum of data science jobs. Here is what they do.
The Data Science Split: From Unicorns to Teams
When data science started, the job covered everything from setting up databases to running experiments to making models. But finding Unicorns was impossible; something had to give.
Data Science Interview Practice: Machine Learning Case Study
A common interview type for data scientists and machine learning engineers is the ML case study. Read on for an example of how I solve them!
Data Science Interview Practice: Data Manipulation
I often get asked how to practice data science interviews, so here is a practice dataset with a set of questions to answer. Good luck!
Get Things Done by Tracking Them
As I've gotten more senior as a data scientist, I've found I have to keep track of more and more things. This is how I do it!
The Gender Pay Gap in Data Science Salaries
How do the salaries of women data scientists compare to those of men? This month we explore pay by gender and location.
Data Science Salaries
How do data scientists' salaries vary by experience and location? Read on to find out!
My Academic Bully at CERN
Getting my PhD was mostly a great experience, but one woman made my life hell for a short time at CERN. It's tough to write about, but I thought I owed it to myself and others.
Should I Go To Insight Data Science?
Insight promises an easy transition from academia to a career in data science or machine learning, but is it the right program for you? I have a few words of advice offered from experience.
The Typical Successful Career Starts with Rejection
When I applied for graduate school I was rejected, not by a few good grad schools, but by all of them! But thanks to a little kindness, I was able to continue onward.
Should I Get a PhD?
In 2009, having little or no money in my purse, I thought I would go to graduate school in physics. But was it the right idea? And should you follow in my footsteps?

childhood-language

Comparison of My Two Sons' Language Development
Being a nerd dad, I recorded all the words my first two sons spoke as they learned them. Now, I compare their language development rate!
My Second Son's Language Development
My second son is a little over two years old. We tracked every word he's spoken to watch his language development, and now you can observe it too!
My Son's Language Development
My son is a little over two and unfortunately he has two huge nerds for parents. We tracked every word he's spoken to watch his language development, and now you can join us!

cycling

Plotting the 2020 Tour de France
The Tour de France is a race decided by mere minutes; to see exactly how those minutes were earned, read on for my plots!
Improving Wikipedia's Tour de France Prize Money Plot
Time to improve another plot from Wikipedia. This time I tackle one showing the prize money in the Tour de France over time!
Plotting the 2019 Tour de France
The Tour de France is a sporting event decided by mere minutes; to see exactly how those minutes were earned, read on for my plots!
Improving Wikipedia's Hour Record Plot
I love Wikipedia, I love cycling, and I love data! So today, I improve Wikipedia's Hour Record Plot! Come take a look!
SWITRS: On What Days Do Cyclists Crash?
California crash data doesn't just cover cars, it covers bikes too! This time we look at when cyclists crash in California.

data-science

SWITRS: Increase In Traffic Fatalities After COVID-19 Lock Down
California was put under a stay-at-home order in March, 2020. As expected, traffic volume decreased, but what happened to the rate of fatal accidents? It skyrocketed!
The Data Science Spectrum: From Analyst to Machine Learning
Data science has left the era of the Unicorn and entered the era of the team, but that means there is now a whole spectrum of data science jobs. Here is what they do.
The Data Science Split: From Unicorns to Teams
When data science started, the job covered everything from setting up databases to running experiments to making models. But finding Unicorns was impossible; something had to give.
Where to Host Public Datasets?
When I released the SWITRS dataset, I had to find a place to host a 5 Gig dataset. Here is what I learned.
Jupyter Notebook Templates for Data Science: Plotting Time Series
Jumpstart your time series visualizations with this Jupyter plotting notebook!
Jupyter Notebook Templates for Data Science: Plotting
Jumpstart your visualizations with this Jupyter plotting notebook!
Jupyter Notebook Templates for Data Science
Jupyter notebooks are great for data exploration; jumpstart your work with this library of useful notebook templates!
The Gender Pay Gap in Data Science Salaries
How do the salaries of women data scientists compare to those of men? This month we explore pay by gender and location.
Data Science Salaries
How do data scientists' salaries vary by experience and location? Read on to find out!
SWITRS: On What Days Do Cyclists Crash?
California crash data doesn't just cover cars, it covers bikes too! This time we look at when cyclists crash in California.
SWITRS: Car Crashes After Daylight Saving Time Ends
Daylight saving time leads to more traffic collisions, but what about when DST ends? Some researchers have found that it does lead to more crashes, so I take a look using California's SWITRS data.
Fate Dice: Statistics Testing Is Hard
A few months ago I tested my Fate dice for biases. Now, I retest the "biased" set and see if it really is unlucky! Unfortunately, things aren't so clear...
Fate Dice Intervals
What does a "normal" distribution of rolls from a fair set of Fate dice look like? There are a lot of ways to estimate it. In this post I'll go through four methods.
Fate Dice Statistics
My friends and I played a Fate RPG for over two years. During that time we rolled a lot of dice and developed a lot of superstitions, but were any of them correct?
Visualizing Multiple Data Distributions
Need to compare a set of distributions of some variable? Histograms are OK, but try something fancier! Read on to learn about box, strip, swarm, and violin plots!
SWITRS: Car Crashes After Daylight Saving Time
Daylight saving time leaves us drowsy and cranky at work, but it also leads to an increase in traffic collisions! Find out exactly how many more there are with this analysis!
SWITRS: On What Days Do Motorcycles Crash?
Motorcycle riders are a different breed, born to chase excitement! So when do they crash? Using California's SWITRS data I find out! I'll give you a hint: it is not on the way to their 9-5!
Software Testing for Data Science
Much of data science involves writing code for data cleaning, parsing, and modeling. Software tests can ensure that your code does what you think it does!
SWITRS: On What Days Do People Crash?
What day of the year has the most car crashes? The fewest? Find out as I look at California's crash data! Hint: they're both holidays!
WhereTo.Photo: Using Data Science to Take Great Photos
Where is the best spot to take a photo in San Francisco? Learn how I answered this question with my Insight Data Science project!
Further Double-checking FiveThirtyEight's 2016 Primary Predictions
Is FiveThirtyEight's Polls Plus model biased against any candidate? I continue double-checking their model by looking at each candidate individually.
Double-checking FiveThirtyEight's 2016 Primary Predictions
How well did FiveThirtyEight do in predicting the primary results? I double-check FiveThirtyEight's Polls Plus model by comparing its predictions to the outcomes of the 2016 primaries.

data-visualization

Jupyter Notebook Templates for Data Science: Plotting Time Series
Jumpstart your time series visualizations with this Jupyter plotting notebook!
Plotting the 2020 Tour de France
The Tour de France is a race decided by mere minutes; to see exactly how those minutes were earned, read on for my plots!
Jupyter Notebook Templates for Data Science: Plotting
Jumpstart your visualizations with this Jupyter plotting notebook!
Comparison of My Two Sons' Language Development
Being a nerd dad, I recorded all the words my first two sons spoke as they learned them. Now, I compare their language development rate!
My Second Son's Language Development
My second son is a little over two years old. We tracked every word he's spoken to watch his language development, and now you can observe it too!
Improving Wikipedia's Tour de France Prize Money Plot
Time to improve another plot from Wikipedia. This time I tackle one showing the prize money in the Tour de France over time!
Plotting the 2019 Tour de France
The Tour de France is a sporting event decided by mere minutes; to see exactly how those minutes were earned, read on for my plots!
Improving Wikipedia's Hour Record Plot
I love Wikipedia, I love cycling, and I love data! So today, I improve Wikipedia's Hour Record Plot! Come take a look!
My Son's Language Development
My son is a little over two and unfortunately he has two huge nerds for parents. We tracked every word he's spoken to watch his language development, and now you can join us!
Improving An Old Supernova Plot
I learned to use matplotlib more than ten years ago. Around that time, I made a plot of supernova 2002cx for Wikipedia, but it was not terribly good. So this year, I updated it!
Making Animations Quickly with Matplotlib Blitting
Animating plots is a great way to show how some quantity changes in time, but animations can be slow to generate in matplotlib! Thankfully, blitting makes animating much faster! Learn how here!
Rise and Fall of Popular Names
The popularity of baby names rises and falls based on the tastes of each generation of parents. Are their preferences the same for boys' names as for girls' names? I plot the trends to find out!
How Fast Does a Raspberry Pi Reboot?
My Raspberry Pis have to reboot every evening to avoid a memory leak. As they say, when you have a memory leak, make animated plots to see how fast they reboot!
Updated Caltrain Visual Schedule
In July, Caltrain updated their weekend schedule to allow time to do track work, so I updated my Marey/Ibry/Serjev visual schedules to see how it changed!
Caltrain Visual Schedule
In 1878, Marey published a famous visual train schedule based on work by Ibry. What would it look like for Silicon Valley's Caltrain? Come find out!
Visualizing Multiple Data Distributions
Need to compare a set of distributions of some variable? Histograms are OK, but try something fancier! Read on to learn about box, strip, swarm, and violin plots!

fun-and-games

Fate Dice: Statistics Testing Is Hard
A few months ago I tested my Fate dice for biases. Now, I retest the "biased" set and see if it really is unlucky! Unfortunately, things aren't so clear...
Fate Dice Intervals
What does a "normal" distribution of rolls from a fair set of Fate dice look like? There are a lot of ways to estimate it. In this post I'll go through four methods.
Fate Dice Statistics
My friends and I played a Fate RPG for over two years. During that time we rolled a lot of dice and developed a lot of superstitions, but were any of them correct?
Dragon Farkle: Simulating the End Game
How many soldiers do you need to successfully defeat the dragon in Dragon Farkle, and how likely to succeed is your attack? I find out by simulating a game of Dragon Farkle!

hermes

Python2Vec: Word Embeddings for Source Code
Parsing source code is easy; just let the interpreter do it! But what if you want to recommend code snippets? Then you need word embeddings, like my Python2Vec!
The Nine Must-Have Datasets for Investigating Recommender Systems
Do you want to play around with recommender systems, but you don't have any data? Don't worry, there are tons of great, open source datasets for recommender systems!

interview-prep

Data Science Interview Practice: Machine Learning Case Study
A common interview type for data scientists and machine learning engineers is the ML case study. Read on for an example of how I solve them!
Data Science Interview Practice: Data Manipulation
I often get asked how to practice data science interviews, so here is a practice dataset with a set of questions to answer. Good luck!
Interview Question: What Machine Learning Metric to Use
One of my favorite questions to ask in an interview is "What metric should you use to decide if your model works?". Read on to find out what a good answer looks like!

interviewing

The Data Science Spectrum: From Analyst to Machine Learning
Data science has left the era of the Unicorn and entered the era of the team, but that means there is now a whole spectrum of data science jobs. Here is what they do.
Data Science Interview Practice: Machine Learning Case Study
A common interview type for data scientists and machine learning engineers is the ML case study. Read on for an example of how I solve them!
Data Science Interview Practice: Data Manipulation
I often get asked how to practice data science interviews, so here is a practice dataset with a set of questions to answer. Good luck!
Data Science Interviews During the 2020 Pandemic
In the middle of the COVID-19 pandemic, I found myself looking for a data science job for the third time in my life. This post covers what I learned.
Interview Question: What Machine Learning Metric to Use
One of my favorite questions to ask in an interview is "What metric should you use to decide if your model works?". Read on to find out what a good answer looks like!
Tech Interviews: Respect Everyone's Time
Interviewing is notoriously agonizing for both the candidate and the company! But it could be much better! Here I propose one guiding principle to make it easier on everyone.

jupyter

Jupyter Notebook Templates for Data Science: Plotting Time Series
Jumpstart your time series visualizations with this Jupyter plotting notebook!
Jupyter Notebook Templates for Data Science: Plotting
Jumpstart your visualizations with this Jupyter plotting notebook!
Jupyter Notebook Templates for Data Science
Jupyter notebooks are great for data exploration; jumpstart your work with this library of useful notebook templates!
Jupyter Notebooks: Not for Development
Jupyter Notebooks are great for a lot of things; development of code is not one of them.

lab41

Matching Cars with Siamese Networks
Matching the same object across separate images is tough, but Siamese networks can learn to do it pretty well! Read on for details.
Object Localization without Deep Learning
Finding objects in images can be hard if you have only a little data. In this post I examine a few approaches that work with few training examples!
Lab41 Reading Group: Swapout: Learning an Ensemble of Deep Architectures
Want to train a network but unsure about dropout vs. stochastic depth? Should you use a ResNet? Stop worrying and use Swapout; it does all that and more!
Lab41 Reading Group: Skip-Thought Vectors
Word embeddings are great and should be your first stop for doing word-based NLP. But what about sentences? Read on to learn about skip-thought vectors, a sentence embedding algorithm!
Lab41 Reading Group: Deep Residual Learning for Image Recognition
Inception, AlexNet, VGG... There are so many network architectures; which one should you be using? The one everyone else is: ResNet! Come find out how it works!
Lab41 Reading Group: Deep Compression
Deep learning is the future, but how can I fit a battery-draining, half-gigabyte network on my phone? You compress it! Come find out how deep compression saves space and power!
Lab41 Reading Group: Deep Networks with Stochastic Depth
Dropout successfully regularizes networks by dropping nodes, but what if we went one step further? Find out how stochastic depth improves your network by dropping whole layers!
Lab41 Reading Group: Generative Adversarial Nets
What cost function would you use to determine if a picture looks real? How about one learned by another network! Find out more with my summary of Generative Adversarial Networks!
Python2Vec: Word Embeddings for Source Code
Parsing source code is easy; just let the interpreter do it! But what if you want to recommend code snippets? Then you need word embeddings, like my Python2Vec!
The Nine Must-Have Datasets for Investigating Recommender Systems
Do you want to play around with recommender systems, but you don't have any data? Don't worry, there are tons of great, open source datasets for recommender systems!

machine-learning

Machine Learning Deployment: Shadow Mode
Deploying machine learning models is hard; Shadow Mode is one way to make testing a little easier.
Interview Question: What Machine Learning Metric to Use
One of my favorite questions to ask in an interview is "What metric should you use to decide if your model works?". Read on to find out what a good answer looks like!
SAT2Vec: Word2Vec Versus SAT Analogies
Could Word2Vec pass the SAT analogies section and get accepted to a good college? I take a pre-trained model and find out!

my-projects

Jupyter Notebook Templates for Data Science: Plotting Time Series
Jumpstart your time series visualizations with this Jupyter plotting notebook!
Introducing the SWITRS SQLite Hosted Dataset
California traffic collision data has been hard to get; that's why I am now curating and hosting it! Come take a look!
Jupyter Notebook Templates for Data Science: Plotting
Jumpstart your visualizations with this Jupyter plotting notebook!
Jupyter Notebook Templates for Data Science
Jupyter notebooks are great for data exploration; jumpstart your work with this library of useful notebook templates!
Wayback Machine Archiver: Backup Pages with Python
The Internet Archive's Wayback Machine tries to keep a complete copy of the internet. With this script, you can submit pages for effortless indexing.
Eldar: A Bright, High-contrast Color Scheme for Vim
Check out Eldar, my custom Vim color scheme based on elflord. It is a bright, high-contrast theme that looks great in the terminal or GUI!
Introducing 'SWITRS to SQLite'
The State of California stores information about all the traffic collisions in the state in the SWITRS database; this script lets you convert it to SQLite for easy querying!
WhereTo.Photo: Using Data Science to Take Great Photos
Where is the best spot to take a photo in San Francisco? Learn how I answered this question with my Insight Data Science project!

opinions

A Review of Nookdesk's Standing Desk
I bought a Nookdesk standing desk now that I'm working from home; I like it! Read on for a detailed review.
Should I Go To Insight Data Science?
Insight promises an easy transition from academia to a career in data science or machine learning, but is it the right program for you? I have a few words of advice offered from experience.
Should I Get a PhD?
In 2009, having little or no money in my purse, I thought I would go to graduate school in physics. But was it the right idea? And should you follow in my footsteps?
Tech Interviews: Respect Everyone's Time
Interviewing is notoriously agonizing for both the candidate and the company! But it could be much better! Here I propose one guiding principle to make it easier on everyone.

pelops

Matching Cars with Siamese Networks
Matching the same object across separate images is tough, but Siamese networks can learn to do it pretty well! Read on for details.
Object Localization without Deep Learning
Finding objects in images can be hard if you have only a little data. In this post I examine a few approaches that work with few training examples!

python

Python Patterns: Map and Filter
For loops are great, but I am a big fan of replacing them with simple functions. Python provides a couple of building blocks.
Using Travis Build Stages to Test Multiple Python Versions and Publish to Pypi
Often when building packages, we want to test against multiple versions of the language, and then build the package once. I will show you how to accomplish this using Travis Stages.
Python Patterns: @total_ordering
Your classes can make use of the rich Python comparison operators just like the built-in classes. Here I'll show you how to do it while minimizing boilerplate.
Python Patterns: Enum
Things often come in sets of specific items, like states, Pokémon, or playing cards. Python has an elegant way of representing them using enum.
Python Patterns: Named Tuples
Sometimes I need to store an ordered dataset, but reference specific members from it. Named tuples in Python provide a clean way to do this!
Python Patterns: max Instead of if
I often have to loop over a set of objects to find the one with the greatest score. You can use an if statement and a placeholder, but there are more elegant ways!

reading-group

My PhD Thesis, In Short
I graduated from the University of Minnesota in June, 2015. I wrote an esoteric thesis about Z boson decay, which I explain here.
Lab41 Reading Group: Swapout: Learning an Ensemble of Deep Architectures
Want to train a network but unsure about dropout vs. stochastic depth? Should you use a ResNet? Stop worrying and use Swapout; it does all that and more!
Lab41 Reading Group: Skip-Thought Vectors
Word embeddings are great and should be your first stop for doing word-based NLP. But what about sentences? Read on to learn about skip-thought vectors, a sentence embedding algorithm!
Lab41 Reading Group: Deep Residual Learning for Image Recognition
Inception, AlexNet, VGG... There are so many network architectures; which one should you be using? The one everyone else is: ResNet! Come find out how it works!
Lab41 Reading Group: Deep Compression
Deep learning is the future, but how can I fit a battery-draining, half-gigabyte network on my phone? You compress it! Come find out how deep compression saves space and power!
Lab41 Reading Group: Deep Networks with Stochastic Depth
Dropout successfully regularizes networks by dropping nodes, but what if we went one step further? Find out how stochastic depth improves your network by dropping whole layers!
Lab41 Reading Group: Generative Adversarial Nets
What cost function would you use to determine if a picture looks real? How about one learned by another network! Find out more with my summary of Generative Adversarial Networks!

software-development

Making Custom Markdown for Github Pages
I love Markdown; I take all my notes in it and write my blog in it. But sometimes you want to create new syntax; read on to find out how!
Python Patterns: Map and Filter
For loops are great, but I am a big fan of replacing them with simple functions. Python provides a couple of building blocks.
My Terribly Clever(ly Terrible) Code
When I was young and naive I tried to write very clever code. Here is one of the worst examples.
Using Travis Build Stages to Test Multiple Python Versions and Publish to Pypi
Often when building packages, we want to test against multiple versions of the language, and then build the package once. I will show you how to accomplish this using Travis Stages.
Python Patterns: @total_ordering
Your classes can make use of the rich Python comparison operators just like the built-in classes. Here I'll show you how to do it while minimizing boilerplate.
Python Patterns: Enum
Things often come in sets of specific items, like states, Pokémon, or playing cards. Python has an elegant way of representing them using enum.
Python Patterns: Named Tuples
Sometimes I need to store an ordered dataset, but reference specific members from it. Named tuples in Python provide a clean way to do this!
Python Patterns: max Instead of if
I often have to loop over a set of objects to find the one with the greatest score. You can use an if statement and a placeholder, but there are more elegant ways!
Software Testing for Data Science
Much of data science involves writing code for data cleaning, parsing, and modeling. Software tests can ensure that your code does what you think it does!
Eldar: A Bright, High-contrast Color Scheme for Vim
Check out Eldar, my custom Vim color scheme based on elflord. It is a bright, high-contrast theme that looks great in the terminal or GUI!
Jupyter Notebooks: Not for Development
Jupyter Notebooks are great for a lot of things; development of code is not one of them.