SWITRS: On What Days Do People Crash?

A black and white photo of about a dozen men and boys standing around a broken car taken in Washington D.C. in 1923. One of the car's wheels has splintered and the car is tilted over.

The Statewide Integrated Traffic Records System (SWITRS) contains a wealth of information, enough to determine who, where, when, and sometimes why and how for every traffic accident in California. Today, with the assistance of my SWITRS-to-SQLite script (discussed previously), I’m going to look at when accidents happen, and specifically on what dates.

As always, the Jupyter notebook used to do this analysis can be found here (rendered on GitHub).

Data Selection

The data was selected using the following query:

SELECT Collision_Date FROM Collision
WHERE Collision_Date IS NOT NULL
AND Collision_Date <= '2015-12-31'  -- 2016 is incomplete

This returns one date for every accident before 2016 that has a recorded collision date. The current year, 2016, is excluded because its data is incomplete.
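For readers who want to reproduce the selection outside the notebook, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes, as the query implies, a table named Collision whose Collision_Date column stores dates as ISO-8601 text ('YYYY-MM-DD'). The in-memory table and its four rows are hypothetical stand-ins for the real SWITRS database, not actual data.

```python
import sqlite3
from datetime import date

# Hypothetical stand-in for the SWITRS SQLite database: an in-memory table
# with the same Collision table and Collision_Date column the query uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Collision (Collision_Date TEXT)")
conn.executemany(
    "INSERT INTO Collision VALUES (?)",
    [("2015-12-31",), ("2016-01-01",), (None,), ("2001-01-01",)],
)

QUERY = """
SELECT Collision_Date FROM Collision
WHERE Collision_Date IS NOT NULL
AND Collision_Date <= '2015-12-31'  -- 2016 is incomplete
"""

# ISO-8601 date strings sort lexicographically in date order, so the
# string comparison in the WHERE clause works as a date cutoff.
dates = [date.fromisoformat(row[0]) for row in conn.execute(QUERY)]
print(sorted(dates))
```

The NULL row and the 2016 row are both filtered out, leaving only the two dates on or before 2015-12-31.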

Accidents per Week

The first thing to look at is crashes as a function of time. Below, I plot accidents per week to make the trends clearer; plotting per day results in too many points to separate by eye.
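The post doesn't say exactly how the weeks were binned, so the following is only a sketch under one assumption: weeks start on Monday, and each accident is counted in the week containing its date. The function name accidents_per_week and the sample dates are illustrative, not taken from the notebook.

```python
from collections import Counter
from datetime import date, timedelta


def accidents_per_week(dates):
    """Count accidents per week, keyed by the Monday that starts each week."""
    # date.weekday() is 0 for Monday, so subtracting it snaps each date
    # back to the start of its week.
    return Counter(d - timedelta(days=d.weekday()) for d in dates)


# Illustrative dates only, not SWITRS data. 2015-01-05 and 2015-01-12 are Mondays.
sample = [date(2015, 1, 5), date(2015, 1, 7), date(2015, 1, 11), date(2015, 1, 12)]
weekly = accidents_per_week(sample)
print(sorted(weekly.items()))
```

Binning by week start rather than by ISO week number avoids edge cases where late-December dates belong to week 1 of the following year.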

Line plot showing accidents per week from 2001 to 2015

The week-to-week variation is rather significant, but two major trends are obvious:

  1. The total number of accidents declined over the period, with a big drop in 2008, but rose sharply in 2015.
  2. Each year follows a similar pattern, with a mid-year lull and wildly varying increases and decreases right before the end of the year.

The first trend is easy to explain: the Great Recession put many people out of work, and people without jobs stopped commuting. The second trend is also due to reduced driving; we’ll look at it in detail below.

Day-by-Day

To explore the second trend, we’ll need to look at the data day-by-day instead of a week at a time. Below is a plot of the average number of accidents on each day of the year. The average is calculated by summing the number of accidents on a specific day (say, September 22nd) across the years 2001 to 2015. The sum is then divided by the number of times that specific day appeared in the timespan (15, except for the leap day, which only appears 3 times).
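The averaging described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the notebook's code: the function name average_by_calendar_day is hypothetical, and dates are assumed to be datetime.date objects. The denominator is computed by walking every day in the span, so February 29th automatically gets 3 occurrences for 2001 to 2015 while every other date gets 15.

```python
from collections import Counter
from datetime import date, timedelta


def average_by_calendar_day(dates, first_year=2001, last_year=2015):
    """Average accidents per calendar day (month, day) over a span of years."""
    # Sum accidents on each (month, day) across all years.
    totals = Counter((d.month, d.day) for d in dates)
    # Count how often each (month, day) occurs in the span: once per year for
    # ordinary dates, but only in leap years for February 29th.
    occurrences = Counter()
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        occurrences[(day.month, day.day)] += 1
        day += timedelta(days=1)
    return {key: totals[key] / n for key, n in occurrences.items()}


# Illustrative dates only: one accident on each leap day, one on 2010-09-22.
sample = [date(2004, 2, 29), date(2008, 2, 29), date(2012, 2, 29), date(2010, 9, 22)]
averages = average_by_calendar_day(sample)
print(averages[(2, 29)])  # → 1.0
```

Dividing by the actual number of occurrences, rather than a flat 15, keeps the leap day from looking artificially quiet.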

Line plot showing average accidents by day of the year

Holidays account for the extrema, with the minimum number of accidents taking place on Christmas and the maximum on Halloween. In fact, many of the local maxima and minima are also holidays! Some create obvious multi-day patterns (like Thanksgiving) because they are floating holidays: they fall on a different date each year, so averaging by calendar date spreads their effect across several days. Others (like Christmas and Halloween) create massive single-day dips or spikes because they happen on the same date every year. Holidays that fewer people get off work, like Washington’s Birthday and Columbus Day, show almost no deviation from the surrounding dates.
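To see why a floating holiday smears across the day-of-year plot, consider Thanksgiving, which in the US falls on the fourth Thursday of November. This short sketch (the helper name thanksgiving is ours, not from the notebook) computes its date for each year from 2001 to 2015 and shows that it lands on seven different calendar dates.

```python
import calendar
from datetime import date


def thanksgiving(year):
    """Return the fourth Thursday of November for the given year."""
    first = date(year, 11, 1)
    # Days from November 1st to the first Thursday (0 if Nov 1 is a Thursday),
    # then three more weeks to reach the fourth Thursday.
    offset = (calendar.THURSDAY - first.weekday()) % 7
    return date(year, 11, 1 + offset + 21)


days = sorted({thanksgiving(year).day for year in range(2001, 2016)})
print(days)  # → [22, 23, 24, 25, 26, 27, 28]
```

Averaged by calendar date, each of those seven November dates gets only a share of the Thanksgiving effect, which produces a broad dip instead of a single spike.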

But perhaps the most interesting dates are the holidays where the number of accidents increases! Halloween is the most obvious of these, and sets the record for the highest number of accidents, but Valentine’s Day and St. Patrick’s Day also show increases. I believe there are two reasons these holidays have higher than normal accident counts. First, these are not generally paid holidays, so the usual commute-related accidents still happen. Second, these holidays are celebrated away from home after work (for drinks, dates, or candy), so people drive more on these days than they would otherwise. I suspect there is a third reason behind Halloween’s high accident count: more pedestrians than usual are out and about, leading to more accidents involving pedestrians. I plan to look at pedestrian accidents in a later blog post.

Day of the Week

Finally, let’s look at accidents by day of the week. On weekends, as on holidays, we would expect most people not to go to work. Below is a violin plot of accidents by day of the week. The width of each “violin” indicates the number of days with that accident count, the center line indicates the median, and the two outer lines mark the first and third quartiles, the bounds of the interquartile range.
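The summary lines drawn inside each violin can be computed directly. This is a sketch, not the notebook's code: it assumes the input is a list of datetime.date objects with one entry per accident, counts accidents per calendar date, groups those daily counts by weekday, and then takes the median and quartiles with Python's statistics module (Python 3.8 or later; each weekday needs at least two days of data). Dates with zero accidents never appear in the counts, which is harmless for statewide data. The function name weekday_summary and the sample dates are hypothetical.

```python
import statistics
from collections import Counter, defaultdict
from datetime import date

# Fixed names avoid depending on the system locale, unlike strftime("%A").
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def weekday_summary(dates):
    """Median and quartiles of daily accident counts, grouped by weekday."""
    daily = Counter(dates)  # accidents on each calendar date
    by_weekday = defaultdict(list)
    for day, count in daily.items():
        by_weekday[WEEKDAYS[day.weekday()]].append(count)
    summary = {}
    for name, counts in by_weekday.items():
        # quantiles(..., n=4) returns the three quartile cut points;
        # the middle one is the median, computed explicitly for clarity.
        q1, _, q3 = statistics.quantiles(counts, n=4)
        summary[name] = {"q1": q1, "median": statistics.median(counts), "q3": q3}
    return summary


# Illustrative data only: three Mondays with 2, 4, and 6 accidents.
sample = ([date(2015, 1, 5)] * 2 + [date(2015, 1, 12)] * 4
          + [date(2015, 1, 19)] * 6)
print(weekday_summary(sample))
```

Note that statistics.quantiles uses the "exclusive" method by default, and plotting libraries may compute quartiles slightly differently, so values from this sketch can differ a little from the lines drawn in the plot.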

Violin plot showing accidents by day of the week

The distribution for each day of the week is bimodal. This is due to the two plateaus in accident rates: a high one from 2001–2006 and a lower one from 2011–2014. Monday through Thursday have roughly the same number of crashes. Friday has more, presumably because people are more likely to go out after work. Saturday drops slightly below the other weekdays, and Sunday has the lowest accident count.

In the end the results are not too surprising: accidents happen when people are driving, not when they’re sitting at home celebrating!


Update: Replaced the univariate dot plot in the Day of the Week section with a violin plot.