I review all the books I read, here is a selection of my favorites that I read in 2023:
Blindsight does a great job of exploring the nature of consciousness and intelligence. Watts keeps the tension high and the plot moving quickly in this thought-provoking sci-fi novel. My favorite book of the year!
Blindsight is a hard sci-fi novel about first contact with aliens in the near future. A crew of four transhumans and a vampire are sent on a spaceship to investigate an anomaly in the solar system after a swarm of alien probes scan Earth.
Hamilton is known for his space opera, but The Abyss Beyond Dreams is more urban fantasy set during the Russian Revolution (in space) and Night Without Stars is a thriller set during the Cold War (again, in space). Both feature Commonwealth citizens with special knowledge as “Outside Context Problems”, pulling the stories into science fiction territory.
The Abyss Beyond Dreams starts off The Chronicle of the Fallers, another series in Hamilton’s Commonwealth universe. Though billed as space opera, it often reads more as urban fantasy since most of the story occurs on the planet Bienvenido inside the Void where steam engines are their most advanced technology.
Night Without Stars is the second book in the Chronicle of the Fallers. It is action packed, with great pacing, and complex characters. It is my new favorite Hamilton book.
Of the two books, I enjoyed the sequel to Hyperion more because it tied the personal story of the pilgrims to a much broader galactic conflict. Interestingly, you can see a lot of ideas in the Hyperion Cantos that Hamilton later adopted in his Commonwealth Saga, including wormholes, a breakaway-but-helpful AI, and different factions of scheming AIs who either want to eradicate the humans or uplift them.
The Fall of Hyperion is a sequel that outshines its predecessor. It is everything I was expecting from Hyperion and more! A true masterpiece.
Epic space opera with a massive cast of characters and incredible pacing.
I couldn’t put Pandora’s Star down! It is a sci-fi book that reads more like a thriller. There were always new mysteries that just a few more pages promised the answers to.
The sequel to Pandora’s Star, Judas Unchained continues right where the last one left off, but with the action ramped up to 11. The various storylines and loose threads come together one by one until it’s the good guys racing against the bad guys for the fate of the universe.
1980s mech sci-fi re-imagined for the 21st century. Warren’s self-published series takes a few books to really find its feet, but once it does, it’s a quick, fun, nostalgic read. The third book, Serpent Valley, exemplifies the series.
Serpent Valley, the third book in the War Horses series, is another quick, action-packed read—but without the flaws holding back its predecessors. Easily my favorite of the series so far!
This is straightforward when the officer writes `FORD` or `HONDA`, which they mostly do. But since the officer can write anything, they occasionally make it a little harder on us by abbreviating or mistyping, for example `VOLX` and `DODDGE`. And sometimes they make it impossible by writing `--` or `______`.
The solution is to go through, one by one, and create a mapping like:
```python
from enum import Enum, unique

# Enumeration of common vehicle makes
@unique
class Make(Enum):
    CHEVROLET = "chevrolet"
    GMC = "gmc"
    HINO = "hino"
    INFINITI = "infiniti"
    MITSUBISHI = "mitsubishi"
    # Special token for unknown make
    NONE = None

# Dictionary mapping raw values to Make enum
make_map = {
    "CHEVRLT": Make.CHEVROLET,
    "HINO/": Make.HINO,
    "INFINITY": Make.INFINITI,
    "MITSUB": Make.MITSUBISHI,
    "TAHOE": Make.GMC,
    "UKNOWN": Make.NONE,
}
```
Having done this mapping by hand for over 900 entries, I can tell you it is quite tedious. Fortunately, making sense of mangled text is something Large Language Models (LLMs) are pretty good at!
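Once built, applying such a map is mechanical. Here is a minimal sketch using a trimmed copy of the enum and map above; the helper name `normalize_make` is my own, not from the original code:

```python
from enum import Enum, unique

@unique
class Make(Enum):
    CHEVROLET = "chevrolet"
    GMC = "gmc"
    NONE = None

# Trimmed copy of the hand-built map above
make_map = {
    "CHEVRLT": Make.CHEVROLET,
    "TAHOE": Make.GMC,
}

def normalize_make(raw: str) -> Make:
    """Map a raw free-text make to the Make enum."""
    cleaned = raw.strip().upper()
    if cleaned in Make.__members__:
        # Exact enum-name match, e.g. "GMC"
        return Make[cleaned]
    # Known misspellings, else the unknown placeholder
    return make_map.get(cleaned, Make.NONE)
```

Anything not covered by either lookup falls through to `Make.NONE` rather than inventing a new value.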
The goal is to perform few-shot, multi-class classification of vehicle makes. Few-shot because we are going to give the model just a handful of examples of what output we expect, and multi-class because there are many possible vehicle makes it will have to map to.
The first step is to write a prompt explaining the task to the model, the expected return value, and a few examples of inputs with correct outputs. Here is a shortened version (the full one is here), starting with the instructions:
I am working with a dataset of traffic collisions from California. One of the fields is the “make” of the vehicle, for example, “Honda”, “Ford”, “Peterbilt”, etc.
But this field is a free-text field filled out by the CHP officer at the scene of the collision. As such, there are misspellings, abbreviations, and other mistakes that have to be fixed.
I have created a set of makes as follows (including `NONE` as a placeholder for unknown values). Here is the list as a Python `Enum`:

```python
@unique
class Make(Enum):
    ACADIAN = "acadian"
    ACURA = "acura"
    ALFA_ROMERO = "alfa romera"
    AMC = "american motors"
    ...
```

Take note that anything unknown should be tagged with `Make.NONE`. And do not make up new Enum values.
Then the output format, with instructions to include an explanation of its logic first, which can help model accuracy:
I will provide you with a string. You are to return a Python dictionary with the following keys, in this same order:
```
{
    explanation: "An explanation of why you think the enum value is a good match, or why there is no match possible.",
    input_string: "The input string",
    enum: "The correct enum from above",
    no_match: "`True` or `False`. True if there is no matching enum or no way to make a match, otherwise False.",
}
```
And finally some examples of inputs and correct outputs:
For example, for the input `VOLX`:

```
{
    explanation: """VOLX is pronounced similarly to 'Volks' and therefore this is probably an abbreviation of 'Volkswagen'. There is an enum value for Volkswagen, `Make.VOLKSWAGEN`, already so we use that.""",
    input_string: "VOLX",
    enum: Make.VOLKSWAGEN,
    no_match: False,
}
```
Since I was manually copying the prompt into the model’s web interface, I used batches of 100–200 strings sorted alphabetically. With API access, I could have used retrieval-augmented generation to create custom examples for each string while sending them one at a time.
Splitting the data into batches helped the model figure out very short entries. For example, the model failed when given `WNBG` (Winnebago) by itself, but succeeded when I gave it the list:

```
WINN
WINNE
WINNEBAG
WINNEBAGO
WINNI
WNBG
WNBGO
```

I believe seeing multiple short versions next to each other helped the model infer the right mapping.
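The batching itself is simple to script. A sketch, assuming the raw makes are in a list of strings (the helper name `make_batches` is mine; a `batch_size` of 150 matches the 100–200 range I used):

```python
def make_batches(raw_strings, batch_size=150):
    """Deduplicate, sort alphabetically, and split into prompt-sized batches."""
    ordered = sorted(set(raw_strings))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Sorting keeps variants like WINN/WINNE/WINNEBAGO adjacent in one batch
batches = make_batches(["WNBG", "WINN", "WINNEBAGO", "WNBGO", "WINNE"])
```

Sorting before splitting is what puts the short variants side by side, which is exactly the context that helped the model.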
I obtained the following performance on my 902 hand-mapped entries:
This is reasonably good performance, as finding wrong entries is pretty quick (and many could be fixed with find and replace).
The “make” of a vehicle is the brand of the manufacturer, like ‘Honda’, ‘Ford’, ‘Tesla’, etc. ↩
We’ve now put 1850 miles on the bike, mostly taking the kids to local parks. After all the time spent with it, would we recommend it to another family?
In short: we love the utility an electric cargo bike offers, and I think we will always have one in the garage to supplement our minivan, but the RadWagon has some significant drawbacks that make recommending it difficult. Read on for the full review.
We ordered the RadWagon from Rad’s website for $1500. With a “caboose” enclosure and pads for the kids adding $250, plus tax, our total was $1901—not cheap, but roughly a third of the price of comparable cargo bikes, which run in the $4000–6000 range.
The bike comes in a box and you must assemble it yourself. The assembly was not too hard, but I also have a lot of experience with bikes and bike maintenance which helped.
The seat post and headset are highly adjustable, allowing people of varying heights to ride the RadWagon. Both my wife (5’4”) and I (6’1”) can ride it comfortably, but we’re both near the limits. I have the seat all the way up and she has it all the way down.
The 750W rear hub motor easily brings the bike to its 20 miles per hour computer-limited top speed. The bike has a throttle, which I love for getting started from a stop, and enough power to carry me, two kids, and some balance bikes up a steep hill. The downside though is the hub motor puts a lot of stress on the rear wheel.
A flaw in the design is the rear brake. The RadWagon uses cheap, mechanical disc brakes, which are enough to stop the bike when they’re well aligned, but which need constant attention to keep them that way. The motor blocks the typical through-spoke access for adjusting the rear brake. Instead, Rad makes a special flat Allen wrench that fits between the brake and motor but adjusting remains hard.
A major flaw is the rear spokes. They are stressed by both the motor—which puts all its power through the rear wheel—and the cargo. The spokes were not tight enough from the factory and I broke several in the first 200 miles. I have broken fewer since replacing and retightening, but I still break one periodically, which is annoying for me and likely a dealbreaker for less experienced riders.
I’ve contacted Rad’s support several times—to order spokes and the brake tool, and to replace a faulty accessory. They were generally quick and helpful, but I’ve never needed support for other bikes. And the last interaction was terrible: Rad’s front basket was defective, and after I sent them photographic proof, they accused me of being unable to use a screwdriver and stopped responding. It was the worst support experience I’ve ever had.
The RadWagon was a savior during the pandemic, letting us escape the house and ride when we’d otherwise be trapped inside. It also beats driving kids in a car—hauling them on the back makes getting to the park part of the fun.
But the bike needs constant maintenance that is difficult even for an experienced mechanic, and Rad’s support is not great. I wish I had purchased a higher-quality bike that wouldn’t fail so frequently.
You can probably already see the problem: APIs are meant to hide the inner workings behind them. But by returning the raw model scores, we revealed too much detail. Any changes to the model, like retraining it, could change the scores and break the front end.
In my guide to deploying machine learning models in shadow mode, I stated that deploying changes “in front of the API” has the advantage of giving the calling team control. This is precisely why we built the ATO API the way we did: to address the organizational issue that the engineering team did not trust the machine learning team.
But if your teams trust each other, there is a much better way to build.
A better way is for the API to return a set of actions. For example, the ATO model API might return one of the following actions: `Allow`, `Step-up`, or `Lock`.
These actions do a really good job of hiding the implementation behind the API. You can freely change thresholds when the model performance changes, retrain the model, or even replace it entirely.
But you can do something else too: you can add more models!
A common fraud-prevention strategy is to train a model for each new fraud pattern identified. This allows each model to be highly precise, while also improving the recall of the overall system. These multi-model systems are often augmented with simple rules, such as “No logins from Russia allowed.” In the end, the system takes the outputs of the various models and rules and aggregates them in some way. In our ATO example, the system returns the most drastic action recommended by any model or rule.
In code:
```python
def ato_api(event_token):
    # List of actions returned by all the models and rules,
    # consists of values from {'Allow', 'Step-up', 'Lock'}
    all_results = get_ato_system_results(event_token)
    if 'Lock' in all_results:
        return 'Lock'
    elif 'Step-up' in all_results:
        return 'Step-up'
    return 'Allow'
```
Of course, this is a great place to use enums and max:
```python
from enum import IntEnum, unique

@unique
class Action(IntEnum):
    ALLOW = 0
    STEPUP = 1
    LOCK = 2

def ato_api(event_token):
    # List of actions returned by all the models and rules,
    # consists of values from the Action() enum
    all_results = get_ato_system_results(event_token)
    return max(all_results)
```
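Because `IntEnum` members compare as integers, `max` selects the most drastic action with no explicit branching. A quick self-contained demonstration:

```python
from enum import IntEnum, unique

@unique
class Action(IntEnum):
    ALLOW = 0
    STEPUP = 1
    LOCK = 2

# max() picks the most drastic action thanks to the integer ordering
results = [Action.ALLOW, Action.STEPUP, Action.ALLOW]
assert max(results) is Action.STEPUP

# A single rule recommending Lock escalates the whole decision
assert max(results + [Action.LOCK]) is Action.LOCK
```

A nice property of this design: adding a new, even more drastic action later only requires a new enum member with a higher value; the aggregation code does not change.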
But in the last 7 years language models have gotten much, MUCH better. I wondered how a state-of-the-art model, one too large to run on my computer, would perform on the same questions.
To find out, I ran the analogies through Anthropic’s biggest model: Claude.
I gave Claude the following instructions:
We’re going to solve SAT analogy questions. I’ll give you a pair of words like:
“authenticity : counterfeit”
And you determine the relationship between the two words, and then pick the pair from the next 5 with the same relation. So in this case I would give you:
reliability : erratic
mobility : energetic
argument : contradictory
reserve : reticent
anticipation : solemn
You need to pick:
“reliability : erratic”
Make sure to explain the relationship and why you chose the answer you did.
Then I gave it an analogy like:
fix : stable
------------------------
fortify : strong
captivate : attractive
deter : active
furrow : productive
mend : torn
And it would reply with its answer and reasoning:
Okay, for this analogy:
agenda : conference
The relationship here is:
An agenda is a list that provides the plan or schedule for a conference. So a conference relies on and uses an agenda.
Based on this, the choice that shows a similar relationship is:
map : trip
Because a trip relies on and uses a map. A map provides the plan and guidance for a trip, just as an agenda does for a conference.
The other choices do not show this same kind of relationship:
- Teacher : class - A teacher facilitates a class but a class does not rely on or use a teacher in the same direct way that a conference uses an agenda or a trip uses a map.
- Executive : employee - An executive oversees employees but employees do not directly rely on or make use of an executive the same way a conference uses an agenda or a trip uses a map.
- Man : woman - These do not have a direct facilitator/user relationship. They do not rely on each other in the same way.
- Agency : assignment - While an agency may give out assignments, an assignment does not directly rely on or use an agency in the way a conference uses an agenda or a trip uses a map.
So by analyzing each choice, map : trip is the one option that shows the closest relationship to the original pair’s dynamic of agenda : conference. Please let me know if this makes sense or if you have another analogy challenge for me!
One potential issue: I left the analogies and their possible answers in the sorted order from my word2vec post. This means the analogies word2vec got right came first, with the correct answer in the first position, then the ones where word2vec had the correct answer second, and so on. I randomized a few of the analogies to check whether this was a significant problem, and it did not seem to be. If I had API access, I would have randomized the order of all the potential answers and run multiple tests.
Unsurprisingly, Claude did much better than Word2Vec. It got 25 out of 35 analogies correct1 compared to 8 out of 36 for Word2Vec.
The most frequent failure appears to be Claude identifying the right relationship, but then incorrectly applying it to choose the answer. For example, this analogy (the correct answer is vindictive : spite; Claude chose despondent : glee):

hospitable : courtesy

- vindictive : spite
- despondent : glee
- leisurely : haste
- morbid : cheerfulness
- infamous : honor
With the following logic:
Okay, for the pair:
hospitable : courtesy
The relationship is:
Someone who is hospitable shows courtesy or kindness to their guests. So there is a relationship where one concept (hospitable) leads to or results in what the other represents (courtesy).
Based on this, the choice that shows a similar relationship where one concept results in what the other represents is:
despondent : glee
Because someone who is despondent lacks glee or cheer. Like hospitable leads to courtesy, despondent precludes glee.
Claude correctly identifies that hospitable implies showing courtesy, but then picks the opposite relation: someone who is despondent lacks glee.
All of Claude’s answers are here.
I used one analogy in my instruction to Claude, which explains the discrepancy between 35 and 36. ↩
I gave an example in my last post of a good application for LLMs: editing prose. But what specifically makes this problem ideal for solving with a model? Succinctly, it is a problem where solving it is hard, but verifying the solution is easy. I will go into more detail in the rest of this post.
In math, there are types of problems where finding a solution is difficult or impossible, but confirming a solution is easy. A common strategy for these problems is to guess the solution’s form and then verify it, such as for an integral, where a candidate solution can be checked by taking its derivative.
Large language models are particularly useful for exactly these types of tasks: where generating a solution is hard, but verifying it is easy. Editing a paragraph is a prime example of this kind of task since writing multiple versions is time-consuming, whereas verifying the quality of a single paragraph can be done quickly.
Another good use case is writing code, especially if you have tests in place to verify the code’s correctness.
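To make that concrete, here is a hypothetical model-generated helper (`slugify` is my own example, not from the post) next to the cheap-to-write checks that verify it:

```python
# A hypothetical model-generated helper...
def slugify(title: str) -> str:
    """Lowercase, keep alphanumerics, join words with hyphens."""
    words = "".join(c if c.isalnum() else " " for c in title).split()
    return "-".join(w.lower() for w in words)

# ...and the tests that verify it in seconds
assert slugify("Hello, World!") == "hello-world"
assert slugify("  LLMs & You ") == "llms-you"
```

Writing the function is the hard part; running the assertions is the easy verification step, which is exactly the asymmetry that makes this a good LLM task.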
LLMs are bad for problems where verification is hard compared to the generation of an answer.
Some people are using LLMs as a replacement for search engines. This is a perfect example of a bad use of the technology because verifying the accuracy of the information provided by the model takes time and effort. In fact, it often involves additional searches to confirm the validity of the answer, which defeats the purpose of using an LLM in the first place.
But ChatGPT is an incredibly useful tool for my writing due to its ability to rapidly iterate through alternative phrasings for ideas, sentences, and paragraphs. In these cases, the model’s habit of generating false information is not a concern because I wrote the original input so I know what it should say, and I can easily edit out any inaccuracies introduced by the model.2, 3
Let me illustrate how I use ChatGPT to write and copyedit my work.4
I start by writing a rough draft, which may consist of a few bullet points or a detailed set of paragraphs. Next, I give it to ChatGPT for refinement.
Give me three versions of this paragraph:
I start by writing a rough draft. This draft could be a just a few bullet points or completely fleshed out paragraphs. I then feed it into ChatGPT:
- I begin my writing process with a rough draft, which may range from a simple list of bullet points to fully developed paragraphs. Next, I input it into ChatGPT.
- My writing method starts with a rough draft - it could be a brief outline or a detailed set of paragraphs. Afterwards, I pass it to ChatGPT.
- To start, I create a preliminary draft - it may consist of a handful of bullet points or fully written paragraphs. Afterwards, I utilize ChatGPT to refine it further.
I don’t usually take the entire output from ChatGPT, rather I use it to inspire my own editing:5
I start by writing a rough draft, which may consist of a few bullet points or a detailed set of paragraphs. Next, I give it to ChatGPT for refinement.
I iterate using this process, refining my text by sending the sections I’m unhappy with to ChatGPT for improvement until I’m satisfied.6
ChatGPT is the new large language model from OpenAI. It has garnered a lot of attention for its wide range of abilities from conversation to writing songs to information retrieval. It has also drawn criticism for its habit of making up plausible but completely wrong facts, leading many to conclude that while it is entertaining it is not useful.
1. OpenAI's ChatGPT is a highly advanced language model that has received widespread attention for its diverse abilities, including conversation, songwriting, and information retrieval. However, its tendency to generate false information that appears credible has led to criticism and the conclusion that, although entertaining, it may not be practical.
2. ChatGPT, the new AI model from OpenAI, has captured considerable attention for its wide range of skills, such as conversation, songwriting, and information retrieval. Despite this, its propensity for producing believable but false facts has sparked criticism, and many view it as entertaining but not practical.
3. OpenAI's ChatGPT, a new and sophisticated language model, has drawn a lot of attention for its versatility in conversation, songwriting, and information retrieval. Despite this, it has also faced criticism for generating seemingly credible but completely false information, leading many to see it as entertaining but lacking practicality.
I find ChatGPT to be incredibly useful as part of my writing process, because ChatGPT makes it fast and easy to generate different possible ways to write down an idea, sentence, or paragraph. In these cases the fact that the model makes things up is irrelevant because I wrote the original source material so I can filter out any inaccuracies the model introduces.
1. I consider ChatGPT to be a valuable tool in my writing process due to its ability to quickly and effortlessly generate various ways of phrasing an idea, sentence, or paragraph. In these cases, the fact that the model generates false information is not a concern as I am the originator of the source material and can easily identify and remove any inaccuracies introduced by the model.
2. For me, ChatGPT is an extremely useful aspect of my writing as it facilitates the rapid creation of alternative phrasings for ideas, sentences, or paragraphs. The model's tendency to generate false information is inconsequential in these scenarios as I am the source of the original material and can easily filter out any inaccuracies introduced by the model.
3. ChatGPT is an indispensable part of my writing process as it enables me to quickly generate various ways to express an idea, sentence, or paragraph. Its tendency to produce false information is not a problem as I am the writer of the original source material, and I can easily eliminate any inaccuracies introduced by the model.
I have used 'tendency' too much already. Rewrite this five times, each time only replacing the word 'tendency' with another few words to maintain the meaning:
In these cases, the model's tendency to generate false information is not a problem because I wrote the original input so I know what it should say, and I can easily edit out any inaccuracies introduced by the model.
1. In these cases, the model's habit of producing false information does not pose an issue because I created the original input and can easily remove any inaccuracies it may introduce.
2. In these scenarios, the model's inclination to generate false information is of no concern as I am the writer of the original material and can effectively edit out any inaccuracies introduced by the model.
3. The model's propensity for generating false information is not a concern in these instances as I am aware of the intended content of the original input and can efficiently remove any inaccuracies it may introduce.
4. The model's tendency to produce false information is not a hindrance in these situations as I am familiar with the original input and can effortlessly edit out any inaccuracies it generates.
5. The model's habit of creating false information is not problematic because I wrote the original input and can easily eliminate any inaccuracies the model introduces.
Five versions of:
Let me show you how I use it to write and edit my work.
1. Allow me to demonstrate how I utilize it in writing and revising my pieces.
2. Let me illustrate how I employ it in the creation and modification of my writing.
3. I will show you how I use ChatGPT for writing and editing my writing projects.
4. Let me reveal how I integrate it into my writing and editing process.
5. I will demonstrate to you the way I utilize it for writing and revising my work.
Three versions of:
I generally don't take the full paragraph, but I use ChatGPT's versions to get ideas for my own:
1. I don't usually use the entire output from ChatGPT, but I do use it as inspiration for my own writing.
2. Typically, I don't adopt ChatGPT's output in its entirety, instead I draw ideas from it for my writing.
3. My approach is not to fully adopt the ChatGPT output, but rather use it to spark ideas for my own writing.
Give me three versions of this:
I iterate on this process, taking parts of my text I'm not happy with and sending them to ChatGPT, until I like how the work flows.
1. Revising my text with ChatGPT's help, I improve it piece by piece until I'm satisfied with the flow.
2. With the aid of ChatGPT, I repeat the process of refining my text by sending unsatisfactory portions for improvement until I achieve the desired flow.
3. I use ChatGPT to continuously refine my text, sending sections I'm unhappy with for improvement until I'm content with the overall flow.
I come down on the side of AI-artists.
This is probably unsurprising because I am a machine learning engineer, it is my job to build the types of systems these artists are using. But what is less obvious is that my support is also because I am an artist, specifically a landscape photographer.
Photography—just like AI-generated art—has a complicated history as “art”. Although the first photograph was taken in 1826, it wasn’t until 1924 that an American museum recognized the medium as art by including photographs in its permanent collection. At first artists feared photography would replace traditional visual arts due to the ease of taking a picture. But eventually they realized it was a useful tool that could be combined with other art forms, even if they did not recognize photography as an art in its own right.3, 4
The concerns and criticisms currently being directed towards AI-generated art are the same as those leveled against photography in the past. And just as photography eventually gained acceptance as a valid form of art so will AI-generated art. The resistance against it may be strong, but ultimately, it is a losing battle.
My family has a long history of painting. My great-great-great grandfather was the Norwegian landscape painter Hans Gude. My father, also named Hans Gude, was an accomplished oil painter.5 I too wanted to make art, but I did not have their skill with a brush so I picked up a camera instead.
I was drawn to photography specifically because it used technology. I like learning new technologies and how to master them. I also thought it would be easier to make art I was happy with using a camera. I have since learned that photography has its own set of skills to master, but after 15 years I think I was mostly right: it is much easier than oil painting.
I wonder what my great-great-great-grandfather would think of my art. He spent months or years creating his seascapes, while my photographs are captured in a fraction of a second with the push of a button, and maybe a few hours adjusting the tone curve and highlights back at my computer.
But I like to think that he would view my work as a continuation of our family’s artistic tradition. Maybe in the future, my descendants will find the camera too complicated and instead compose prompts for AI to translate into images. To me, that’s simply another evolution of the art form.
Current AI “art” is created on the backs of hundreds of thousands of artists and photographers who made billions of images and spend time, love and dedication to have their work soullessly stolen and used by selfish people for profit without the slightest concept of ethics.
Alexander Nanitchkov (@Artofinca), Twitter, 2022-12-05 ↩
“I believe art is something inherently and intrinsically human, even corporate art made-for-hire is meticulously crafted by experts in their fields,” [Nicholas] Kole said. “When we sit down to draw, design, sculpt or paint, each mark is made with an intention. Each step of the process is an opportunity to ask new questions, tune the piece to the precise context it’s intended for, to add expressiveness and even a point of view. The result—movies, shows, games—are intended to connect that intricate craft with an audience who appreciates and enjoys it.”
AI does none of this, he explained, and he sees “a world filling up with meaningless, regurgitative cardboard cutouts that remind us of real art.”
Xiang, Chloe. Artists Are Revolting Against AI Art on ArtStation. Vice, 2022-12-14 ↩
The fear has sometimes been expressed that photography would in time entirely supersede the art of painting. Some people seem to think that when the process of taking photographs in colors has been perfected and made common enough, the painter will have nothing more to do. We need not fear anything of the kind. Perfection in photography may rid us in time of all the poor work done in color. The work of the artist, however, in which is seen his own individuality, his own perception of the beautiful, his own creation in fact, can no more perish than the soul which inspired it.
Henrietta Clopath. Genuine Art versus Mechanism, in Brush and Pencil Vol. 7, No. 6 (1901-03-01), pp. 331-333 ↩
Photography is an infinitely valuable mechanism by which to obtain records of limited abstract truth, and as such, may be of great service to the artist. Much may be learned about drawing by reference to a good photograph, that even a man of quick natural perception would be slow to learn without such help. But, unless the real shortcomings of the photograph are understood, it must certainly mislead if followed.
But beyond these merely technical matters, art differs from any mechanical process in being “the expression of man’s delight in God’s work”, and thus it appeals to, and awakens all noble sympathy and right feeling. All labor of love must have something beyond mere mechanism at the bottom of it.
Art and Photography, in The New Path Vol. 2, No. 12 (1865-12-01), pp. 198-199 ↩
My father somewhat rejected the title of “artist”, although in later life he branded himself as such. He preferred to think of himself as a craftsman, honing his skills through hard work and study. ↩
As per usual, the Jupyter notebook used to perform this analysis can be found here (rendered on Github).
I selected crashes involving pedestrians from the SQLite database with the following query:
```sql
SELECT collision_date,
       collision_time,
       pedestrian_killed_count
FROM collisions
WHERE collision_date IS NOT NULL
  AND pedestrian_collision = 1                     -- involves a pedestrian
  AND collision_date <= '2020-12-31'               -- 2021 is incomplete
  AND strftime('%m-%d', collision_date) = '10-31'  -- and it happens on Halloween
```
This gave me 1168 data points spanning the years 2001 through 2020, of which 64 involve a pedestrian fatality. Incidents after 2020 are excluded because the database dump is from mid-2021, so that year is incomplete.
Alissa Walker wrote1 that it is not just drivers that make Halloween deadly, it is commuters. The best way to explore this point is to look at when in the day crashes happen:
As we saw in the data for all dates, weekdays have two major peaks in collisions during the morning and evening commutes, as well as a peak during school pickup times. Examining the data for Halloween specifically, we see that when it falls on a weekday the three expected peaks (morning and evening commutes, and school pick-up) are present, but there is also a fourth peak at 18:00, likely due to a combination of darkness making it difficult for drivers to see pedestrians and trick-or-treating bringing more people out walking. This data supports Walker’s observation that commuter traffic contributes significantly to the number of pedestrian incidents.
But Walker makes a very specific claim: that fatalities involving children increase on weekday Halloweens. Does the data support this claim? To find out, we need to look at the fatality rate instead of the total number of fatalities because the number of people driving and walking changes year-by-year and using the rate helps to normalize some of this variation. Below is a plot of the fatality rates for each year’s Halloween, separated into weekday and weekend:
The data above includes all pedestrian fatalities, not just those involving children. At first glance, the distributions for weekday and weekend Halloween fatalities appear similar. A Mann–Whitney U test confirms this, with a p-value of 0.93, indicating that the difference between the two is not statistically significant.
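The notebook presumably uses a standard implementation of the Mann–Whitney U test (for example scipy.stats.mannwhitneyu). As a self-contained sketch of what the test is doing, here is a permutation-based version; the weekday and weekend rates below are made up for illustration and are not the real SWITRS numbers:

```python
import random
from itertools import product

def u_stat(x, y):
    # U = number of (x, y) pairs where x > y, counting ties as 1/2.
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(x, y))

def mann_whitney_p(x, y, n_perm=10_000, seed=0):
    """Two-sided p-value by randomly permuting the weekday/weekend labels."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    nx = len(x)
    center = nx * len(y) / 2  # E[U] under the null hypothesis
    observed = abs(u_stat(x, y) - center)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(u_stat(pooled[:nx], pooled[nx:]) - center) >= observed:
            hits += 1
    return hits / n_perm

# Illustrative fatality rates only -- not the real SWITRS numbers.
weekday_rates = [0.00, 0.10, 0.05, 0.00, 0.08, 0.12]
weekend_rates = [0.00, 0.09, 0.06, 0.11]
p_value = mann_whitney_p(weekday_rates, weekend_rates)
```

Because the test only compares ranks, it makes no assumption that the fatality rates are normally distributed, which suits small, skewed samples like yearly Halloween counts.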
But what about children alone (defined as pedestrians under 18)? Here is that data:
One interesting observation is that no children have been killed by cars on weekend Halloweens, whereas about half of the weekdays have seen at least one child death. This suggests that there is something about weekday Halloweens that makes them particularly dangerous for children, consistent with Walker’s claim.
Despite this, the data does not show a significant difference between the two distributions, with a p-value of 0.08. However, this p-value, lower than the one for the all-ages data, does suggest some evidence for the specific claim about child deaths.
But when the commuting drivers are removed from the equation, deaths seem to go down. A study by AutoInsurance.org used FARS data to compare 24 years of crash data by days of the week. Halloweens that fell on workdays had an 83 percent increase in deadly crashes involving kids compared to weekend days. The worst day? Friday. Since 1994, the three deadliest Halloween nights for kids have all been Friday nights.
From: Walker, Alissa (October 25, 2019). The most terrifying part of Halloween for kids is our deadly streets. Curbed Vox Media. ↩
So with all the new data, I wanted to look at the most vulnerable road users: pedestrians.
As per usual, the Jupyter notebook used to perform this analysis can be found here (rendered on Github).
I selected crashes involving pedestrians from the SQLite database with the following query:
SELECT collision_date
, collision_time
, pedestrian_killed_count
FROM collisions
WHERE collision_date IS NOT NULL
AND pedestrian_collision = 1 -- Involves a pedestrian
AND collision_date <= '2020-12-31' -- 2021 is incomplete
This gave me 282,039 data points to examine spanning the years 2001 through 2020. Incidents after 2020 are rejected because the database dump comes from mid-2021, and so that year is incomplete.
For bicycle involved incidents, I found there was an increase from about 2008 through 2013 followed by a decrease. For both bicycles as well as motorcycles, I found strong seasonality with many more crashes during the summer when people are out riding to take advantage of the weather. Pedestrian involved incidents defy both these trends:
Pedestrian involved incidents were flat or slightly down until about 2013, when instead of decreasing like bicycle collisions they increased strongly. Like both bicycle and motorcycle crashes, pedestrian incidents are strongly seasonal, but they decrease in the summer (when there is a lot of light for drivers to see pedestrians) and increase in the winter, when it gets dark early and drivers can’t see them. Of course there is also a massive decrease when COVID restrictions kept most people home starting in March 2020.
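The seasonal pattern falls out of bucketing incident dates by month. A minimal sketch with made-up dates (the notebook presumably resamples the full time series, but the idea is the same):

```python
from collections import Counter
from datetime import date

# Made-up pedestrian-collision dates, weighted toward winter to mirror
# the seasonal pattern described above.
incidents = [
    date(2019, 1, 14), date(2019, 1, 28), date(2019, 2, 3),
    date(2019, 7, 4), date(2019, 11, 20), date(2019, 12, 5),
    date(2019, 12, 19),
]

# Count incidents per calendar month; winter months dominate this sample.
by_month = Counter(d.month for d in incidents)
```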
Cars are involved in crashes on days when drivers have to commute to work and on holidays when people travel. The worst day is Halloween, when people work and then go out and have fun afterward. I was curious whether the large increase on Halloween was due to a large increase in pedestrian collisions; the answer is no:
On Halloween, drivers are more likely to hit pedestrians than on any other day of the year. But it’s only about 15 to 20 more incidents than on any other October day, and there are almost 200 additional car crashes on Halloween. The number of additional pedestrian incidents does not account for much of the increase in car crashes. For a more detailed analysis of pedestrian collisions on Halloween, check out my other post.
Otherwise there are some interesting patterns. Many holidays trend the same direction as cars: New Years, Memorial Day, Veterans Day, Thanksgiving, and Christmas all see a large reduction in both car crashes and pedestrian incidents. Halloween, as covered above, sees a large increase in both.
One outlier is the 4th of July. Car crashes decrease because people do not have to commute, but pedestrian incidents increase. I think this is because people are walking around in the dark going to and coming back from watching fireworks, and drivers have trouble seeing them.
Finally we can look at when cars hit pedestrians by hour and whether it is a weekend or not:
The most striking feature is the large increase in the number of incidents during the morning commute (07–09) and again during the evening commute (17–19)! Commuters in cars are dangerous to pedestrians!
There is also an increase in incidents on both weekends and weekdays at about 17:00. This is probably because that is around sunset.1
The weekend curve rises smoothly through the day, but the weekday curve has a large increase at 14:00. I suspect this is from school pickup, which is generally earlier than the commute.
Finally, it is interesting that the number of late night and early morning incidents is much higher on the weekend. This is likely due to people going out to bars, as the number drops off at 02:00 which is when the bars close in California.
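The hour-by-hour, weekday-versus-weekend split above comes down to a simple groupby on the crash timestamp. Here is a self-contained sketch with made-up timestamps chosen to echo the peaks described (Python's datetime.weekday() returns 0–4 for Monday–Friday, 5–6 for the weekend):

```python
from collections import Counter
from datetime import datetime

# Made-up collision timestamps, invented to echo the peaks above.
crashes = [
    datetime(2019, 10, 28, 8, 15),   # Monday: morning commute
    datetime(2019, 10, 28, 17, 40),  # Monday: evening commute
    datetime(2019, 10, 29, 14, 5),   # Tuesday: school pickup
    datetime(2019, 10, 26, 1, 30),   # Saturday: before the bars close
    datetime(2019, 10, 26, 17, 55),  # Saturday: around sunset
]

# Split into weekday/weekend, then count incidents per hour of day.
weekday_by_hour = Counter(ts.hour for ts in crashes if ts.weekday() < 5)
weekend_by_hour = Counter(ts.hour for ts in crashes if ts.weekday() >= 5)
```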
This dataset covers the whole year so the exact time of sunset changes. It would be interesting to make a similar chart but relative to sunrise and sunset. ↩
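A sunset-relative version would only need each timestamp re-expressed as hours after that day's sunset. A sketch of the transformation, where the sunset table is a hypothetical stand-in (a real version would compute sunset times from latitude and longitude with an astronomy library):

```python
from datetime import date, datetime

# Hypothetical sunset times in decimal hours for a couple of dates.
# A real analysis would compute these for each date in the dataset.
sunset_hours = {
    date(2019, 1, 15): 17.0,   # winter: early sunset
    date(2019, 7, 15): 20.4,   # summer: late sunset
}

def hours_after_sunset(ts):
    """Hours between a crash timestamp and that day's sunset (negative = before)."""
    return ts.hour + ts.minute / 60 - sunset_hours[ts.date()]

# The same clock time lands very differently relative to sunset by season.
winter_delta = hours_after_sunset(datetime(2019, 1, 15, 18, 30))  # after dark
summer_delta = hours_after_sunset(datetime(2019, 7, 15, 18, 30))  # daylight
```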