{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Double-checking FiveThirtyEight's 2016 Primary Predictions\n",
"\n",
"Here I look at the [predictions that FiveThirtyEight made](https://projects.fivethirtyeight.com/election-2016/primary-forecast/) about the 2016 Presidential Primaries.\n",
"\n",
"## Loading the data\n",
"\n",
"Load the data about their predictions and the actual outcomes into `pandas` dataframes:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Load the dataframes from disk\n",
"import pandas as pd\n",
"dem = pd.read_csv(\"./2016_dem_primary_dataframe.csv\", index_col=[0,1])\n",
"gop = pd.read_csv(\"./2016_gop_primary_dataframe.csv\", index_col=[0,1])"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# We only care about races where there was a prediction made\n",
"dem = dem.dropna()\n",
"gop = gop.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Looking at the data\n",
"\n",
"Let's look at the results for Iowa for the Democrats, just to see what is in the table."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"                 80% Lower Bound  80% Upper Bound  Result\n",
"State Candidate\n",
"Iowa  Clinton               44.0             54.0    49.9\n",
"      O'Malley               4.0              8.0     0.6\n",
"      Sanders               40.0             52.0    49.6"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dem.loc[[\"Iowa\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 80% confidence intervals are given by the \"`80% Lower Bound`\" and \"`80% Upper Bound`\" columns. The actual result of the election is given in the \"`Result`\" column.\n",
"\n",
"Here is the data for Iowa for the Republicans:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"                 80% Lower Bound  80% Upper Bound  Result\n",
"State Candidate\n",
"Iowa  Carson                 3.0             15.0     9.3\n",
"      Cruz                  14.0             36.0    27.6\n",
"      Kasich                 1.0              4.0     1.9\n",
"      Rubio                  9.0             27.0    23.1\n",
"      Trump                 15.0             38.0    24.3"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gop.loc[[\"Iowa\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking the intervals\n",
"\n",
"Now I'll add a set of columns that tell us whether each prediction was good, that is, whether the actual result fell within the 80% confidence interval. Two more columns record whether a missed prediction was too low or too high."
]
},
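{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the check described above, the cell below classifies a single prediction by hand. The helper name `classify_prediction` is hypothetical and exists only for this illustration; the analysis itself works on whole dataframe columns. Note the naming: \"high\" means the prediction was too high, so the result fell below the interval."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: `classify_prediction` is a hypothetical helper, not part of the analysis\n",
"def classify_prediction(low, high, result):\n",
"    if result < low:\n",
"        return 'high'  # prediction too high: the result fell below the interval\n",
"    if result > high:\n",
"        return 'low'  # prediction too low: the result rose above the interval\n",
"    return 'good'  # the result fell inside the interval, bounds included\n",
"\n",
"# Values read from the Iowa rows for the Democrats shown earlier\n",
"print(classify_prediction(44, 54, 49.9))  # Clinton\n",
"print(classify_prediction(4, 8, 0.6))  # O'Malley"
]
},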
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def add_good_predicitons(df, name=\"Prediction Good\"):\n",
"    \"\"\"Add boolean columns flagging whether each result fell inside, above, or below the 80% interval.\"\"\"\n",
"    # Good: the result lies within the interval (bounds included)\n",
"    df[name] = (df[\"80% Lower Bound\"] <= df[\"Result\"]) & (df[\"Result\"] <= df[\"80% Upper Bound\"])\n",
"    # Low: the prediction was too low, so the result landed above the upper bound\n",
"    df[\"Prediction Low\"] = df[\"80% Upper Bound\"] < df[\"Result\"]\n",
"    # High: the prediction was too high, so the result landed below the lower bound\n",
"    df[\"Prediction High\"] = df[\"Result\"] < df[\"80% Lower Bound\"]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"add_good_predicitons(dem)\n",
"add_good_predicitons(gop)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"                 80% Lower Bound  80% Upper Bound  Result  Prediction Good  \\\n",
"State Candidate\n",
"Iowa  Clinton               44.0             54.0    49.9             True\n",
"      O'Malley               4.0              8.0     0.6            False\n",
"      Sanders               40.0             52.0    49.6             True\n",
"\n",
"                 Prediction Low  Prediction High\n",
"State Candidate\n",
"Iowa  Clinton             False            False\n",
"      O'Malley            False             True\n",
"      Sanders             False            False"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dem.loc[[\"Iowa\"]]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Compute how the results match the predictions\n",
"def get_low_right_high(df):\n",
"    \"\"\"Return the fractions of predictions that were too low, right, and too high.\"\"\"\n",
"    r = df[\"Prediction Good\"]\n",
"    right = float(r.sum()) / r.count()\n",
"    h = df[\"Prediction High\"]\n",
"    high = float(h.sum()) / h.count()\n",
"    l = df[\"Prediction Low\"]\n",
"    low = float(l.sum()) / l.count()\n",
"\n",
"    return low, right, high"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dem_low, dem_right, dem_high = get_low_right_high(dem)\n",
"gop_low, gop_right, gop_high = get_low_right_high(gop)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Party         Under  Right   Over    Total\n",
"------------|-------------------------------\n",
"Democrats   | 3.64%, 80.00%, 16.36%, 100.00%\n",
"Republicans | 7.14%, 77.68%, 15.18%, 100.00%\n"
]
}
],
"source": [
"print(\"Party         Under  Right   Over    Total\")\n",
"print(\"------------|-------------------------------\")\n",
"print(\"Democrats   | {:.2%}, {:.2%}, {:.2%}, {:.2%}\".format(dem_low, dem_right, dem_high, sum((dem_low, dem_right, dem_high))))\n",
"print(\"Republicans | {:.2%}, {:.2%}, {:.2%}, {:.2%}\".format(gop_low, gop_right, gop_high, sum((gop_low, gop_right, gop_high))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting the Results\n",
"\n",
"We can make a plot of the actual voting results by scaling the vote share so that the low edge of the confidence interval is +1, the high edge is -1, and the midpoint is 0. Then if a candidate's vote share is within the predicted range their result will be between -1 and 1. If the prediction was 45% to 55% and the candidate actually got 60%, that would show up at -2 on the plot. The minus sign indicates that the prediction was too low."
]
},
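{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough check of the scaling described above, the cell below works through the 45% to 55% example by hand. The helper `scale_result` is a hypothetical name used only for this illustration; it is not part of the analysis itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: `scale_result` is a hypothetical helper\n",
"def scale_result(low, high, result):\n",
"    half_width = (high - low) / 2.0\n",
"    midpoint = low + half_width\n",
"    # Positive when the result falls below the midpoint (prediction too high),\n",
"    # negative when it falls above the midpoint (prediction too low)\n",
"    return (midpoint - result) / half_width\n",
"\n",
"print(scale_result(45, 55, 60))  # the example from the text, where the prediction was too low\n",
"print(scale_result(45, 55, 45))  # a result exactly on the low edge of the interval"
]
},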
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def add_scaled_result(df):\n",
"    # Half the width of the 80% interval, and the midpoint of the interval\n",
"    interval = (df[\"80% Upper Bound\"] - df[\"80% Lower Bound\"]) / 2.\n",
"    means = df[\"80% Lower Bound\"] + interval\n",
"    # The - out front makes it so that a result at the lower bound (prediction too high) gives +1\n",
"    df[\"Scaled Result\"] = -(df[\"Result\"] - means) / interval"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"add_scaled_result(dem)\n",
"add_scaled_result(gop)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
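{
"data": {
"text/plain": [
"[Figure: side-by-side histograms of \"Scaled Result\" for Democrats (left) and Republicans (right), with dotted vertical lines at -1 and +1 marking the 80% confidence interval]"
]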
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'svg'\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"width = 10\n",
"height = 6\n",
"\n",
"plt.figure(figsize=(width, height))\n",
"\n",
"bins = [float(i)/10. for i in range(-44, 46, 2)]\n",
"ylim = [0, 17]\n",
"xlim = [min(bins), -min(bins)]\n",
"\n",
"\n",
"ax1 = plt.subplot2grid((1,2),(0,0))\n",
"ax2 = plt.subplot2grid((1,2),(0,1))\n",
"\n",
"dem[\"Scaled Result\"].plot(kind=\"hist\", color='b', ax=ax1, ylim=ylim, xlim=xlim, bins=bins)\n",
"gop[\"Scaled Result\"].plot(kind=\"hist\", color='r', ax=ax2, ylim=ylim, xlim=xlim, bins=bins)\n",
"\n",
"# 80% Confidence intervals\n",
"color=\"black\"\n",
"linestyle=\"dotted\"\n",
"ax1.axvline(-1, color=color, linestyle=linestyle)\n",
"ax1.axvline(+1, color=color, linestyle=linestyle)\n",
"ax2.axvline(-1, color=color, linestyle=linestyle)\n",
"ax2.axvline(+1, color=color, linestyle=linestyle)\n",
"\n",
"ax1.set_title(\"Democrats\")\n",
"ax2.set_title(\"Republicans\")\n",
"ax2.yaxis.set_visible(False)\n",
"\n",
"plt.subplots_adjust(wspace=0)\n",
"\n",
"plt.savefig(\"/tmp/538_scaled_results.png\", bbox_inches='tight')\n",
"plt.savefig(\"/tmp/538_scaled_results.svg\", bbox_inches='tight')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## A (Rough) Estimate of Uncertainties\n",
"\n",
"When I read FiveThirtyEight's plots, I only ever pick a whole number (I am certainly not accurate enough to get better precision than that). I estimate that if I say a number is \"34%\", then it is just as likely to be 33% or 35%. To estimate what effect this has, I randomly adjust each prediction bound (up 1, down 1, or unchanged, each with equal probability) and see how the predictions fare. The numbers I report below are the means over these trials, and the uncertainties represent two standard deviations."
]
},
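{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what this perturbation does to a single prediction, the sketch below enumerates all nine equally likely combinations of shifted bounds instead of sampling them at random. This is only an illustration under the assumptions stated above: the name `exact_good_fraction` is hypothetical, the first two calls use the Iowa rows shown earlier, and the last call uses made-up numbers to show a borderline case."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from itertools import product\n",
"\n",
"# Illustration only: `exact_good_fraction` is a hypothetical helper\n",
"def exact_good_fraction(low, high, result, sigma=1):\n",
"    # Every whole-number shift from -sigma to +sigma is treated as equally likely\n",
"    shifts = list(range(-sigma, sigma + 1))\n",
"    hits = [(low + dl) <= result <= (high + dh) for dl, dh in product(shifts, shifts)]\n",
"    # Fraction of shifted intervals that still contain the result\n",
"    return sum(hits) / float(len(hits))\n",
"\n",
"print(exact_good_fraction(44, 54, 49.9))  # Clinton, Iowa: well inside the interval\n",
"print(exact_good_fraction(4, 8, 0.6))  # O'Malley, Iowa: well below the interval\n",
"print(exact_good_fraction(40, 52, 52.0))  # made-up result sitting exactly on the upper bound"
]
},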
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import random\n",
"import numpy as np\n",
"\n",
"def ugly_simulation_hack(df, sigma=1, iterations=10000, std_to_return=2):\n",
"    \"\"\"Perturb the bounds by whole numbers and return the mean and scaled std of each outcome fraction.\"\"\"\n",
"\n",
"    sim_good = []\n",
"    sim_low = []\n",
"    sim_high = []\n",
"\n",
"    # Run many simulations\n",
"    for _ in range(iterations):\n",
"\n",
"        pred_good = []\n",
"        pred_low = []\n",
"        pred_high = []\n",
"\n",
"        # Check every prediction and perturb them\n",
"        for ((state, candidate), low, high, result, _, _, _, _) in df.itertuples():\n",
"\n",
"            # Perturb the bounds I read from 538's plots by a whole number drawn\n",
"            # uniformly from -sigma to +sigma (a Gaussian alternative is commented out)\n",
"            #new_low = random.gauss(low, sigma)\n",
"            #new_high = random.gauss(high, sigma)\n",
"            new_low = random.randint(int(low) - sigma, int(low) + sigma)\n",
"            new_high = random.randint(int(high) - sigma, int(high) + sigma)\n",
"\n",
"            # Check if the perturbed prediction is good or not\n",
"            pred_good.append(new_low <= result <= new_high)\n",
"            pred_low.append(new_high < result)\n",
"            pred_high.append(result < new_low)\n",
"\n",
"        # Calculate the fraction in each category for this set of perturbations\n",
"        sim_good.append(sum(pred_good)/float(len(pred_good)))\n",
"        sim_low.append(sum(pred_low)/float(len(pred_low)))\n",
"        sim_high.append(sum(pred_high)/float(len(pred_high)))\n",
"\n",
"    # Calculate outcome of the simulation\n",
"    good_mean = np.mean(sim_good)\n",
"    good_std = np.std(sim_good)\n",
"    low_mean = np.mean(sim_low)\n",
"    low_std = np.std(sim_low)\n",
"    high_mean = np.mean(sim_high)\n",
"    high_std = np.std(sim_high)\n",
"\n",
"    return (\n",
"        good_mean,\n",
"        good_std * std_to_return,\n",
"        low_mean,\n",
"        low_std * std_to_return,\n",
"        high_mean,\n",
"        high_std * std_to_return,\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"gop_good, gop_good_std, gop_low, gop_low_std, gop_high, gop_high_std = ugly_simulation_hack(gop)\n",
"dem_good, dem_good_std, dem_low, dem_low_std, dem_high, dem_high_std = ugly_simulation_hack(dem)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Party         Under          Right           Over\n",
"------------|---------------------------------------------\n",
"Democrats   | 5.5% +- 3.0%   77.6% +- 4.2%   17.0% +- 3.0%\n",
"Republicans | 6.5% +- 1.2%   78.6% +- 3.3%   14.9% +- 3.0%\n"
]
}
],
"source": [
"print(\"Party         Under          Right           Over\")\n",
"print(\"------------|---------------------------------------------\")\n",
"print(\"Democrats   | {:.1%} +- {:.1%}   {:.1%} +- {:.1%}   {:.1%} +- {:.1%}\".format(dem_low, dem_low_std, dem_good, dem_good_std, dem_high, dem_high_std))\n",
"print(\"Republicans | {:.1%} +- {:.1%}   {:.1%} +- {:.1%}   {:.1%} +- {:.1%}\".format(gop_low, gop_low_std, gop_good, gop_good_std, gop_high, gop_high_std))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}