The purpose of this post is to recreate the “Shift from 2016” arrow map that the New York Times used to show which counties became more Democratic or Republican-leaning from 2016 to 2020. This is a screenshot of the NYTimes figure:
I will use county-level Presidential election data from the MIT Election Data + Science Lab to recreate the chart. Since 2020 results are not final yet, I will focus on data from 2000-2016. I ran into multiple issues with the dataset, which I explain in the Code section below. The most signifcant issue was with the data from Alaska, which I excluded from the charts below because of problems with the data.
Recreating the NYTimes Figure
My approach is to use {ggplot2}
and {sf}
to map the data and draw arrows at angles to display shifts in the Democratic margin.
This is the dataframe I use to make the final map. It contains the year, state, county, FIPS code, county and state geometries, and election results per county.
glimpse(shift_map)
Rows: 12,610
Columns: 22
$ year <dbl> 2004, 2008, 2012, 2016, 2004, 2008, 2012…
$ state <chr> "ALABAMA", "ALABAMA", "ALABAMA", "ALABAM…
$ county_name <chr> "AUTAUGA", "AUTAUGA", "AUTAUGA", "AUTAUG…
$ county_fips <chr> "01001", "01001", "01001", "01001", "010…
$ candidatevotes_sum_democrat <dbl> 4758, 6093, 6363, 5936, 15599, 19386, 18…
$ candidatevotes_sum_republican <dbl> 15196, 17403, 17379, 18172, 52971, 61271…
$ pct_vote_democrat <dbl> 0.23845, 0.25932, 0.26801, 0.24623, 0.22…
$ pct_vote_republican <dbl> 0.7616, 0.7407, 0.7320, 0.7538, 0.7725, …
$ dem_margin_pct <dbl> -0.52310, -0.48136, -0.46399, -0.50755, …
$ dem_margin_votes <dbl> -10438, -11310, -11016, -12236, -37372, …
$ shift_pct <dbl> -0.106746, 0.041745, 0.017371, -0.043561…
$ shift_votes <dbl> -3387, -872, 294, -1220, -10497, -4513, …
$ shift_pct_scaled <dbl> 71.83, 91.08, 87.92, 80.02, 78.51, 89.01…
$ shift_votes_scaled <dbl> 16602, 11700, 10573, 12378, 30460, 18796…
$ shift_pct_binary <chr> "Republican", "Democratic", "Democratic"…
$ shift_votes_binned <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, F…
$ geometry <POLYGON [m]> POLYGON ((1269841 -1303980,..., …
$ center <list> <POINT (1253837 -1285138)>, <POINT (125…
$ lng0 <dbl> 1253837, 1253837, 1253837, 1253837, 1177…
$ lat0 <dbl> -1285138, -1285138, -1285138, -1285138, …
$ lng1 <dbl> 1259015, 1253616, 1254221, 1255982, 1183…
$ lat1 <dbl> -1269365, -1273441, -1274572, -1272948, …
<- shift_map %>%
shift_map_filtered filter(state != "ALASKA") %>%
filter(year == 2016) %>%
mutate(shift_pct_binary = case_when(sign(shift_pct) == 1 ~ "Democratic",
sign(shift_pct) == -1 ~ "Republican"),
shift_pct_binary = as.factor(shift_pct_binary)) %>%
mutate(shift_votes_binned = abs(shift_votes) <= 3000)
ggplot() +
geom_sf(data = filter(state_geo, !str_detect(NAME, "ALASKA")),
linewidth = .2,
fill = NA) +
geom_point(data = filter(shift_map_filtered, abs(shift_votes) <= 1500),
aes(x = lng0, y = lat0,
color = shift_pct_binary),
size = .75,
alpha = .3) +
geom_segment(data = filter(shift_map_filtered, abs(shift_votes) > 1500),
aes(x = lng0, xend = lng1,
y = lat0, yend = lat1,
color = shift_pct_binary,
linewidth = shift_votes,
alpha = shift_votes_binned),
linejoin = "mitre",
arrow = arrow(length = unit(0.08, "inches"))) +
scale_color_manual(values = c("#1375B7", "#C93135"), guide = guide_legend(title.position = "top")) +
scale_linewidth_continuous(range = c(.001, 2), guide = "none") +
scale_alpha_manual(values = c(1, .3), guide = "none") +
labs(color = "Shift in election margin") +
facet_wrap(~year) +
theme_void(base_size = 25) +
theme(legend.direction = "horizontal",
legend.position = "bottom")
The starting point of the line is the centroid of the county. The length and width of the lines are scaled to the shift in terms of number of votes. The NYTimes figure treats the shift as a binary variable when it rescales to degrees of the angle. In their graph, a Democratic shift is about 45 degrees (diagonal left) and a Republican shift is about 135 degrees (diagonal right). My figure maintains the continuous nature of the shift in %. I use the range 0-180 in degrees to indicate the shift. 0 degrees (all the way left) indicates a 100% shift towards Democrats, 90 degrees (pointing upwards) indicates no change, and 180 degrees (all the way to the right) indicates a 100% shift towards Republicans.
The end point of the line is calculated using the sine and cosine of the margin shift in % (re-scaled to be interpreted as degrees of an angle) multiplied by the margin shift in votes (re-scaled to be interpreted as meters), which is added to the origin point.
I lower the opacity of the lines in counties where the vote totals did not shift much. I use points instead of lines for counties where there was a very small shift in votes. This prevents overplotting in geographically dense areas with small populations.
This animation shows the shift in Presidential election margin from 2004-2016.
<- shift_map %>%
political_winds_anim filter(state != "ALASKA") %>%
mutate(id = str_c(state, county_name, county_fips)) %>%
mutate(year = as.integer(year)) %>%
mutate(shift_pct_binary = case_when(sign(shift_pct) == 1 ~ "Democratic",
sign(shift_pct) == -1 ~ "Republican"),
shift_pct_binary = as.factor(shift_pct_binary)) %>%
mutate(shift_votes_binned = abs(shift_votes) <= 3000) %>%
ggplot() +
geom_sf(data = filter(state_geo, NAME != "ALASKA"),
linewidth = .2,
fill = NA) +
geom_segment(aes(x = lng0, xend = lng1,
y = lat0, yend = lat1,
color = shift_pct_binary,
linewidth = shift_votes,
alpha = shift_votes_binned,
group = id),
linejoin = "mitre",
arrow = arrow(length = unit(0.09, "inches"))) +
scale_color_manual(values = c("#1375B7", "#C93135"), guide = guide_legend(title.position = "top")) +
scale_linewidth_continuous(range = c(.001, 1.3), guide = "none") +
scale_alpha_manual(values = c(1, .3), guide = "none") +
theme_void(base_size = 25) +
theme(legend.direction = "horizontal",
legend.position = "bottom") +
transition_states(year) +
labs(title = "Shift in Presidential election Democratic margin",
subtitle = "Year: {closest_state}",
color = "Shift in Democratic margin")
political_winds_anim
In the animation there is less overplotting, so I do not replace lines with dots for counties where there was a very small shift in votes.
Code
Ingest
#election shift
#script to clean data
#data from https://electionlab.mit.edu/data
#fips info
#https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code#FIPS_state_codes
#https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county
#changes https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.2010.html
#read in data
<- read_csv("post_data/countypres_2000-2020.csv",
data col_types = cols(
year = col_double(),
state = col_character(),
state_po = col_character(),
county_name = col_character(),
county_fips = col_character(),
office = col_character(),
candidate = col_character(),
party = col_character(),
candidatevotes = col_double(),
totalvotes = col_double(),
version = col_double()
%>%
)) clean_names() |>
filter(year <= 2016,
== "TOTAL") |>
mode select(-mode)
glimpse(data)
Rows: 50,524
Columns: 11
$ year <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2…
$ state <chr> "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", …
$ state_po <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "…
$ county_name <chr> "AUTAUGA", "AUTAUGA", "AUTAUGA", "AUTAUGA", "BALDWIN", …
$ county_fips <chr> "01001", "01001", "01001", "01001", "01003", "01003", "…
$ office <chr> "US PRESIDENT", "US PRESIDENT", "US PRESIDENT", "US PRE…
$ candidate <chr> "AL GORE", "GEORGE W. BUSH", "RALPH NADER", "OTHER", "A…
$ party <chr> "DEMOCRAT", "REPUBLICAN", "GREEN", "OTHER", "DEMOCRAT",…
$ candidatevotes <dbl> 4942, 11993, 160, 113, 13997, 40872, 1033, 578, 5188, 5…
$ totalvotes <dbl> 17208, 17208, 17208, 17208, 56480, 56480, 56480, 56480,…
$ version <dbl> 20220315, 20220315, 20220315, 20220315, 20220315, 20220…
Clean
This code filters out state-wide vote tabulations and then filters only on the two-party Presidential vote.
<- data %>%
data rename(fips_raw = county_fips) %>%
#filter out state-wide ballot collection
filter(!(state == "CONNECTICUT" & county_name == "STATEWIDE WRITEIN")) %>%
filter(!(state == "MAINE" & county_name == "MAINE UOCAVA")) %>%
filter(!(state == "RHODE ISLAND" & county_name == "FEDERAL PRECINCT"))
#filter for only 2-party vote in presidential elections
<- data %>%
data filter(office == "US PRESIDENT",
== "DEMOCRAT" | party == "REPUBLICAN") %>%
party arrange(state, county_name, fips_raw, year) %>%
replace_na(list(candidatevotes = 0))
Many of the FIPS codes from the source data dropped leading zeroes, which makes them unuseable for joining with Census data. This code adds the leading zeroes back.
These problems were fixed in a later update by MIT, so this code is not strictly necessary anymore
#clean fips data
<- str_to_title(c("ALABAMA", "ALASKA", "ARIZONA",
states_with_bad_fips "ARKANSAS", "CALIFORNIA",
"COLORADO", "CONNECTICUT"))
%>%
data filter(state %in% states_with_bad_fips) %>%
mutate(county_fips = paste0("0", fips_raw)) %>%
distinct(fips_raw, county_fips)
# A tibble: 0 × 2
# ℹ 2 variables: fips_raw <chr>, county_fips <chr>
<- data %>%
data #add "0" to front of states where leading "0" was dropped
mutate(county_fips = case_when(state %in% states_with_bad_fips ~ paste0("0", fips_raw),
!(state %in% states_with_bad_fips) ~ fips_raw))
I had to make a variety of decisions about how to clean up the data with regards to county geometries. The MIT data does not reflect cases where counties changed names or FIPS codes, or where counties merged. This code manually makes the changes necessary to join the data with Census geometry data. Note that I do not attempt to fix the data for Alaska, which was extremely different than the Census data. I was not confident that I could make accurate adjustments in this case, so I excluded Alaska entirely. These changes are not optimal, but I think it is close enough.
These problems were fixed in a later update by MIT, so this code is not strictly necessary anymore
#decisions to make with wonky geometry
#merge records for Shannnon and Oglala Lakota counties in SD
#merge Kansas City Missouri and Jackson County Missouri
#merge Bedford (city) fips 51515 with Bedford county 51019
<- data %>%
data #update Oglala Lakota SD fips
#changed in 2015 https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.2010.html
mutate(county_fips = case_when(state == "SOUTH DAKOTA" & county_name == "OGLALA LAKOTA" ~ "46102",
TRUE ~ county_fips)) %>%
#merge Kansas City Missouri with Jackson County Missouri
mutate(county_name = case_when(state == "MISSOURI" & county_name == "KANSAS CITY" ~ "JACKSON",
TRUE ~ county_name),
county_fips = case_when(state == "MISSOURI" & county_name == "JACKSON" ~ "29095",
TRUE ~ county_fips)) %>%
#merge Bedford (city) fips 51515 with Bedford county 51019
mutate(county_fips = case_when(state == "VIRGINIA" & county_name == "BEDFORD" & county_fips == "51515" ~ "51019",
TRUE ~ county_fips))
This compares the counties in the MIT data vs. what is in the Census API. Besides Alaska, this shows that my manual changes accounted for the issues I identified.
<- get_acs(variables = "B19013_001",
counties geography = "county",
geometry = FALSE) %>%
#mutate(census_geo_year = 2010) %>%
select(NAME, GEOID)
Getting data from the 2017-2021 5-year ACS
#alaska falls out: this is expected
#Broomfield County CO falls out for year 2000: was part of Boulder County in 2000
#Oglala Lakota County SD falls out for year 2000: was Shannon County in 2000
#
%>%
data select(year, state, county_name, county_fips) %>%
filter(state != "ALASKA") %>%
anti_join(counties, by = c("county_fips" = "GEOID")) %>%
count(state, county_name)
# A tibble: 1 × 3
state county_name n
<chr> <chr> <int>
1 SOUTH DAKOTA SHANNON 8
The process of merging some counties meant that I had to summarize the election results to the level of my new “adjusted” counties. This code performs that process.
#some counties have 4 records because of merging process
%>%
data select(state, county_name, county_fips, year) %>%
add_count(state, county_name, county_fips, year) %>%
distinct(n)
# A tibble: 2 × 1
n
<int>
1 2
2 4
#summarize candidatevotes to account for merged counties
%>%
data select(state, county_name, county_fips, year, office, party, candidate, candidatevotes) %>%
group_by(state, county_name, county_fips, year, office, party, candidate) %>%
summarize(candidatevotes_sum = sum(candidatevotes)) %>%
ungroup() %>%
add_count(state, county_name, county_fips, year) %>%
#confirm that each county only has 2 records
distinct(n)
`summarise()` has grouped output by 'state', 'county_name', 'county_fips',
'year', 'office', 'party'. You can override using the `.groups` argument.
# A tibble: 1 × 1
n
<int>
1 2
<- data %>%
data select(state, county_name, county_fips, year, office, party, candidate, candidatevotes) %>%
group_by(state, county_name, county_fips, year, office, party, candidate) %>%
summarize(candidatevotes_sum = sum(candidatevotes)) %>%
ungroup()
`summarise()` has grouped output by 'state', 'county_name', 'county_fips',
'year', 'office', 'party'. You can override using the `.groups` argument.
Munge
This part performs the more straightfoward tasks of calculating a candidate’s % of the vote and the election-to-election shift in %.
<- data %>%
presidential_votes group_by(year, state, county_name, county_fips) %>%
mutate(pct_vote = candidatevotes_sum / sum(candidatevotes_sum)) %>%
ungroup() %>%
select(year, state, county_name, county_fips, party, candidatevotes_sum, pct_vote)
<- presidential_votes %>%
presidential_votes_shift mutate(party = str_to_lower(party)) %>%
pivot_wider(names_from = party, values_from = c(candidatevotes_sum, pct_vote)) %>%
mutate(dem_margin_pct = pct_vote_democrat - pct_vote_republican,
dem_margin_votes = candidatevotes_sum_democrat - candidatevotes_sum_republican) %>%
arrange(state, county_name, county_fips, year) %>%
group_by(state, county_name, county_fips) %>%
mutate(shift_pct = dem_margin_pct - lag(dem_margin_pct),
shift_votes = dem_margin_votes - lag(dem_margin_votes)) %>%
filter(row_number() > 1) %>%
ungroup()
Finally, this creates new variables that rescale the shift in % and votes to degrees and meters, respectively. I also create variations of shift_pct
and shift_votes
to use in the graph.
<- presidential_votes_shift %>%
presidential_votes_shift mutate(shift_pct_scaled = rescale(shift_pct, to = c(0, 180)), #republican 0, democrat 180
shift_votes_scaled = rescale(abs(shift_votes), to = c(10^4, 10^6))) %>%
mutate(shift_pct_binary = case_when(sign(shift_pct) == 1 ~ "Democratic",
sign(shift_pct) == -1 ~ "Republican"),
shift_pct_binary = as.factor(shift_pct_binary)) %>%
mutate(shift_votes_binned = abs(shift_votes) <= 3000)
#create shift map object
<- presidential_votes_shift %>%
shift_map left_join(county_geo, by = c("county_fips" = "GEOID")) %>%
st_sf() %>%
rename(lng0 = center_lon_x,
lat0 = center_lat_y) %>%
mutate(lng1 = lng0 + (shift_votes_scaled * cos(NISTdegTOradian(shift_pct_scaled))),
lat1 = lat0 + (shift_votes_scaled * sin(NISTdegTOradian(shift_pct_scaled))))
<- shift_map %>%
shift_map_filtered filter(state != "ALASKA") %>%
filter(year == 2016) %>%
mutate(shift_pct_binary = case_when(sign(shift_pct) == 1 ~ "Democratic",
sign(shift_pct) == -1 ~ "Republican"),
shift_pct_binary = as.factor(shift_pct_binary))
ggplot() +
geom_sf(data = filter(state_geo, !str_detect(NAME, "ALASKA")),
linewidth = .2,
fill = NA) +
geom_point(data = filter(shift_map_filtered, abs(shift_votes) <= 1500),
aes(x = lng0, y = lat0,
color = shift_pct_binary),
size = .75,
alpha = .3) +
geom_segment(data = filter(shift_map_filtered, abs(shift_votes) > 1500),
aes(x = lng0, xend = lng1,
y = lat0, yend = lat1,
color = shift_pct_binary,
linewidth = shift_votes,
alpha = shift_votes_binned),
linejoin = "mitre",
arrow = arrow(length = unit(0.08, "inches"))) +
scale_color_manual(values = c("#1375B7", "#C93135"), guide = guide_legend(title.position = "top")) +
scale_linewidth_continuous(range = c(.001, 2), guide = "none") +
scale_alpha_manual(values = c(1, .3), guide = "none") +
labs(color = "Shift in election margin") +
facet_wrap(~year) +
theme_void(base_size = 25) +
theme(legend.direction = "horizontal",
legend.position = "bottom")
<- shift_map %>%
political_winds_anim filter(state != "Alaska") %>%
mutate(id = str_c(state, county_name, county_fips)) %>%
mutate(year = as.integer(year)) %>%
mutate(shift_votes_binned = abs(shift_votes) <= 3000) %>%
ggplot() +
geom_sf(data = filter(state_geo, NAME != "Alaska"),
linewidth = .2,
fill = NA) +
geom_segment(aes(x = lng0, xend = lng1,
y = lat0, yend = lat1,
color = shift_pct_binary,
linewidth = shift_votes,
alpha = shift_votes_binned,
group = id),
linejoin = "mitre",
arrow = arrow(length = unit(0.09, "inches"))) +
scale_color_manual(values = c("#1375B7", "#C93135"), guide = guide_legend(title.position = "top")) +
scale_size_continuous(range = c(.001, 1.3), guide = "none") +
scale_alpha_manual(values = c(1, .3), guide = "none") +
theme_void(base_size = 25) +
theme(legend.direction = "horizontal",
legend.position = "bottom") +
transition_states(year) +
labs(title = "Shift in Presidential election Democratic margin",
subtitle = "Year: {closest_state}",
color = "Shift in Democratic margin")
political_winds_anim