library(tidyverse)
library(sf)
library(tigris)
library(tidycensus)
library(tidygraph)
library(ggraph)
Note: high-res images of the main graphs from this post are available here, here, here, and here.
In this post I will use networks plots to analyze patterns of commuters in Allegheny County. I use the LEHD Origin-Destination Employment Statistics (LODES) dataset provided by the U.S. Census Bureau. This dataset contains summarized and anonymized estimate data regarding where people live and work. I use this data to map where Allegheny County residents live and work.
In my analysis I use many of the standard tidyverse packages, sf, tidygraph, and ggraph.
The first step is to read in the geographies crosswalk and the census tract geographies.
<- read_csv("post_data/pa_xwalk.csv.gz", col_types = cols(.default = "c"))
geo_crosswalk
<- get_decennial(geography = "tract",
allegheny_tracts variables = c(total_pop = "P001001"),
state = "PA",
county = "Allegheny County",
geometry = TRUE,
output = "wide",
year = 2010)
<- function(x, y) {
st_erase st_difference(x, st_union(st_combine(y)))
}
<- area_water("PA", "Allegheny", class = "sf") rivers
This shows the outlines of the tracts used in the analysis.
%>%
allegheny_tracts ggplot() +
geom_sf()
The crosswalk file contains keys to join a variety of geographies.
geo_crosswalk
# A tibble: 421,545 × 43
tabblk2010 st stusps stname cty ctyname trct trctname bgrp bgrpname
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 420912030002… 42 PA Penns… 42091 Montgo… 4209… 2030 (M… 4209… 2 (Trac…
2 420912070013… 42 PA Penns… 42091 Montgo… 4209… 2070.01… 4209… 3 (Trac…
3 420912070013… 42 PA Penns… 42091 Montgo… 4209… 2070.01… 4209… 3 (Trac…
4 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
5 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
6 420710134002… 42 PA Penns… 42071 Lancas… 4207… 134 (La… 4207… 2 (Trac…
7 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
8 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
9 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
10 420710144022… 42 PA Penns… 42071 Lancas… 4207… 144.02 … 4207… 2 (Trac…
# ℹ 421,535 more rows
# ℹ 33 more variables: cbsa <chr>, cbsaname <chr>, zcta <chr>, zctaname <chr>,
# stplc <chr>, stplcname <chr>, ctycsub <chr>, ctycsubname <chr>,
# stcd116 <chr>, stcd116name <chr>, stsldl <chr>, stsldlname <chr>,
# stsldu <chr>, stslduname <chr>, stschool <chr>, stschoolname <chr>,
# stsecon <chr>, stseconname <chr>, trib <chr>, tribname <chr>, tsub <chr>,
# tsubname <chr>, stanrc <chr>, stanrcname <chr>, necta <chr>, …
Next I read in the main LODES data. This is a big file, so it takes a moment.
<- read_csv("post_data/pa_od_main_JT00_2015.csv.gz", col_types = cols(.default = "c")) %>%
df mutate(S000 = as.numeric(S000)) %>%
select(h_geocode, w_geocode, S000)
df
# A tibble: 4,961,712 × 3
h_geocode w_geocode S000
<chr> <chr> <dbl>
1 420010303001089 420010301011013 1
2 420410126001030 420010301011013 1
3 421330217111050 420010301011013 1
4 420010301011029 420010301011016 1
5 420010301011063 420010301011016 1
6 420010301011090 420010301011016 1
7 420010302001079 420010301011016 1
8 420010302001089 420010301011016 1
9 420010304003043 420010301011016 1
10 420010307002014 420010301011016 1
# ℹ 4,961,702 more rows
Next I summarize the number of commuters per home-work tract combination. The original file uses census block codes, which are too granular for this analysis. I link the blocks to census tracts and aggregate to that level.
<- df %>%
df_tracts_summarized group_by(h_geocode, w_geocode) %>%
summarize(commuters = sum(S000)) %>%
ungroup() %>%
arrange(desc(commuters))
<- df_tracts_summarized %>%
df_tracts_summarized left_join(geo_crosswalk %>% select(tabblk2010, trct), by = c("h_geocode" = "tabblk2010")) %>%
rename(h_tract = trct) %>%
left_join(geo_crosswalk %>% select(tabblk2010, trct), by = c("w_geocode" = "tabblk2010")) %>%
rename(w_tract = trct)
<- df_tracts_summarized %>%
df_tracts_summarized group_by(h_tract, w_tract) %>%
summarize(commuters = sum(commuters)) %>%
ungroup() %>%
arrange(desc(commuters))
<- df_tracts_summarized %>%
df_tracts_summarized semi_join(allegheny_tracts, by = c("h_tract" = "GEOID")) %>%
semi_join(allegheny_tracts, by = c("w_tract" = "GEOID"))
This code removes rows where the commuter lives and works in the same tract.
<- df_tracts_summarized %>%
df_tracts_summarized arrange(h_tract) %>%
na.omit() %>%
filter(!(h_tract == w_tract))
This code finds the center of each tract, which I use as the nodes in the network plots.
<- allegheny_tracts %>%
allegheny_tracts arrange(GEOID)
<- cbind(allegheny_tracts,
allegheny_tracts_centroids st_coordinates(st_centroid(allegheny_tracts))) %>%
st_set_geometry(NULL) %>%
as_tibble() %>%
rename(x = X,
y = Y) %>%
select(GEOID, x, y)
This shows that the centroids correctly appear in the center of each tract.
%>%
allegheny_tracts ggplot() +
geom_sf() +
geom_point(data = allegheny_tracts_centroids, aes(x, y), size = .2)
EDA
First, I do some exploratory analysis of the number of commuters in/out per tract. Since we cannot join the census geography dataframe to the LODES data directly, I reverse engineer the process by splitting the data into separate “home” and “work” dataframes, and then joining the census geography dataframe to those dataframes.
<- df_tracts_summarized %>%
df_home rename(tract = h_tract,
commuters_out = commuters) %>%
select(-w_tract) %>%
group_by(tract) %>%
summarize(commuters_out = sum(commuters_out))
<- df_tracts_summarized %>%
df_work rename(tract = w_tract,
commuters_in = commuters) %>%
select(-h_tract) %>%
group_by(tract) %>%
summarize(commuters_in = sum(commuters_in))
This checks that there are no duplicate tracts in either dataframe:
%>%
df_home count(tract, sort = TRUE) %>%
filter(n > 1)
# A tibble: 0 × 2
# ℹ 2 variables: tract <chr>, n <int>
%>%
df_home count(tract, sort = TRUE) %>%
filter(n > 1)
# A tibble: 0 × 2
# ℹ 2 variables: tract <chr>, n <int>
This joins the separate dataframes back to the census geography dataframe:
<- allegheny_tracts %>%
allegheny_tracts left_join(df_home, by = c("GEOID" = "tract")) %>%
left_join(df_work, by = c("GEOID" = "tract")) %>%
replace_na(list(commuters_in = 0))
The tract with many more commuters “in” is downtown Pittsburgh.
%>%
allegheny_tracts ggplot(aes(commuters_out, commuters_in, label = NAME)) +
geom_point() +
theme_bw()
%>%
allegheny_tracts ggplot() +
geom_sf(aes(fill = commuters_in), color = NA) +
scale_fill_viridis_c() +
theme_graph()
This shows which tracts have the most outflow of commuters:
%>%
allegheny_tracts ggplot() +
geom_sf(aes(fill = commuters_out), color = NA) +
scale_fill_viridis_c() +
theme_graph()
The outflow of commuters tracks with the population of the tract, with some exceptions.
%>%
allegheny_tracts ggplot(aes(commuters_out, total_pop)) +
geom_point() +
theme_bw()
Downtown Pittsburgh dominates in terms of inflow vs. outflow differential.
%>%
allegheny_tracts mutate(diff = commuters_in - commuters_out) %>%
ggplot() +
geom_sf(aes(fill = diff), color = NA) +
scale_fill_viridis_c("Commuters in minus commuters out", direction = 1) +
theme_graph()
Set up main graph
With the exploratory analysis done, I move on to the main task of making a geographic network plot where the nodes are the center of each census tract.
First, I set a minimum threshhold for the number of commuters flowing between two tracts. Then, I transform the df_tracts_summarized
data into a tbl_graph
object, which lets it be used for network analysis.
<- 100
minimum_commuters
<- df_tracts_summarized %>%
g as_tbl_graph(directed = TRUE)
g
# A tbl_graph: 402 nodes and 74239 edges
#
# A directed simple graph with 1 component
#
# Node Data: 402 × 1 (active)
name
<chr>
1 42003010300
2 42003020100
3 42003020300
4 42003030500
5 42003040200
6 42003040400
7 42003040500
8 42003040600
9 42003040900
10 42003050100
# ℹ 392 more rows
#
# Edge Data: 74,239 × 3
from to commuters
<int> <int> <dbl>
1 1 2 73
2 1 5 22
3 1 64 15
# ℹ 74,236 more rows
The result is a list of two dataframes; one for the nodes, one for the edges.
This filters out the edges that do not meet the minimum threshhold.
<- g %>%
g_main activate(edges) %>%
filter(commuters > minimum_commuters)
g_main
# A tbl_graph: 402 nodes and 360 edges
#
# A directed simple graph with 136 components
#
# Edge Data: 360 × 3 (active)
from to commuters
<int> <int> <dbl>
1 4 2 143
2 10 2 103
3 11 2 142
4 15 2 178
5 16 2 101
6 17 2 115
7 18 2 284
8 18 402 158
9 19 2 136
10 19 402 117
# ℹ 350 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
This shows the edges with the most commuters:
%>%
g_main activate(edges) %>%
arrange(desc(commuters))
# A tbl_graph: 402 nodes and 360 edges
#
# A directed simple graph with 136 components
#
# Edge Data: 360 × 3 (active)
from to commuters
<int> <int> <dbl>
1 122 2 552
2 75 2 484
3 125 2 440
4 65 2 407
5 79 2 393
6 127 2 389
7 253 2 385
8 249 2 366
9 62 2 361
10 228 2 361
# ℹ 350 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
This code sets the nodes for the graph at the center of each census tract and creates a manual layout that the network plot will use.
<- allegheny_tracts_centroids
node_pos
<- create_layout(graph = g_main,
manual_layout layout = node_pos)
# A tibble: 402 × 7
GEOID x y name .ggraph.orig_index .ggraph.index circular
<chr> <dbl> <dbl> <chr> <int> <int> <lgl>
1 42003010300 -80.0 40.4 42003010300 1 1 FALSE
2 42003020100 -80.0 40.4 42003020100 2 2 FALSE
3 42003020300 -80.0 40.5 42003020300 3 3 FALSE
4 42003030500 -80.0 40.4 42003030500 4 4 FALSE
5 42003040200 -80.0 40.4 42003040200 5 5 FALSE
6 42003040400 -79.9 40.4 42003040400 6 6 FALSE
7 42003040500 -80.0 40.4 42003040500 7 7 FALSE
8 42003040600 -80.0 40.4 42003040600 8 8 FALSE
9 42003040900 -80.0 40.4 42003040900 9 9 FALSE
10 42003050100 -80.0 40.4 42003050100 10 10 FALSE
# ℹ 392 more rows
Finally, this sets the parameters for the plot and makes the network graph:
<- str_c("Minimum: ", minimum_commuters, " commuters")
legend_title
ggraph(manual_layout) +
geom_sf(data = allegheny_tracts, color = "dark grey", fill = NA) +
geom_sf(data = rivers, fill = "white", color = NA, alpha = .5) +
geom_node_point(alpha = 0) +
geom_edge_fan(aes(edge_width = commuters,
edge_alpha = commuters),
arrow = arrow(length = unit(.5, 'lines')),
start_cap = circle(.1, 'lines'),
end_cap = circle(.2, 'lines'),
color = "white",
strength = .5) +
scale_edge_width_continuous(legend_title, range = c(.1, 1.5)) +
scale_edge_alpha_continuous(legend_title, range = c(.1, .8)) +
scale_fill_viridis_c() +
labs(x = NULL,
y = NULL,
title = "Where do people commute from/to for work?",
subtitle = "Excludes within-tract commuters",
caption = "Based on 2015 US Census LODES dataset | @conor_tompkins") +
theme_graph() +
theme(legend.background = element_rect(fill = "light grey"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
panel.background = element_rect(fill = "black"))
Clearly downtown Pittsburgh attracts the most commuters. This is not surprising, but it obscures other job centers that attract employees.
Set up graph without downtown tract
This code filters out commuters that work downtown. Again, set a threshhold and filter on it.
<- 100
minimum_commuters
<- g %>%
g_filtered activate(edges) %>%
filter(commuters > minimum_commuters)
g_filtered
# A tbl_graph: 402 nodes and 360 edges
#
# A directed simple graph with 136 components
#
# Edge Data: 360 × 3 (active)
from to commuters
<int> <int> <dbl>
1 4 2 143
2 10 2 103
3 11 2 142
4 15 2 178
5 16 2 101
6 17 2 115
7 18 2 284
8 18 402 158
9 19 2 136
10 19 402 117
# ℹ 350 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
This code goes into the network data and excludes edges that connect to the downtown tract.
<- "42003020100"
filter_tract
<- manual_layout %>%
selected_node filter(name != filter_tract) %>%
pull(.ggraph.orig_index)
<- g_filtered %>%
g_filtered activate(edges) %>%
filter(to %in% selected_node)
g_filtered
# A tbl_graph: 402 nodes and 103 edges
#
# A directed simple graph with 333 components
#
# Edge Data: 103 × 3 (active)
from to commuters
<int> <int> <dbl>
1 18 402 158
2 19 402 117
3 20 402 101
4 21 402 287
5 21 5 132
6 24 402 107
7 32 402 131
8 36 402 222
9 37 402 159
10 50 402 158
# ℹ 93 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
This creates another layout for this plot:
<- create_layout(graph = g_filtered,
manual_layout_filtered layout = node_pos)
<- str_c("Minimum: ", minimum_commuters, " commuters")
legend_title
ggraph(manual_layout_filtered) +
geom_sf(data = allegheny_tracts, size = .1, fill = NA, color = "dark grey") +
geom_sf(data = rivers, fill = "white", color = NA, alpha = .5) +
geom_edge_fan(aes(edge_width = commuters, edge_alpha = commuters
),arrow = arrow(length = unit(.5, 'lines')),
start_cap = circle(.1, 'lines'),
end_cap = circle(.2, 'lines'),
color = "white",
strength = .5) +
scale_edge_width_continuous(legend_title, range = c(.1, 1.5)) +
scale_edge_alpha_continuous(legend_title, range = c(.1, .8)) +
scale_fill_viridis_c() +
labs(x = NULL,
y = NULL,
title = "Where do people commute from/to for work?",
subtitle = "Excludes within-tract commuters and commuters to downtown tract",
caption = "Based on 2015 US Census LODES dataset | @conor_tompkins") +
theme_graph() +
theme(legend.background = element_rect(fill = "light grey"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
panel.background = element_rect(fill = "black"))
Set up facted graphs
This code creates a faceted plot that focuses on the top 5 tracts in terms of commuter outflow:
<- 5
minimum_commuters
<- df_home %>%
top_work_tracts arrange(desc(commuters_out)) %>%
top_n(5, commuters_out) %>%
select(tract)
<- g %>%
g_facet activate(edges) %>%
left_join(manual_layout %>% select(.ggraph.index, name), by = c("from" = ".ggraph.index")) %>%
semi_join(top_work_tracts, by = c("name" = "tract")) %>%
filter(commuters > minimum_commuters)
%>%
g_facet arrange(desc(commuters)) %>%
activate(edges)
# A tbl_graph: 402 nodes and 477 edges
#
# A directed simple graph with 237 components
#
# Edge Data: 477 × 4 (active)
from to commuters name
<int> <int> <dbl> <chr>
1 122 2 552 42003409000
2 125 2 440 42003412001
3 127 2 389 42003413100
4 192 202 311 42003453004
5 150 2 300 42003426300
6 192 2 294 42003453004
7 125 402 241 42003412001
8 192 190 209 42003453004
9 192 189 180 42003453004
10 122 124 170 42003409000
# ℹ 467 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
<- create_layout(graph = g_facet,
manual_layout_faceted layout = node_pos)
%>%
manual_layout_faceted as_tibble()
# A tibble: 402 × 7
GEOID x y name .ggraph.orig_index .ggraph.index circular
<chr> <dbl> <dbl> <chr> <int> <int> <lgl>
1 42003010300 -80.0 40.4 42003010300 1 1 FALSE
2 42003020100 -80.0 40.4 42003020100 2 2 FALSE
3 42003020300 -80.0 40.5 42003020300 3 3 FALSE
4 42003030500 -80.0 40.4 42003030500 4 4 FALSE
5 42003040200 -80.0 40.4 42003040200 5 5 FALSE
6 42003040400 -79.9 40.4 42003040400 6 6 FALSE
7 42003040500 -80.0 40.4 42003040500 7 7 FALSE
8 42003040600 -80.0 40.4 42003040600 8 8 FALSE
9 42003040900 -80.0 40.4 42003040900 9 9 FALSE
10 42003050100 -80.0 40.4 42003050100 10 10 FALSE
# ℹ 392 more rows
<- str_c("Minimum: ", minimum_commuters, " commuters")
legend_title
ggraph(manual_layout_faceted) +
geom_sf(data = allegheny_tracts, size = .1, fill = NA, color = "dark grey") +
geom_sf(data = rivers, fill = "white", color = NA, alpha = .5) +
geom_edge_fan(aes(edge_width = commuters, edge_alpha = commuters),
arrow = arrow(length = unit(.5, 'lines')),
start_cap = circle(.1, 'lines'),
end_cap = circle(.2, 'lines'),
color = "red",
strength = .5) +
facet_edges(~name) +
scale_edge_width_continuous(legend_title, range = c(.1, 1.5)) +
scale_edge_alpha_continuous(legend_title, range = c(.1, 1)) +
scale_fill_viridis_c() +
labs(x = NULL,
y = NULL,
title = "Where do people commute from/to for work?",
subtitle = "Not including within-tract commuters",
caption = "Based on 2015 US Census LODES dataset | @conor_tompkins") +
theme_graph() +
theme(legend.background = element_rect(fill = "light grey"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
panel.background = element_rect(fill = "black"))
Zoom in on one tract
There also may be tracts of interest outside of the main commuter inflow/outflow tracts. This code creates a framework for filtering on commuters from a selected tract and zooming in on that tract.
Set the threshhold and filter:
<- 15
minimum_commuters
<- g %>%
g_filtered activate(edges) %>%
filter(commuters > minimum_commuters)
g_filtered
# A tbl_graph: 402 nodes and 5799 edges
#
# A directed simple graph with 11 components
#
# Edge Data: 5,799 × 3 (active)
from to commuters
<int> <int> <dbl>
1 1 2 73
2 1 5 22
3 2 5 87
4 2 402 75
5 2 1 37
6 2 202 35
7 2 50 34
8 2 163 33
9 2 9 32
10 2 400 31
# ℹ 5,789 more rows
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
This code gets the commuter outflow data for one tract and creates allegheny_tracts_highlight
, which will be used to highlight the tract of interest.
<- "42003473500"
filter_tract
<- allegheny_tracts %>%
allegheny_tracts_highlight semi_join(df_tracts_summarized %>%
filter(h_tract == filter_tract), by = c("GEOID" = "w_tract")) %>%
filter(commuters_in > minimum_commuters)
allegheny_tracts_highlight
Simple feature collection with 208 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -80.36087 ymin: 40.20845 xmax: -79.70383 ymax: 40.67494
Geodetic CRS: NAD83
# A tibble: 208 × 6
GEOID NAME total_pop geometry commuters_out commuters_in
* <chr> <chr> <dbl> <MULTIPOLYGON [°]> <dbl> <dbl>
1 4200301… Cens… 6600 (((-79.98077 40.43836, -… 390 5470
2 4200302… Cens… 3629 (((-79.9908 40.44442, -7… 1029 58444
3 4200302… Cens… 616 (((-79.98062 40.45904, -… 415 7155
4 4200304… Cens… 2604 (((-79.96802 40.4429, -7… 428 12340
5 4200304… Cens… 2488 (((-79.94652 40.44508, -… 357 955
6 4200304… Cens… 3694 (((-79.95497 40.4371, -7… 290 1817
7 4200304… Cens… 2969 (((-79.94986 40.43443, -… 805 7822
8 4200305… Cens… 2057 (((-79.96206 40.44863, -… 800 235
9 4200307… Cens… 2197 (((-79.93137 40.45274, -… 769 1343
10 4200307… Cens… 2997 (((-79.92533 40.45499, -… 1367 1277
# ℹ 198 more rows
<- manual_layout %>%
selected_node filter(name == filter_tract) %>%
pull(.ggraph.orig_index)
<- g_filtered %>%
g_filtered activate(edges) %>%
filter(from == selected_node)
g_filtered
# A tbl_graph: 402 nodes and 17 edges
#
# A rooted forest with 385 trees
#
# Edge Data: 17 × 3 (active)
from to commuters
<int> <int> <dbl>
1 233 2 286
2 233 402 67
3 233 5 46
4 233 202 41
5 233 229 40
6 233 217 33
7 233 189 28
8 233 190 26
9 233 218 24
10 233 3 23
11 233 1 22
12 233 234 21
13 233 249 21
14 233 241 17
15 233 50 16
16 233 163 16
17 233 375 16
#
# Node Data: 402 × 1
name
<chr>
1 42003010300
2 42003020100
3 42003020300
# ℹ 399 more rows
Set the layout for the graph:
<- create_layout(graph = g_filtered,
manual_layout_filtered layout = node_pos)
# A tibble: 402 × 7
GEOID x y name .ggraph.orig_index .ggraph.index circular
<chr> <dbl> <dbl> <chr> <int> <int> <lgl>
1 42003010300 -80.0 40.4 42003010300 1 1 FALSE
2 42003020100 -80.0 40.4 42003020100 2 2 FALSE
3 42003020300 -80.0 40.5 42003020300 3 3 FALSE
4 42003030500 -80.0 40.4 42003030500 4 4 FALSE
5 42003040200 -80.0 40.4 42003040200 5 5 FALSE
6 42003040400 -79.9 40.4 42003040400 6 6 FALSE
7 42003040500 -80.0 40.4 42003040500 7 7 FALSE
8 42003040600 -80.0 40.4 42003040600 8 8 FALSE
9 42003040900 -80.0 40.4 42003040900 9 9 FALSE
10 42003050100 -80.0 40.4 42003050100 10 10 FALSE
# ℹ 392 more rows
This will be used to zoom in on the selected tract:
<- manual_layout_filtered %>%
zoom_x filter(.ggraph.orig_index == selected_node) %>%
pull(x)
<- manual_layout_filtered %>%
zoom_y filter(.ggraph.orig_index == selected_node) %>%
pull(y)
<- .25 zoom_magnitude
<- str_c("Minimum: ", minimum_commuters, " commuters")
legend_title
ggraph(manual_layout_filtered) +
geom_sf(data = allegheny_tracts, size = .1, fill = NA) +
geom_sf(data = rivers, fill = "white", color = NA, alpha = .5) +
geom_sf(data = allegheny_tracts %>% filter(GEOID == filter_tract),
fill = "grey") +
geom_edge_fan(aes(edge_width = commuters),
arrow = arrow(length = unit(.5, 'lines')),
start_cap = circle(.1, 'lines'),
end_cap = circle(.2, 'lines'),
color = "red",
strength = .5) +
coord_sf(xlim = c(zoom_x - zoom_magnitude, zoom_x + zoom_magnitude),
ylim = c(zoom_y - zoom_magnitude, zoom_y + zoom_magnitude)) +
scale_edge_width_continuous(legend_title, range = c(.1, 1.5)) +
scale_edge_alpha_continuous(legend_title, range = c(.1, .8)) +
scale_fill_viridis_c() +
labs(x = NULL,
y = NULL,
title = "Where do people commute from/to for work?",
subtitle = str_c("From tract", filter_tract, sep = " "),
caption = "Based on 2015 US Census LODES dataset | @conor_tompkins") +
theme_graph() +
theme(legend.background = element_rect(fill = "light grey"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
panel.background = element_rect(fill = "black"))
References
- http://www.robertmanduca.com/projects/jobs.html
- https://lehd.ces.census.gov/data/
- https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.3.pdf
- https://walkerke.github.io/tidycensus/articles/basic-usage.html