Driving Alone vs. Public Transportation in Pittsburgh

Mapping commuter modes with bivariate discretized bins

Conor Tompkins

Mar 29, 2021 10 min read

Intro

The clash between public transportation and single passenger vehicles is a heated topic of discussion nationally and in the Pittsburgh area. Public transit ridership has been heavily reduced by COVID-19 in many countries. These two commuting modes compete for the same riders, and investment dollars, and space. Car drivers are frustrated when a bus stops during rush hour to pick up passengers, while bus passengers are frustrated sitting in traffic caused by single passenger vehicles because transit doesn’t have right-of-way.

From my point of view, Pittsburgh’s geography lends itself to a focus on public transit, at the expense of the single passenger vehicle. Most of the jobs in the county are in a single census tract Downtown, which is reflected in the spoke (and no wheel) design of the transit system. Downtown is surrounded by rivers and mountains, which drastically narrows the geography suited to infrastructure. You pretty much have to use a tunnel or bridge to commute Downtown, unless you are coming from directly east. It would make sense to give public transit priority access to those tunnels and bridges, since their throughput is many times higher than roads designated for single passenger vehicles.

The historical priority towards single passenger vehicles is reflected in the Census statistics about commuting modes in the area. Most people in the area commute to work by themselves in cars. In the Wexford-area census tract, 78% (5,141) of commuters drive to work alone (and sit in traffic on the parkway together). Public transit use is limited to areas where the government invested in transit, but even there transit is not typically the majority mode.

In this post I will use {tidycensus} to pull data about how many people commute by driving alone or taking public transit Allegheny County. I chose these two modes because they are the two most popular modes in the county, and are the most different in terms of style. I then graph the data with {ggplot2} and {biscale}. I hack the {biscale} legend a bit to get it to show the % of commuters, which may be of interest to other R users.

Code and graphs

Load libraries and set up the environment:

library(tidyverse)
library(tidycensus)
library(sf)
library(tigris)
library(janitor)
library(biscale)
library(patchwork)
library(hrbrthemes)
library(kableExtra)

options(scipen = 999, digits = 4)

theme_set(theme_ipsum(base_size = 25))

These are the variables about commuter mode that the Census has for the 2019 American Community Survey (ACS):

acs1_vars <- load_variables(2019, 'acs1') %>% 
  mutate(across(c(label, concept), str_to_lower))

acs1_vars %>%
  filter(str_detect(name, "^B08301_")) %>% 
  kbl() %>% 
  scroll_box(height = "400px") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                position = "left")

name	label	concept
B08301_001	estimate!!total:	means of transportation to work
B08301_002	estimate!!total:!!car, truck, or van:	means of transportation to work
B08301_003	estimate!!total:!!car, truck, or van:!!drove alone	means of transportation to work
B08301_004	estimate!!total:!!car, truck, or van:!!carpooled:	means of transportation to work
B08301_005	estimate!!total:!!car, truck, or van:!!carpooled:!!in 2-person carpool	means of transportation to work
B08301_006	estimate!!total:!!car, truck, or van:!!carpooled:!!in 3-person carpool	means of transportation to work
B08301_007	estimate!!total:!!car, truck, or van:!!carpooled:!!in 4-person carpool	means of transportation to work
B08301_008	estimate!!total:!!car, truck, or van:!!carpooled:!!in 5- or 6-person carpool	means of transportation to work
B08301_009	estimate!!total:!!car, truck, or van:!!carpooled:!!in 7-or-more-person carpool	means of transportation to work
B08301_010	estimate!!total:!!public transportation (excluding taxicab):	means of transportation to work
B08301_011	estimate!!total:!!public transportation (excluding taxicab):!!bus	means of transportation to work
B08301_012	estimate!!total:!!public transportation (excluding taxicab):!!subway or elevated rail	means of transportation to work
B08301_013	estimate!!total:!!public transportation (excluding taxicab):!!long-distance train or commuter rail	means of transportation to work
B08301_014	estimate!!total:!!public transportation (excluding taxicab):!!light rail, streetcar or trolley (carro pãºblico in puerto rico)	means of transportation to work
B08301_015	estimate!!total:!!public transportation (excluding taxicab):!!ferryboat	means of transportation to work
B08301_016	estimate!!total:!!taxicab	means of transportation to work
B08301_017	estimate!!total:!!motorcycle	means of transportation to work
B08301_018	estimate!!total:!!bicycle	means of transportation to work
B08301_019	estimate!!total:!!walked	means of transportation to work
B08301_020	estimate!!total:!!other means	means of transportation to work
B08301_021	estimate!!total:!!worked from home	means of transportation to work

Driving alone in a single-passenger vehicle is by far the dominant commuting mode in the county.

all_transit_vars <- c("B08301_003", 
                      "B08301_004", 
                      "B08301_010", 
                      "B08301_016", 
                      "B08301_017", 
                      "B08301_018", 
                      "B08301_019", 
                      "B08301_020",
                      "B08301_021")

all_transit_modes <- get_acs(geography = "county", 
                             variables = acs1_vars %>%
                               filter(name %in% all_transit_vars) %>% 
                               pull(name, label),
                             summary_var = "B08301_001",
                             year = 2019, state = "PA", county = "Allegheny",
                             geometry = F)

all_transit_modes %>% 
  mutate(variable = str_remove(variable, "^estimate!!total:"),
         variable = str_remove(variable, "\\(excluding taxicab\\)"),
         variable = str_remove_all(variable, "\\!"),
         variable = str_remove(variable, ":$"),
         variable = str_replace(variable, ":", " : "),
         variable = str_trim(variable),
         variable = str_to_title(variable)) %>% 
  group_by(variable) %>% 
  summarize(estimate = sum(estimate),) %>% 
  mutate(variable = fct_reorder(variable, estimate),
         pct = estimate / sum(estimate)) %>% 
  ggplot(aes(estimate, variable)) +
  geom_col() +
  geom_text(aes(x = estimate + 26000, label = scales::percent(pct, 1)),
            size = 4) +
  labs(title = "Allegheny County Commuter Modes",
       subtitle = "2019 American Community Survey",
       x = "Commuters",
       y = NULL) +
  scale_x_comma(limits = c(0, 500000),
                labels = c("0", "1k", "2k", "3k", "4k", "5k")) +
  theme_ipsum(axis_text_size = 15)

I will use these two variables to directly compare the use of single-passenger vehicles and public transit in the county.

vars <- c("Drove alone" = "B08301_003",
          "Public transportation" = "B08301_010")

acs1_vars %>%
  filter(name %in% vars) %>% 
  pull(label)

## [1] "estimate!!total:!!car, truck, or van:!!drove alone"          
## [2] "estimate!!total:!!public transportation (excluding taxicab):"

This downloads the commuter mode data and subtracts the rivers from the census tract polygons so it looks nice on a map:

st_erase <- function(x, y) {
  st_difference(x, st_union(y))
}

rivers <- area_water("PA", "Allegheny", class = "sf", progress_bar = F) %>% 
  arrange(desc(AWATER)) %>% 
  slice(1:3) %>% 
  st_simplify()

tracts <- get_acs(geography = "tract", variables = "B08301_001",
        year = 2019, state = "PA", county = "Allegheny",
        geometry = T) %>% 
  st_erase(rivers)

tract_transit <- get_acs(geography = "tract", variables = vars, summary_var = "B08301_001",
                         year = 2019, state = "PA", county = "Allegheny",
                         geometry = T) %>% 
  select(GEOID, variable, estimate, summary_est) %>% 
  st_erase(rivers)

tract_transit %>% 
  glimpse()

## Rows: 804
## Columns: 5
## $ GEOID       <chr> "42003408002", "42003408002", "42003210700", "42003210700"…
## $ variable    <chr> "Drove alone", "Public transportation", "Drove alone", "Pu…
## $ estimate    <dbl> 2815, 8, 589, 189, 566, 146, 1224, 260, 466, 100, 1063, 21…
## $ summary_est <dbl> 3165, 3165, 1231, 1231, 1110, 1110, 1992, 1992, 702, 702, …
## $ geometry    <MULTIPOLYGON [°]> MULTIPOLYGON (((-79.99 40.6..., MULTIPOLYGON …

As discussed earlier, public transit is not the majority commuting mode in most areas:

tract_transit %>% 
  st_drop_geometry() %>% 
  group_by(GEOID) %>%
  mutate(pct_tract_commuters = estimate / sum(estimate),
         combined_commuters = sum(estimate)) %>%
  ungroup() %>%
  mutate(GEOID = fct_reorder(GEOID, summary_est)) %>%
  arrange(desc(GEOID), desc(summary_est)) %>% 
  mutate(is_downtown_label = case_when(GEOID == "42003020100" & variable == "Drove alone" ~ "Downtown*",
                                       TRUE ~ NA_character_)) %>% 
  slice(1:60) %>% 
  ggplot(aes(estimate, GEOID, fill = variable)) +
  geom_col(color = "black") +
  geom_text(aes(x = estimate + 3000, label = is_downtown_label)) +
  labs(title = "Top 30 census tracts",
       subtitle = "Total commuter population from all modes",
       x = "Commuters",
       y = "Census tracts",
       fill = "Commute mode") +
  scale_x_comma() +
  theme_ipsum(base_size = 15) +
  theme(axis.text.y = element_blank(),
        panel.grid.major = element_blank())

*Most commuters that live in Downtown walk to work.

This shows that in absolute numbers, driving alone swamps public transit across the county.

scatter_graph <- tract_transit %>% 
  st_drop_geometry() %>% 
  select(GEOID, variable, estimate) %>% 
  pivot_wider(names_from = variable, values_from = estimate) %>% 
  clean_names() %>% 
  ggplot(aes(drove_alone, public_transportation)) +
  geom_point(alpha = .7, size = 1) +
  labs(title = "Commuter modes in Allegheny County",
       x = "Driving Alone",
       y = "Using Public Transportation") +
  scale_x_comma() +
  scale_y_comma() +
  tune::coord_obs_pred() +
  theme_ipsum(base_size = 15)

scatter_graph

I made the X and Y axes symmetric to emphasize the difference in scale between the two variables.

This uses the bi_class function to divide the data into discrete bins based on how many people drive alone vs. use public transit. This turns two continuous variables into one categorical variable. I had to play around with the style argument to find an option that worked for the unbalanced data.

transit_bivariate_geo <- tract_transit %>% 
  st_drop_geometry() %>% 
  drop_na(estimate) %>% 
  select(GEOID, variable, estimate) %>% 
  pivot_wider(names_from = variable, values_from = estimate) %>% 
  clean_names() %>% 
  replace_na(list(drove_alone = 0, public_transportation = 0)) %>% 
  bi_class(x = drove_alone, 
           y = public_transportation, 
           style = "fisher", 
           dim = 3) %>% 
  left_join(tracts, 
            by = c("geoid" = "GEOID")) %>% 
  st_sf()

glimpse(transit_bivariate_geo)

## Rows: 402
## Columns: 9
## $ geoid                 <chr> "42003408002", "42003210700", "42003220600", "42…
## $ drove_alone           <dbl> 2815, 589, 566, 1224, 466, 1063, 887, 826, 551, …
## $ public_transportation <dbl> 8, 189, 146, 260, 100, 215, 262, 342, 670, 61, 3…
## $ bi_class              <chr> "3-1", "1-2", "1-1", "2-2", "1-1", "2-2", "1-2",…
## $ NAME                  <chr> "Census Tract 4080.02, Allegheny County, Pennsyl…
## $ variable              <chr> "B08301_001", "B08301_001", "B08301_001", "B0830…
## $ estimate              <dbl> 3165, 1231, 1110, 1992, 702, 1487, 1317, 1712, 1…
## $ moe                   <dbl> 231, 199, 116, 189, 80, 233, 163, 185, 257, 104,…
## $ geometry              <MULTIPOLYGON [°]> MULTIPOLYGON (((-79.99 40.6..., MUL…

table(transit_bivariate_geo$bi_class) %>% 
  enframe(name = "bi_class", value = "count_tracts") %>% 
  kbl()

bi_class	count_tracts
1-1	124
1-2	72
1-3	9
2-1	84
2-2	66
2-3	9
3-1	34
3-2	4

This graph overlays the discrete biscale bins on the previous data to show how the function discretized the data.

transit_bivariate_geo %>% 
  ggplot(aes(drove_alone, public_transportation, color = bi_class)) +
  geom_point(alpha = .75, size = 1) +
  scale_x_comma() +
  labs(x = "Drove Alone",
       y = "Used Public Transit") +
  guides(color = FALSE) +
  theme_ipsum(base_size = 15)

Note that the X and Y axes are independent in this graph.

This creates the biscale legend I will put next to the map.

bi_var_legend <- bi_legend(pal = "DkBlue",
                           dim = 3,
                           xlab = " More drove alone",
                           ylab = "More used public transit",
                           size = 26) +
  theme(plot.background = element_rect(fill = alpha("white", 0)),
        panel.background = element_rect(fill = alpha("white", 0)))

bi_var_legend

I would like to show the % of commuters that each bin represents, so I extract the color palette from the ggplot2 object and make my own legend with geom_tile.

built_legend <- ggplot_build(bi_var_legend)

legend_palette <- built_legend$data[[1]] %>%
  mutate(bi_class = str_c(x, y, sep = "-")) %>% 
  select(fill, bi_class)

legend_palette %>% 
  kbl()

fill	bi_class
#3B4994	3-3
#8C62AA	2-3
#BE64AC	1-3
#5698B9	3-2
#A5ADD3	2-2
#DFB0D6	1-2
#5AC8C8	3-1
#ACE4E4	2-1
#E8E8E8	1-1

transit_bivariate <- transit_bivariate_geo %>% 
  st_drop_geometry() %>% 
  select(geoid, bi_class, drove_alone, public_transportation) %>% 
  separate(bi_class, 
           into = c("drove_alone_bi", "public_transportation_bi"), 
           sep = "-",
           remove = FALSE) %>% 
  complete(drove_alone_bi, public_transportation_bi, fill = list(drove_alone = 0, public_transportation = 0)) %>% 
  mutate(bi_class = str_c(drove_alone_bi, public_transportation_bi, sep = "-"),
         total = drove_alone + public_transportation,
         pct_commuters = total / sum(total)) %>%
  group_by(bi_class, drove_alone_bi, public_transportation_bi) %>% 
  summarize(count_tract = n(),
            pct_commuters = sum(pct_commuters)) %>% 
  ungroup()

glimpse(transit_bivariate)

## Rows: 9
## Columns: 5
## $ bi_class                 <chr> "1-1", "1-2", "1-3", "2-1", "2-2", "2-3", "3-…
## $ drove_alone_bi           <chr> "1", "1", "1", "2", "2", "2", "3", "3", "3"
## $ public_transportation_bi <chr> "1", "2", "3", "1", "2", "3", "1", "2", "3"
## $ count_tract              <int> 124, 72, 9, 84, 66, 9, 34, 4, 1
## $ pct_commuters            <dbl> 0.13661, 0.11754, 0.02073, 0.25273, 0.22252, …

legend_palette <- transit_bivariate %>% 
  distinct(bi_class) %>% 
  left_join(legend_palette, by = "bi_class")

legend_palette %>% 
  kbl()

bi_class	fill
1-1	#E8E8E8
1-2	#DFB0D6
1-3	#BE64AC
2-1	#ACE4E4
2-2	#A5ADD3
2-3	#8C62AA
3-1	#5AC8C8
3-2	#5698B9
3-3	#3B4994

Note that scale_fill_manual uses the palette I extracted from the ggplot2 object.

bi_var_legend_new <- transit_bivariate %>% 
  mutate(pct_commuters = scales::percent(pct_commuters, accuracy = 1)) %>% 
  ggplot(aes(x = drove_alone_bi, y = public_transportation_bi, fill = bi_class)) +
  geom_tile() +
  geom_label(fill = "white", alpha = .75, size = 12, label = "    ") +
  geom_text(aes(label = pct_commuters), alpha = 1, size = 7) +
  coord_fixed(ratio = 1) +
  labs(x = substitute(paste("More drove alone", "" %->% "")),
       y = substitute(paste("More used public transit", "" %->% ""))) +
  guides(fill = FALSE) +
  scale_fill_manual(values = pull(legend_palette, fill)) +
  theme_ipsum(plot_title_size = 30,
              axis_title_size = 30) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank())

bi_var_legend_new +
  labs(title = 'Percent of "drive alone" + "public transit" commuters')

This creates the map of commuter mode by census tract, filled by the discretized biscale bin.

transit_bi_var_plot <- transit_bivariate_geo %>% 
  ggplot(aes(fill = bi_class)) +
  geom_sf(show.legend = FALSE, size = NA) +
  geom_sf(data = rivers, fill = "black", color = "black") +
  bi_scale_fill(pal = "DkBlue", dim = 3) +
  bi_theme() +
  theme_ipsum(base_size = 15) +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

transit_bi_var_plot

Now that I have my legend and map, I use patchwork to stitch them together.

design = c(area(t = 2, l = 4, b = 20, r = 20),
           area(t = 1, l = 1, b = 6, r = 6))

plot(design)

combined_bi_var_plot <- transit_bi_var_plot + bi_var_legend_new +
  plot_layout(design = design) +
  plot_annotation(title = "Allegheny County Commuter Patterns",
                  subtitle = "Legend: % of commuters that drove alone or use public transit",
                  caption = "2019 American Community Survey",
                  theme = theme(panel.background = element_rect(fill = "black"),
                                plot.title = element_text(size = 30),
                                plot.subtitle = element_text(size = 25),
                                plot.caption = element_text(size = 25)))

Driving Alone vs. Public Transportation in Pittsburgh

Intro

Code and graphs

Links:

Related