library(tidyverse)
library(janitor)
library(broom)
library(sf)
library(scales)
library(gganimate)
library(lwgeom)
options(scipen = 999, digits = 4)
theme_set(theme_bw())
<- "@conor_tompkins - data from @WPRDC" my_caption
In this post I will show how to create animated graphs that illustrate the increase in buildings in Allegheny County.
One caveat about the data: it only includes parcels that were sold at some point. If the parcel was not sold, it is not included in this data. For example, a structure that was torn down and replaced but was not sold is not included. It is also reasonable to assume that the data quality decreases the older the records are. There may be a large amount of missing data.
The shapefiles for the parcels come from Pennsylvania Spatial Data Access.
The data about the construction dates comes from the WPRDC’s Parcels n’at dashboard. To get the relevant data, draw a box around entire county, select the “Year Built” field in the Property Assessments section, and then download the data. It will take a while to download data for the entire county.
Set up the environment:
This reads in data about the land parcel (lot lines):
<- read_csv("post_data/parcel_data.csv", progress = FALSE) %>%
df clean_names() |>
select(parid, yearblt)
This reads in the parcel geometry
<- "post_data/AlleghenyCounty_Parcels202409/AlleghenyCounty_Parcels202409.shp"
file file
[1] "post_data/AlleghenyCounty_Parcels202409/AlleghenyCounty_Parcels202409.shp"
<- st_read(file) shapefile
Reading layer `AlleghenyCounty_Parcels202409' from data source
`/Users/conorotompkins/Documents/github_repos/ctompkins_quarto_blog/posts/animating-growth-of-allegheny-county/post_data/AlleghenyCounty_Parcels202409/AlleghenyCounty_Parcels202409.shp'
using driver `ESRI Shapefile'
Simple feature collection with 585186 features and 10 fields (with 9 geometries empty)
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 1243000 ymin: 321300 xmax: 1430000 ymax: 497900
Projected CRS: NAD83 / Pennsylvania South (ftUS)
Next we have to clean up the parcel geometry:
<- shapefile %>%
valid_check slice(1:nrow(shapefile)) %>%
pull(geometry) %>%
map(st_is_valid) %>%
unlist()
$validity_check <- valid_check
shapefile
<- shapefile %>%
shapefile filter(validity_check == TRUE)
<- shapefile %>%
shapefile st_make_valid() %>%
clean_names() %>%
mutate(pin = as.character(pin))
Then, join the parcel geometry and parcel data:
<- shapefile %>%
parcel_data left_join(df, by = join_by(pin == parid))
This turns the parcel geometry into (x, y)
coordinates:
<- parcel_data %>%
centroids st_centroid() %>%
st_coordinates() %>%
as_tibble() %>%
clean_names()
We can plot the coordinates to confirm that the locations make sense:
%>%
centroids distinct(x, y) %>%
ggplot(aes(x, y)) +
geom_point(size = .1, alpha = .1) +
theme_void() +
coord_equal()
This plot shows that there is one row where yearblt_asmt
is zero. That doesn’t make sense, so we will exclude it later.
%>%
df ggplot(aes(yearblt)) +
geom_density() +
geom_rug() +
labs(title = "Structures in Allegheny County",
x = "Year built",
y = "Density",
subtitle = my_caption)
This combines the parcel_data
and centroid
data:
<- bind_cols(parcel_data, centroids) %>%
parcel_geometry_cleaned select(pin, x, y, yearblt) %>%
mutate(yearblt = as.integer(yearblt)) %>%
filter(!is.na(yearblt),
> 1000) %>%
yearblt st_set_geometry(NULL)
This plots the culmulative sum of structures built:
<- parcel_geometry_cleaned %>%
parcel_cumulative select(pin, yearblt) %>%
arrange(yearblt) %>%
count(yearblt) %>%
mutate(cumulative_n = cumsum(n)) %>%
ggplot(aes(yearblt, cumulative_n)) +
geom_line() +
geom_point() +
scale_y_continuous(label = comma) +
labs(title = "Cumulative sum of structures built in Allegheny County",
x = "Year Built",
y = "Cumulative sum",
caption = my_caption) +
transition_reveal(yearblt)
parcel_cumulative
This creates a graph of the structures built in Allegheny County, colored by the construction year.
%>%
parcel_geometry_cleaned ggplot(aes(x, y, color = yearblt, group = pin)) +
geom_point(alpha = .3, size = .1) +
scale_color_viridis_c("Year structure was built") +
theme_void() +
theme(axis.text = element_blank(),
axis.title = element_blank()) +
labs(title = "Allegheny County land parcels",
subtitle = "Year built: {frame_along}",
caption = "@conor_tompkins, data from @WPRDC") +
transition_reveal(yearblt)