Graphing Seasonality in Ebird Bird Sightings
Over the winter I became interested in birding. Sitting in your back yard doing nothing but watching birds fly around is quite relaxing. Naturally I am looking for ways to optimize and quantify this relaxing activity. eBird lets you track your bird sightings and research which birds are common or more rare in your area. Luckily, the folks at ROpenSci have the {rebird}
package, which provides an easy interface to the eBird API.
In this post I will graph the seasonality of observation frequency of the top 10 birds in Pennsylvania. Frequency in this context is the % of eBird checklists that the bird appeared in during a given period.
Load up packages:
library(tidyverse)
library(lubridate)
library(vroom)
library(janitor)
library(rebird)
library(hrbrthemes)
library(ggrepel)
library(gganimate)
theme_set(theme_ipsum())
The ebirdfreq
takes a location and time period and returns the frequency and sample size for the birds returned in the query.
df_freq_raw <- ebirdfreq(loctype = 'states', loc = 'US-PA', startyear = 2019,
endyear = 2019, startmonth = 1, endmonth = 12)
df_freq_raw
## # A tibble: 22,176 x 4
## comName monthQt frequency sampleSize
## <chr> <chr> <dbl> <dbl>
## 1 Black-bellied Whistling-Duck January-1 0 4448
## 2 Snow Goose January-1 0.0220 4448
## 3 Ross's Goose January-1 0.000674 4448
## 4 Snow x Ross's Goose (hybrid) January-1 0 4448
## 5 Snow/Ross's Goose January-1 0 4448
## 6 Graylag Goose (Domestic type) January-1 0.000225 4448
## 7 Swan Goose (Domestic type) January-1 0 4448
## 8 Graylag x Swan Goose (Domestic type) (hybrid) January-1 0 4448
## 9 Greater White-fronted Goose January-1 0.00360 4448
## 10 Domestic goose sp. (Domestic type) January-1 0.00292 4448
## # … with 22,166 more rows
This does some light data munging to get the data in shape.
df_freq_clean <- df_freq_raw %>%
clean_names() %>%
separate(month_qt, into = c("month", "week")) %>%
mutate(week = as.numeric(week),
month = ymd(str_c("2019", month, "01", sep = "-")),
month = month(month, label = TRUE, abbr = TRUE),
state = "PA") %>%
rename(common_name = com_name) %>%
arrange(common_name, month, week)
df_freq_clean
## # A tibble: 22,176 x 6
## common_name month week frequency sample_size state
## <chr> <ord> <dbl> <dbl> <dbl> <chr>
## 1 Acadian Flycatcher Jan 1 0 4448 PA
## 2 Acadian Flycatcher Jan 2 0 3382 PA
## 3 Acadian Flycatcher Jan 3 0 3306 PA
## 4 Acadian Flycatcher Jan 4 0 4830 PA
## 5 Acadian Flycatcher Feb 1 0 3890 PA
## 6 Acadian Flycatcher Feb 2 0 3605 PA
## 7 Acadian Flycatcher Feb 3 0 7848 PA
## 8 Acadian Flycatcher Feb 4 0 3636 PA
## 9 Acadian Flycatcher Mar 1 0 3737 PA
## 10 Acadian Flycatcher Mar 2 0 4406 PA
## # … with 22,166 more rows
This takes the month-week time series and summarizes to the month level:
df_month <- df_freq_clean %>%
group_by(common_name, month) %>%
summarize(sample_size_mean = mean(sample_size),
frequency_mean = mean(frequency) %>% round(2)) %>%
ungroup()
df_month
## # A tibble: 5,544 x 4
## common_name month sample_size_mean frequency_mean
## <chr> <ord> <dbl> <dbl>
## 1 Acadian Flycatcher Jan 3992. 0
## 2 Acadian Flycatcher Feb 4745. 0
## 3 Acadian Flycatcher Mar 4748 0
## 4 Acadian Flycatcher Apr 5392. 0
## 5 Acadian Flycatcher May 5868. 0.04
## 6 Acadian Flycatcher Jun 3367. 0.06
## 7 Acadian Flycatcher Jul 2639 0.05
## 8 Acadian Flycatcher Aug 2876. 0.02
## 9 Acadian Flycatcher Sep 3198. 0
## 10 Acadian Flycatcher Oct 2894. 0
## # … with 5,534 more rows
Here I find the top 10 birds in terms of average monthly observation frequency:
df_top_birds <- df_freq_clean %>%
group_by(common_name) %>%
summarize(sample_size_mean = mean(sample_size),
frequency_mean = mean(frequency) %>% round(2)) %>%
ungroup() %>%
arrange(desc(frequency_mean)) %>%
select(common_name) %>%
slice(1:10)
df_top_birds
## # A tibble: 10 x 1
## common_name
## <chr>
## 1 Northern Cardinal
## 2 Blue Jay
## 3 Mourning Dove
## 4 American Robin
## 5 Song Sparrow
## 6 American Crow
## 7 Red-bellied Woodpecker
## 8 American Goldfinch
## 9 Carolina Wren
## 10 Downy Woodpecker
This basic line graph shows some of the pattern of seasonality, but fails to show the cyclical nature of the data.
df_month %>%
semi_join(df_top_birds) %>%
ggplot(aes(month, frequency_mean, group = common_name)) +
geom_line() +
scale_y_percent() +
labs(title = "Bird observation frequency",
subtitle = "Top 10 birds in PA, 2019",
x = NULL,
y = "Mean frequency",
caption = "Data from ebird.org. @conorotompkins")
I use coord_polar
to change the coordinate system to match the cyclical flow of the months:
df_month %>%
semi_join(df_top_birds) %>%
ggplot(aes(month, frequency_mean, group = common_name)) +
geom_polygon(color = "black", fill = NA, size = .5) +
coord_polar() +
scale_y_percent() +
labs(title = "Bird observation frequency",
subtitle = "Top 10 birds in PA, 2019",
x = NULL,
y = "Mean frequency",
caption = "Data from ebird.org. @conorotompkins")
gganimate
lets me focus on one species at a time while showing all the data.
plot_animated <- df_month %>%
semi_join(df_top_birds) %>%
mutate(common_name = fct_inorder(common_name)) %>%
ggplot(aes(month, frequency_mean)) +
geom_polygon(data = df_month %>% rename(name = common_name),
aes(group = name),
color = "grey", fill = NA, size = .5) +
geom_polygon(aes(group = common_name),
color = "blue", fill = NA, size = 1.2) +
coord_polar() +
#facet_wrap(~common_name) +
scale_y_percent() +
labs(subtitle = "Most frequently observed birds in PA (2019)",
x = NULL,
y = "Frequency of observation",
caption = "Data from ebird.org. @conorotompkins") +
theme(plot.margin = margin(2, 2, 2, 2),
plot.title = element_text(color = "blue"))
plot_animated +
transition_manual(common_name) +
ggtitle("{current_frame}")