Analyzing US labor force participation rates with {feasts}

Author

Conor Tompkins

Published

February 10, 2025

Intro

While looking through FRED data for work, I ran across the Labor Force Participation category. There are some interesting trend and seasonality patterns that I want to investigate with time series methods. There is a useful FRED blog post that provides a good overview of the topic of labor force participation rate (LFPR). Their definition is:

“…those who want to work (i.e., have a job or want one) relative to those who could work (the entire population over age 16 that isn’t incarcerated or on active military duty).”

For this analysis I will focus on men and women over 20 years old, which is more specific than the above definition. The data is available monthly since 1948.

Code
library(tidyverse)
library(fpp3)
library(broom)
library(fredr)
library(hrbrthemes)
library(GGally)
library(scales)
library(plotly)
library(here)
library(ggthemes)
library(ggrepel)

theme_set(theme_ipsum())

options(scipen = 999, digits = 4)

Load in data from FRED

This code takes the FRED series ID and uses map_dfr to read in the data from the FRED API with fredr and combine the results into one dataframe. I specifically chose the non-seasonally adjusted datasets because I want to look at the seasonality.

Code
fred_df_raw <- c("men >= age 20" = "LNU01300025",
             "women >= age 20" = "LNU01300026") |> 
  map_dfr(fredr, .id = "series")
Code
glimpse(fred_df_raw)
Rows: 1,850
Columns: 6
$ series         <chr> "men >= age 20", "men >= age 20", "men >= age 20", "men…
$ date           <date> 1948-01-01, 1948-02-01, 1948-03-01, 1948-04-01, 1948-0…
$ series_id      <chr> "LNU01300025", "LNU01300025", "LNU01300025", "LNU013000…
$ value          <dbl> 87.9, 88.1, 87.8, 88.1, 88.1, 88.9, 89.3, 89.5, 89.1, 8…
$ realtime_start <date> 2025-02-07, 2025-02-07, 2025-02-07, 2025-02-07, 2025-0…
$ realtime_end   <date> 2025-02-07, 2025-02-07, 2025-02-07, 2025-02-07, 2025-0…
Code
#set manual color palette for men and women
series_vec <- fred_df_raw |> 
  distinct(series) |> 
  pull()

#RColorBrewer::brewer.pal(3, "Dark2")[1:2]

series_pal <- colorblind_pal()(2)

names(series_pal) <- series_vec

#show_col(series_pal)

Time series exploratory data analysis (EDA)

Here I use the tsibble package to transform the dataframe into a time series table. Then I plot the data with autoplot and use {plotly} to make it interactive.

Code
fred_df <- fred_df_raw |> 
  select(series, date, value) |> 
  rename(participation_rate = value) |> 
  mutate(date = yearmonth(date)) |> 
  as_tsibble(key = series, index = date)

x <- autoplot(fred_df) +
  scale_y_percent(scale = 1) +
  scale_color_manual(values = series_pal) +
  labs(title = "Labor Force Participation Rate",
       x = "Date",
       y = "LFPR",
       color = "Series")

ggplotly(x)
1960 Jan1980 Jan2000 Jan2020 Jan30%40%50%60%70%80%90%
seriesmen >= age 20women >= age 20 Labor Force Participation Rate DateLFPR

There are a couple general trends in this data.

  • The LFPR among men declined in the 50’s and 60’s, stabilized briefly in the 80s, and continued to decline afterwards. There were steep declines due to the 2008 financial crisis and COVID-19 in 2020.

  • Among women, the LFPR rose steeply from the 50’s to the 2000’s. This reflects the increase in the types of jobs that were available to women over time.

  • The LFPR in the two groups reacted differently to the financial crisis and COVID-19. In general, men’s LFPR didn’t decrease as much as women’s, but the LFPR among women bounced back stronger in each case.

  • Seasonality

    • The strong seasonailty among men peaks early in this series and becomes weaker over time.

    • The pattern of seasonality among women appears to be strong but highly variable early on, and then becomes weaker over time.

    • My hypothesis is that the decreasing seasonality is due to changes in what types of jobs are available and people’s preference for those jobs. For example, agricultural labor and construction are highly seasonal, but may have become a smaller % of the available jobs over time.

For context, the overall LFPR increased starting in the late 1960s, but has declined since 2000:

The FRED post linked above states that the overall decline is due to demographic factors:

…explained by the Baby Boomers retiring and slower U.S. population growth: Subsequent generations have been smaller than the Baby Boomer generation, so their entry into the labor force hasn’t made up for the retiring Boomers.

The trend among men and women are subsets of this overall trend, although the steep decline in these subsets didn’t begin until the 2008 financial crisis.

Trend and seasonality

Here I use STL (Multiple seasonal decomposition by Loess) to decompose each time series into trend, seasonality, and remainder. This models the strength of the trend and seasonality over time. The remainder is the noise in the data that cannot be explained by the trend or seasonality. I use components to extract the components of the decomposition to analyze them further. I also transform participation_rate to the log scale, which makes the STL decomposition multiplicative instead of additive. This makes it easier to compare the magnitude of seasonality (season_year) and remainder between series.

Code
stl_models <- fred_df |> 
  model(stl = STL(log(participation_rate) ~ trend() + season()))

stl_components <- stl_models |>
  components()

stl_components
# A dable: 1,850 x 8 [1M]
# Key:     series, .model [2]
# :        log(participation_rate) = trend + season_year + remainder
   series     .model     date log(participation_ra…¹ trend season_year remainder
   <chr>      <chr>     <mth>                  <dbl> <dbl>       <dbl>     <dbl>
 1 men >= ag… stl    1948 Jan                   4.48  4.48   -0.00709   0.00132 
 2 men >= ag… stl    1948 Feb                   4.48  4.48   -0.00601   0.00221 
 3 men >= ag… stl    1948 Mar                   4.48  4.48   -0.00343  -0.00409 
 4 men >= ag… stl    1948 Apr                   4.48  4.48   -0.00213  -0.00228 
 5 men >= ag… stl    1948 May                   4.48  4.48   -0.000316 -0.00431 
 6 men >= ag… stl    1948 Jun                   4.49  4.48    0.00431  -0.000127
 7 men >= ag… stl    1948 Jul                   4.49  4.48    0.00631   0.00215 
 8 men >= ag… stl    1948 Aug                   4.49  4.48    0.00693   0.00357 
 9 men >= ag… stl    1948 Sep                   4.49  4.48    0.00362   0.00221 
10 men >= ag… stl    1948 Oct                   4.49  4.48    0.000450  0.00294 
# ℹ 1,840 more rows
# ℹ abbreviated name: ¹​`log(participation_rate)`
# ℹ 1 more variable: season_adjust <dbl>

In the following graphs, the season_year variable represents the seasonality. The grey bars on the left indicate the reverse magnitude of the effect of each STL component.

The decomposition for men shows that the magnitude of the seasonality shrinks over time, and the pattern in which months are peaks and troughs changes as well. Among men, the remainder seems to have a stronger effect than seasonality (seasonal_year), though that is probably skewed by the singularly large outlier value of the COVID-19 pandemic.

Code
stl_components |> 
  filter(str_starts(series, "men")) |> 
  autoplot()

The seasonality pattern for women changes drastically multiple times over the course of the time series. The early years of this series have many large remainder values (in absolute terms), which could indicate the change in job types that were available to women at the time.

Code
stl_components |> 
  filter(str_starts(series, "women")) |> 
  autoplot()

This shows the de-seasonalized trend among men and women:

Code
stl_components |> 
  ggplot(aes(date, trend, color = series)) +
  geom_line(lwd = 1.5) +
  scale_color_manual(values = series_pal) +
  labs(title = "Labor Force Participation Rate",
       x = "Date",
       y = "LFPR trend") +
  theme(legend.position = "bottom")

The LFPR trend for women actually increased at the height of the 2008 financial crisis. This could indicate that some women were pulled into the labor force in response to increased economic pressure on them and/or their household.

Code
stl_components |> 
  filter(between(year(date), 2005, 2010)) |> 
  ggplot(aes(date, trend, color = series)) +
  geom_line(lwd = 1.5) +
  scale_color_manual(values = series_pal) +
  facet_wrap(vars(series), scales = "free_y", ncol = 1) +
  labs(title = "LFPR during 2008 financial crisis",
       x = "Date",
       y = "LFPR trend",
       color = "Series") +
  theme(legend.position = "bottom")

Features

This code uses the {feasts} package to calculate summary statistics about each time series. I will focus on the strength of the trend and seasonality, and the amount of noise in each series.

Code
fred_features <- fred_df |>
  features(participation_rate, feature_set(pkgs = "feasts"))

fred_features |> 
  select(series, trend_strength, seasonal_strength_year, spectral_entropy)
# A tibble: 2 × 4
  series          trend_strength seasonal_strength_year spectral_entropy
  <chr>                    <dbl>                  <dbl>            <dbl>
1 men >= age 20            0.999                  0.841           0.0457
2 women >= age 20          1.00                   0.787           0.0423

Both series have very strong trends, but the series for men has stronger seasonality (seasonal_strength_year). This confirms my impressions from the graphs above. Both series have very similar low spectral_entropy values which shows that there is very little noise in the data. I would expect both series to be easily forecastable in the near term.

Peaks and troughs

Another way to summarize a time series is to calculate the average peak and trough of the seasonality.

Code
month_lookup <- tibble(month_int = c(1:12),
                       month = month.abb)

peak_trough <- fred_features |> 
  select(series, seasonal_peak_year, seasonal_trough_year) |> 
  left_join(month_lookup, by = c("seasonal_peak_year" = "month_int")) |> 
  select(-seasonal_peak_year) |> 
  rename(seasonal_peak_year = month) |> 
  left_join(month_lookup, by = c("seasonal_trough_year" = "month_int")) |> 
  select(-seasonal_trough_year) |> 
  rename(seasonal_trough_year = month)

peak_trough
# A tibble: 2 × 3
  series          seasonal_peak_year seasonal_trough_year
  <chr>           <chr>              <chr>               
1 men >= age 20   Jul                Jan                 
2 women >= age 20 Oct                Jul                 

In men, the peak is July and the trough is in January. In women, the peak is in October and the trough is in July. These metrics are calculated over the entire series, so it does not reflect changes over time.