---
title: "STA 9750 Mini-Project #03: Who Goes There?"
subtitle: "US Internal Migration and Implications for Congressional Reapportionment in Texas"
author: "Raúl J. Solá Navarro"
date: today
format:
html:
toc: true
toc-depth: 3
toc-float: true
code-fold: true
code-tools: true
theme: cosmo
fig-width: 10
fig-height: 6
fig-dpi: 150
embed-resources: true
execute:
warning: false
message: false
cache: false
---
# Introduction
Between 2020 and 2024, Texas added more people than any other state in the nation. The Sun Belt surge that has reshaped American politics over the past two decades has made Texas the center of the country's demographic transformation. As the "Lone Star" gains congressional clout, its northern counterparts watch seats slip away. But what does the data actually say? Where are Texans coming from, where are they going, and can a governor with a targeted advertising budget shift the numbers ahead of the 2030 census?
This report uses American Community Survey (ACS) migration state-to-state and metro-to-metro flow data to quantify those patterns, build population forecasts through 2030, and estimate Texas's likely congressional apportionment for the 2032 elections. We close with a data-driven advertising strategy designed to maximize the state's political footprint.
---
```{r}
#| label: setup
#| include: false
library(tidyverse)
library(tidycensus)
library(readxl)
library(jsonlite)
library(fs)
library(vctrs)
library(gt)
library(gtExtras)
library(scales)
library(glue)
library(sf)
library(tigris)
library(ggraph)
library(igraph)
library(gganimate)
library(patchwork)
library(colorspace)
library(viridis)
options(tigris_use_cache = TRUE)
options(scipen = 999)
```
```{r}
#| label: cache-infrastructure
#| include: false
DATA_DIR <- fs::path("data", "mp03")
if (!fs::dir_exists(DATA_DIR)) fs::dir_create(DATA_DIR, recurse = TRUE)
#' Download a file only if it doesn't already exist locally.
#'
#' @param url Remote URL to download from.
#' @param dest Local destination path (inside DATA_DIR).
#' @param method Download method - defaults to "wininet" for Windows SSL compat.
#' @param ... Additional arguments forwarded to download.file().
#' @return Invisibly returns the destination path.
cached_download <- function(url, dest, method = "wininet", ...) {
if (!fs::file_exists(dest)) {
message(glue("Downloading: {fs::path_file(dest)}"))
download.file(url, destfile = dest, mode = "wb",
quiet = TRUE, method = method, ...)
} else {
message(glue("Using cached: {fs::path_file(dest)}"))
}
invisible(dest)
}
```
```{r}
#| label: custom-theme
#| include: false
texas_palette <- c(
blue = "#002868",
red = "#BF0A30",
sand = "#D4A853",
teal = "#2E8B8B",
grey = "#6B6B6B",
lgrey = "#E8E8E8"
)
theme_tx <- function(base_size = 12) {
theme_minimal(base_size = base_size) +
theme(
plot.title = element_text(face = "bold", size = base_size + 3,
colour = texas_palette["blue"],
margin = margin(b = 6)),
plot.subtitle = element_text(size = base_size,
colour = texas_palette["grey"],
margin = margin(b = 10)),
plot.caption = element_text(size = base_size - 2,
colour = texas_palette["grey"],
hjust = 0),
axis.title = element_text(size = base_size - 1,
colour = texas_palette["grey"]),
axis.text = element_text(size = base_size - 2),
panel.grid.major = element_line(colour = texas_palette["lgrey"]),
panel.grid.minor = element_blank(),
legend.position = "bottom",
legend.title = element_text(face = "bold", size = base_size - 1),
strip.text = element_text(face = "bold", size = base_size - 1,
colour = texas_palette["blue"])
)
}
theme_set(theme_tx())
```
> **Reproducibility note:** All external files are cached in `data/mp03/`. If
> you are running this document for the first time, the files will be downloaded
> automatically. Subsequent renders use the cached copies and require no
> internet access.
# State Population Data
Texas has been the fastest-growing large state in the nation over the past decade. The table below ranks states by their current 2024 population. Texas sits firmly in second place at over 31 million residents, trailing only California. For growth rates and absolute population added since 2015, see the analysis in Q1 of the Exploratory Data Analysis section.
```{r}
#| label: task1-state-population
#| cache: true
#| message: false
#| warning: false
#| results: hide
# B01003_001 = Total Population
# Note: 2020 ACS-1 was never released due to COVID-19 low response rates.
pop_raw <- map(
c(2015:2019, 2021:2024),
\(yr) get_acs(
geography = "state",
variables = c(total_population = "B01003_001"),
year = yr,
survey = "acs1",
geometry = TRUE,
cache_table = TRUE
) |>
mutate(year = yr)
) |>
list_rbind()
state_pop <- pop_raw |>
tigris::shift_geometry() |>
filter(!NAME %in% c("District of Columbia", "Puerto Rico")) |>
transmute(
geoid = GEOID,
state_name = NAME,
state_abbr = state.abb[match(NAME, state.name)],
year = year,
total_population = estimate,
pop_moe = moe,
geometry = geometry
)
```
```{r}
#| label: task1-population-table
state_pop_2024 <- state_pop |>
filter(year == 2024) |>
arrange(desc(total_population))
bind_rows(
slice_head(state_pop_2024, n = 10),
slice_tail(state_pop_2024, n = 10)
) |>
st_drop_geometry() |>
mutate(rank = c(1:10, 41:50)) |>
select(rank, state_name, state_abbr, total_population) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**State Populations, 2024 ACS-1 Estimates**"),
subtitle = "Largest and smallest states by total population"
) |>
cols_label(
state_name = "State",
state_abbr = "",
total_population = "Population"
) |>
fmt_integer(total_population) |>
tab_row_group(label = "10 Smallest States", rows = 11:20) |>
tab_row_group(label = "10 Largest States", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state_abbr == "TX")
) |>
tab_footnote(
footnote = "Texas highlighted in blue.",
locations = cells_column_labels(state_name)
) |>
tab_source_note(
"Source: US Census Bureau, ACS 1-Year Estimates (2024), Table B01003."
)
```
The chart below tracks cumulative population growth since 2015 for Texas and four peer states, indexed to 100 at the start of the period. The 2020 gap reflects the suspension of ACS-1 data collection due to COVID-19 since no ACS-1 was released for that year. Texas and Florida's trajectories clearly separate them from the pack around 2021.
```{r}
#| label: task1-population-growth-plot
#| fig-cap: "Texas population growth versus the national average and selected
#| comparison states, 2015-2024. The 2020 gap reflects the suspension of
#| ACS-1 data collection during COVID-19. Texas outpaced all comparison
#| states over the full period."
focus_states <- c("Texas", "California", "Florida", "New York", "Arizona")
pop_indexed <- state_pop |>
st_drop_geometry() |>
filter(state_name %in% focus_states) |>
group_by(state_name) |>
mutate(
pop_2015 = total_population[year == 2015],
index = 100 * total_population / pop_2015
) |>
ungroup()
us_total <- state_pop |>
st_drop_geometry() |>
group_by(year) |>
summarise(total_population = sum(total_population), .groups = "drop") |>
mutate(
state_name = "US Total",
pop_2015 = total_population[year == 2015],
index = 100 * total_population / pop_2015
)
state_colors <- c(
"Texas" = "#002868",
"Florida" = "#BF0A30",
"Arizona" = "#D4A853",
"California" = "#2E8B8B",
"New York" = "#6B6B6B",
"US Total" = "black"
)
state_lty <- c(
"Texas" = "solid",
"Florida" = "solid",
"Arizona" = "solid",
"California" = "solid",
"New York" = "solid",
"US Total" = "dashed"
)
bind_rows(pop_indexed, us_total) |>
ggplot(aes(x = year, y = index,
colour = state_name, linetype = state_name)) +
geom_line(linewidth = 1.1) +
geom_point(size = 2) +
geom_text(
data = \(d) d |> filter(year == 2024),
aes(label = glue("{state_name}\n{round(index, 1)}")),
hjust = -0.08, size = 3, show.legend = FALSE
) +
scale_colour_manual(values = state_colors) +
scale_linetype_manual(values = state_lty) +
scale_x_continuous(
breaks = 2015:2024,
limits = c(2015, 2026.5)
) +
scale_y_continuous(
labels = \(x) paste0(x - 100, "%"),
breaks = seq(95, 125, 5)
) +
labs(
title = "Population Growth Since 2015 (Indexed to 100)",
subtitle = "Texas leads all major states and the national average",
x = NULL,
y = "Cumulative Growth Since 2015",
colour = NULL,
linetype = NULL,
caption = "Source: ACS 1-Year Estimates, Tables B01003 (2015-2024)."
) +
theme_tx() +
theme(legend.position = "none")
```
# State-to-State Migration Flows
To understand where Texans are coming from, and where they're going, we turn
to the Census Bureau's state-to-state migration flow files, derived from ACS-1
respondents who reported living in a different state one year prior. We parse
the 2023 and 2024 flow files separately (they have different formats) and
combine them into a single tidy table.
```{r}
#| label: task2-migration-2024
#| cache: true
#| include: false
state_to_state_migration_2024 <- function() {
dest <- fs::path(DATA_DIR, "State_to_State_Migration_Table_2024.xlsx")
raw <- read_excel(
dest,
sheet = "Table",
col_names = c("state_current", "state_1y", "population", "moe"),
skip = 8
)
flows <- raw |>
filter(
state_current %in% state.name,
state_1y %in% state.name
) |>
mutate(
population = as.integer(replace(population,
population %in% c("N", "X"), "0")),
year_current = 2024L,
year_1y = 2023L
) |>
filter(state_current != state_1y) |>
select(state_current, state_1y, population, year_current, year_1y)
suppl_raw <- read_excel(
dest,
sheet = "Supplemental - Current Res",
skip = 4,
col_names = FALSE
)
stationary <- suppl_raw |>
filter(!is.na(...1), ...1 %in% state.name) |>
select(state_current = ...1, same_house = ...4, within_state = ...6) |>
mutate(
population = as.integer(same_house) + as.integer(within_state),
state_1y = state_current,
year_current = 2024L,
year_1y = 2023L
) |>
select(state_current, state_1y, population, year_current, year_1y)
bind_rows(flows, stationary)
}
migration_2024 <- state_to_state_migration_2024()
stopifnot(nrow(migration_2024) == 2500)
```
```{r}
#| label: task3-migration-2023
#| cache: true
#| include: false
state_to_state_migration_2023 <- function() {
dest <- fs::path(DATA_DIR, "State_to_State_Migration_Table_2023.xlsx")
states_with_dc <- c(
state.name[1:8],
"District of Columbia",
state.name[9:50]
)
left_raw <- read_excel(
dest,
range = "A7:F62",
col_names = c("state_current", "total", "same_house",
"moe1", "same_county", "moe2")
)
stationary <- left_raw |>
dplyr::filter(!is.na(state_current), state_current %in% state.name) |>
dplyr::mutate(
population = as.integer(same_house) + as.integer(same_county),
state_1y = state_current,
year_current = 2023L,
year_1y = 2022L
) |>
dplyr::select(state_current, state_1y, population, year_current, year_1y)
flows_raw <- read_excel(
dest,
range = "J7:DG62",
col_names = vctrs::vec_interleave(
states_with_dc,
paste0(states_with_dc, "_moe")
)
)
flows <- flows_raw |>
dplyr::mutate(state_1y = left_raw$state_current) |>
dplyr::filter(!is.na(state_1y), state_1y %in% state.name) |>
dplyr::select(state_1y, dplyr::any_of(states_with_dc)) |>
dplyr::select(-dplyr::any_of("District of Columbia")) |>
tidyr::pivot_longer(
cols = -state_1y,
names_to = "state_current",
values_to = "population"
) |>
dplyr::filter(
state_current %in% state.name,
state_1y != state_current
) |>
dplyr::mutate(
population = as.integer(population),
year_current = 2023L,
year_1y = 2022L
) |>
dplyr::select(state_current, state_1y, population, year_current, year_1y)
dplyr::bind_rows(stationary, flows)
}
migration_2023 <- state_to_state_migration_2023()
stopifnot(nrow(migration_2023) == 2500)
```
```{r}
#| label: task4-migration-combined
#| cache: true
#| include: false
migration_flows <- bind_rows(
state_to_state_migration_2024(),
state_to_state_migration_2023()
)
stopifnot(nrow(migration_flows) == 5000)
```
The table below highlights a striking pattern among the four largest states: California is simultaneously Texas's largest source of in-migrants (77,161 people) and the top destination for Texans leaving the state (45,447 people). This two-way flow defines the largest migration corridor in the country.
```{r}
#| label: task4-migration-preview-table
migration_flows |>
filter(
state_current %in% c("Texas", "California", "New York", "Florida"),
state_1y %in% c("Texas", "California", "New York", "Florida"),
year_current == 2024
) |>
arrange(state_current, state_1y) |>
mutate(
label = if_else(
state_current == state_1y,
glue("Stayed in {state_current}"),
glue("{state_1y} → {state_current}")
)
) |>
select(label, population, year_current, year_1y) |>
gt() |>
tab_header(
title = md("**State-to-State Migration Flows, 2024**"),
subtitle = "Selected flows among four largest states"
) |>
cols_label(
label = "Flow",
population = "People",
year_current = "Survey Year",
year_1y = "Prior Year"
) |>
fmt_integer(population) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = grepl("Texas", label))
) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(
rows = grepl("Stayed", label),
columns = label
)
) |>
tab_source_note(
"Source: US Census Bureau, ACS 1-Year State-to-State Migration Flow Tables."
)
```
# Metro-Area Migration Flows
State-level data tells us *where* people are moving in aggregate, but
metro-area flows reveal the specific urban corridors driving Texas's growth.
The Houston, Dallas, and Austin metros are among the fastest-growing in the
country, but which cities are feeding them? We use the Census Bureau's
ACS-5 metro-to-metro migration API to find out.
```{r}
#| label: task5-metro-migration
#| cache: true
#| include: false
metro_to_metro_migration <- function() {
dest <- fs::path(DATA_DIR, "metro_migration.json")
cached_download(
url = paste0(
"https://api.census.gov/data/2020/acs/flows",
"?get=GEOID1,GEOID2,MOVEDIN,MOVEDOUT,FULL1_NAME,FULL2_NAME",
"&for=metropolitan%20statistical%20area/",
"micropolitan%20statistical%20area:*"
),
dest = dest
)
move_row_to_colnames <- function(X, row = 1) {
X <- as_tibble(X)
Xrow <- X[row, ]
X <- X[-row, ]
colnames(X) <- Xrow
suppressMessages(readr::type_convert(X))
}
raw <- jsonlite::read_json(dest, simplifyVector = TRUE)
metro_flows <- raw |>
move_row_to_colnames() |>
rename(
moved_in = MOVEDIN,
moved_out = MOVEDOUT,
metro1_name = FULL1_NAME,
metro2_name = FULL2_NAME
) |>
select(metro1_name, metro2_name, moved_in, moved_out) |>
mutate(
metro1_state = str_extract(
metro1_name,
".*, (\\S{2})[-[:alpha:]]* Metro Area",
group = 1
),
metro2_state = str_extract(
metro2_name,
".*, (\\S{2})[-[:alpha:]]* Metro Area",
group = 1
)
)
metro_flows
}
metro_flows <- metro_to_metro_migration()
```
The Los Angeles metro sends more people to Texas than any other single metro area with over 27,000 per year during the 2016-2020 period. New York and Chicago follow, confirming that Texas's growth is largely powered by residents fleeing the high costs and congestion of the country's three largest legacy metros.
```{r}
#| label: task5-metro-preview-table
metro_flows |>
filter(
metro1_state == "TX",
!is.na(metro2_state),
metro2_state != "TX"
) |>
group_by(metro2_name) |>
summarise(
total_moved_in = sum(moved_in, na.rm = TRUE),
.groups = "drop"
) |>
slice_max(total_moved_in, n = 10) |>
mutate(rank = row_number()) |>
select(rank, metro2_name, total_moved_in) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Top 10 Metro Sources of In-Migration to Texas**"),
subtitle = "Non-Texas metros sending the most residents to Texas, ACS 5-Year 2016-2020"
) |>
cols_label(
metro2_name = "Origin Metro Area",
total_moved_in = "People Moved In"
) |>
fmt_integer(total_moved_in) |>
data_color(
columns = total_moved_in,
palette = c("#EEF2FF", "#002868")
) |>
tab_source_note(
"Source: US Census Bureau, ACS 5-Year Migration Flows API (2016-2020)."
)
```
# Exploratory Data Analysis
The following analyses address nine key questions about US internal migration
patterns, with particular attention to Texas and the Houston metro area.
## Q1: Highest State Population Growth Rates
Which states have grown the fastest over the past decade? We measure growth
as the percentage change in total population from 2015 to 2024. Idaho and
Utah lead on a percentage basis, but Texas leads all states in absolute
population added with nearly 3.8 million people over the period.
```{r}
#| label: eda-q1-growth-rates
growth_rates <- state_pop |>
st_drop_geometry() |>
filter(year %in% c(2015, 2024)) |>
select(state_name, state_abbr, year, total_population) |>
pivot_wider(names_from = year, values_from = total_population,
names_prefix = "pop_") |>
mutate(
growth_pct = 100 * (pop_2024 - pop_2015) / pop_2015,
pop_added = pop_2024 - pop_2015
) |>
arrange(desc(growth_pct))
growth_rates |>
slice(c(1:10, 41:50)) |>
mutate(rank = c(1:10, 41:50)) |>
select(rank, state_name, state_abbr, pop_2015, pop_2024,
growth_pct, pop_added) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**State Population Growth Rates, 2015-2024**"),
subtitle = "Fastest and slowest growing states by percentage change"
) |>
cols_label(
state_name = "State",
state_abbr = "",
pop_2015 = "2015 Population",
pop_2024 = "2024 Population",
growth_pct = "Growth Rate",
pop_added = "People Added"
) |>
fmt_integer(c(pop_2015, pop_2024, pop_added)) |>
fmt_number(growth_pct, decimals = 1, suffix = "%") |>
tab_row_group(label = "10 Slowest Growing States", rows = 11:20) |>
tab_row_group(label = "10 Fastest Growing States", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state_abbr == "TX")
) |>
data_color(
columns = growth_pct,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: ACS 1-Year Estimates, Table B01003 (2015, 2024)."
)
```
## Q2: Migration To and From New York State
New York's migration picture is clear: it sends far more people to other states than it receives. Florida and Texas are the top two destinations for departing New Yorkers which directly benefits Texas's congressional apportionment at New York's expense.
```{r}
#| label: eda-q2-new-york-state
ny_in <- migration_flows |>
filter(state_current == "New York",
state_1y != "New York",
year_current == 2024) |>
arrange(desc(population)) |>
slice_head(n = 10) |>
mutate(direction = "Into New York")
ny_out <- migration_flows |>
filter(state_1y == "New York",
state_current != "New York",
year_current == 2024) |>
arrange(desc(population)) |>
slice_head(n = 10) |>
mutate(direction = "Out of New York")
bind_rows(
ny_in |> select(direction, state = state_1y, population),
ny_out |> select(direction, state = state_current, population)
) |>
group_by(direction) |>
mutate(rank = row_number()) |>
ungroup() |>
pivot_wider(names_from = direction,
values_from = c(state, population),
names_sep = "_") |>
select(rank,
`state_Into New York`, `population_Into New York`,
`state_Out of New York`, `population_Out of New York`) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**New York State Migration Flows, 2024**"),
subtitle = "Top 10 origin states for in-migrants and destination states for out-migrants"
) |>
cols_label(
`state_Into New York` = "Origin State",
`population_Into New York` = "People",
`state_Out of New York` = "Destination State",
`population_Out of New York` = "People"
) |>
fmt_integer(c(`population_Into New York`, `population_Out of New York`)) |>
tab_spanner(label = "Moving INTO New York",
columns = c(`state_Into New York`, `population_Into New York`)) |>
tab_spanner(label = "Moving OUT of New York",
columns = c(`state_Out of New York`, `population_Out of New York`)) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_spanners()
) |>
tab_source_note(
"Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
)
```
## Q3: Migration To and From New York City Metro
At the metro level, NYC's largest domestic migration partner is a relatively short-distance neighbor: Philadelphia. More consequential for Texas is the strong flow from NYC to Miami and Los Angeles, as the data show NYC residents dispersing to desirable Sun Belt cities rather than directly to Texas. Nonetheless, New York City was the 4th-largest source of in-migrants to the Houston metro area specifically.
```{r}
#| label: eda-q3-nyc-metro
nyc_pattern <- "New York-Newark-Jersey City"
# Top sources INTO NYC - US metros only, excluding non-metro categories
nyc_in <- metro_flows |>
filter(
str_detect(metro1_name, nyc_pattern),
!str_detect(metro2_name, nyc_pattern),
str_detect(metro2_name, "Metro Area"),
!str_detect(metro2_name, "Outside Metro Area"),
!is.na(moved_in),
moved_in > 0
) |>
arrange(desc(moved_in)) |>
slice_head(n = 10) |>
select(metro = metro2_name, people = moved_in)
# Top destinations OUT of NYC - US metros only
nyc_out <- metro_flows |>
filter(
str_detect(metro1_name, nyc_pattern),
!str_detect(metro2_name, nyc_pattern),
str_detect(metro2_name, "Metro Area"),
!str_detect(metro2_name, "Outside Metro Area"),
!is.na(moved_out),
moved_out > 0
) |>
arrange(desc(moved_out)) |>
slice_head(n = 10) |>
select(metro = metro2_name, people = moved_out)
bind_cols(
nyc_in |> mutate(rank = row_number()) |>
select(rank, in_metro = metro, in_people = people),
nyc_out |>
select(out_metro = metro, out_people = people)
) |>
mutate(
in_metro = str_remove(in_metro, " Metro Area$"),
out_metro = str_remove(out_metro, " Metro Area$")
) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**New York City Metro Migration Flows**"),
subtitle = "Top 10 origin and destination metros, ACS 5-Year 2016-2020"
) |>
cols_label(
in_metro = "Origin Metro",
in_people = "People",
out_metro = "Destination Metro",
out_people = "People"
) |>
fmt_integer(c(in_people, out_people)) |>
tab_spanner(label = "Moving INTO NYC",
columns = c(in_metro, in_people)) |>
tab_spanner(label = "Moving OUT of NYC",
columns = c(out_metro, out_people)) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_spanners()
) |>
tab_source_note(
"Source: ACS 5-Year Metro-to-Metro Migration Flows API (2016-2020)."
)
```
## Q4: States With Highest In-, Out-, and Net Migration
Texas leads all states in net domestic in-migration with a surplus of nearly 72,000 people in 2024. At the other extreme, California experienced the largest domestic out-migration drain in the country, losing a net 252,000 residents to other states. Florida's total flows in both directions reflect its role as both a major destination and a pass-through state.
```{r}
#| label: eda-q4-state-net-migration
state_migration_totals <- migration_flows |>
filter(
state_current != state_1y,
year_current == 2024
) |>
group_by(state = state_current) |>
summarise(total_in = sum(population, na.rm = TRUE), .groups = "drop") |>
left_join(
migration_flows |>
filter(state_current != state_1y, year_current == 2024) |>
group_by(state = state_1y) |>
summarise(total_out = sum(population, na.rm = TRUE), .groups = "drop"),
by = "state"
) |>
mutate(net_migration = total_in - total_out) |>
arrange(desc(net_migration))
state_migration_totals |>
slice(c(1:10, 41:50)) |>
mutate(rank = c(1:10, 41:50)) |>
left_join(
state_pop |> st_drop_geometry() |>
filter(year == 2024) |>
select(state_name, state_abbr),
by = c("state" = "state_name")
) |>
select(rank, state, state_abbr, total_in, total_out, net_migration) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**State Migration Totals, 2024**"),
subtitle = "Highest and lowest net migration states"
) |>
cols_label(
state = "State",
state_abbr = "",
total_in = "In-Migration",
total_out = "Out-Migration",
net_migration = "Net Migration"
) |>
fmt_integer(c(total_in, total_out, net_migration)) |>
tab_row_group(label = "10 Highest Net Out-Migration", rows = 11:20) |>
tab_row_group(label = "10 Highest Net In-Migration", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state == "Texas")
) |>
data_color(
columns = net_migration,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
)
```
## Q5: Metro Areas With Highest In-, Out-, and Net Migration
At the metro level, Phoenix leads all US metros in net in-migration, followed by Riverside and Dallas-Fort Worth. Notably, two Texas metros, Dallas-Fort Worth and Austin, rank in the top four nationally. The bottom of the table is dominated by coastal legacy metros: New York, Los Angeles, Chicago, and San Francisco all rank among the largest net losers.
```{r}
#| label: eda-q5-metro-net-migration
metro_totals <- metro_flows |>
filter(!is.na(metro1_state), !is.na(metro2_state)) |>
group_by(metro = metro1_name, state = metro1_state) |>
summarise(
total_in = sum(moved_in, na.rm = TRUE),
total_out = sum(moved_out, na.rm = TRUE),
.groups = "drop"
) |>
mutate(
net_migration = total_in - total_out,
metro_short = str_remove(metro, " Metro Area$")
) |>
arrange(desc(net_migration))
metro_totals |>
slice(c(1:10, (nrow(metro_totals) - 9):nrow(metro_totals))) |>
mutate(rank = c(1:10, (nrow(metro_totals) - 9):nrow(metro_totals))) |>
select(rank, metro_short, state, total_in, total_out, net_migration) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Metro Area Migration Totals**"),
subtitle = "Highest and lowest net migration metros, ACS 5-Year 2016-2020"
) |>
cols_label(
metro_short = "Metro Area",
state = "State",
total_in = "In-Migration",
total_out = "Out-Migration",
net_migration = "Net Migration"
) |>
fmt_integer(c(total_in, total_out, net_migration)) |>
tab_row_group(label = "10 Highest Net Out-Migration", rows = 11:20) |>
tab_row_group(label = "10 Highest Net In-Migration", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
data_color(
columns = net_migration,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: ACS 5-Year Metro-to-Metro Migration Flows API (2016-2020)."
)
```
## Q6: States With Lowest Proportional In-Migration
What are the "sticky" states where residents are least likely to have moved from elsewhere? California leads this ranking, with just 1.05% of its population having arrived from another state in 2024. Texas ranks 9th on this list despite its rapid growth, reflecting the fact that its enormous existing population base dilutes even large absolute in-migration flows.
```{r}
#| label: eda-q6-sticky-states
migration_flows |>
filter(year_current == 2024) |>
group_by(state = state_current) |>
summarise(
total_pop = sum(population, na.rm = TRUE),
moved_in = sum(population[state_current != state_1y], na.rm = TRUE),
.groups = "drop"
) |>
mutate(pct_moved_in = 100 * moved_in / total_pop) |>
arrange(pct_moved_in) |>
slice(c(1:10, 41:50)) |>
mutate(rank = c(1:10, 41:50)) |>
left_join(
state_pop |> st_drop_geometry() |>
filter(year == 2024) |>
select(state_name, state_abbr),
by = c("state" = "state_name")
) |>
select(rank, state, state_abbr, total_pop, moved_in, pct_moved_in) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**State Population Stickiness, 2024**"),
subtitle = "States ranked by fraction of residents who moved in from another state"
) |>
cols_label(
state = "State",
state_abbr = "",
total_pop = "Total Population",
moved_in = "Moved In From Another State",
pct_moved_in = "% In-Migrants"
) |>
fmt_integer(c(total_pop, moved_in)) |>
fmt_number(pct_moved_in, decimals = 2, suffix = "%") |>
tab_row_group(label = "10 Least Sticky (Most In-Migration)", rows = 11:20) |>
tab_row_group(label = "10 Most Sticky (Least In-Migration)", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state == "Texas")
) |>
data_color(
columns = pct_moved_in,
palette = c("#EEF2FF", "#002868")
) |>
tab_source_note(
"Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
)
```
## Q7: States Where Migration Drives the Most Growth
Surprisingly, Texas does not appear in the top 15 despite having the highest
absolute net in-migration of any state. This reflects the fact that Texas's
population growth is so large overall, driven by natural increase and
international immigration, that internal US migration accounts for a
relatively modest share of total growth. By contrast, states like Wyoming
and Vermont are growing slowly enough that even modest net migration
represents a large fraction of their total population change. Wyoming's
428% figure (bar capped at 100% for readability) illustrates this extreme
case: the state added only 1,511 residents total, so even a few hundred
net migrants tips the ratio dramatically.
```{r}
#| label: eda-q7-migration-driven-growth
migration_growth_share <- state_migration_totals |>
left_join(growth_rates, by = c("state" = "state_name")) |>
filter(!is.na(pop_added), pop_added > 0) |>
mutate(migration_share = 100 * net_migration / pop_added) |>
filter(!is.na(migration_share), is.finite(migration_share)) |>
arrange(desc(migration_share))
plot_data <- migration_growth_share |>
slice_max(migration_share, n = 15) |>
mutate(
state_label = glue("{state} ({state_abbr})"),
is_texas = state == "Texas",
migration_capped = pmin(migration_share, 100),
label_text = if_else(
migration_share > 100,
paste0(round(migration_share, 0), "%*"),
paste0(round(migration_share, 0), "%")
)
)
ggplot(plot_data,
aes(x = reorder(state_label, migration_share),
y = migration_capped,
fill = is_texas)) +
geom_col(width = 0.7) +
geom_text(
aes(label = label_text),
hjust = -0.1, size = 3.5
) +
scale_fill_manual(
values = c("FALSE" = "#6B6B6B", "TRUE" = "#002868"),
guide = "none"
) +
scale_y_continuous(
limits = c(0, 120),
labels = \(x) paste0(x, "%")
) +
coord_flip() +
labs(
title = "Share of Population Growth Attributable to Net In-Migration",
subtitle = "Top 15 states; bars capped at 100% (* = actual value exceeds cap)",
x = NULL,
y = "Net Migration as % of Total Population Growth",
caption = "Source: ACS 1-Year Migration Flows and Population Estimates (2024).\n* Wyoming actual value = 428%."
) +
theme_tx()
```
## Q8: Texas-Specific Migration Patterns
The more telling pattern for Texas's congressional position involves Oklahoma and Colorado. Unlike California, where the flow runs heavily in both directions, Texans are leaving for Oklahoma and Colorado at high rates without equivalent numbers coming back. This is a net drain worth targeting, and notably, Oklahoma leans politically closer to Texas than Colorado does, making it a particularly promising market for a retention campaign.
```{r}
#| label: eda-q8-texas-flows
tx_in <- migration_flows |>
filter(
state_current == "Texas",
state_1y != "Texas",
year_current == 2024
) |>
arrange(desc(population)) |>
slice_head(n = 10)
tx_out <- migration_flows |>
filter(
state_1y == "Texas",
state_current != "Texas",
year_current == 2024
) |>
arrange(desc(population)) |>
slice_head(n = 10)
bind_cols(
tx_in |> mutate(rank = row_number()) |>
select(rank, in_state = state_1y, in_pop = population),
tx_out |>
select(out_state = state_current, out_pop = population)
) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Texas Migration Flows, 2024**"),
subtitle = "Top 10 origin states for in-migrants and destinations for out-migrants"
) |>
cols_label(
in_state = "Origin State",
in_pop = "People Moving In",
out_state = "Destination State",
out_pop = "People Moving Out"
) |>
fmt_integer(c(in_pop, out_pop)) |>
tab_spanner(
label = "Moving INTO Texas",
columns = c(in_state, in_pop)
) |>
tab_spanner(
label = "Moving OUT of Texas",
columns = c(out_state, out_pop)
) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_spanners()
) |>
tab_source_note(
"Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
)
```
## Q9: Houston Metro Migration Patterns
Houston is Texas's largest metro area. Within Texas, Houston exchanges people heavily with Dallas-Fort Worth, Austin, and San Antonio. This reflects how Texans move between the state's major cities and economic centers. The connection-strength chart below reveals which metros are most disproportionately tied to Houston relative to their total out-migration.
```{r}
#| label: eda-q9-houston-flows
houston_pattern <- "Houston-The Woodlands-Sugar Land"
# Top sources INTO Houston - US metros only
houston_in <- metro_flows |>
filter(
str_detect(metro1_name, houston_pattern),
!str_detect(metro2_name, houston_pattern),
str_detect(metro2_name, "Metro Area"),
!str_detect(metro2_name, "Outside Metro Area"),
!is.na(moved_in),
moved_in > 0
) |>
arrange(desc(moved_in)) |>
slice_head(n = 10)
# Top destinations OUT of Houston - US metros only
houston_out <- metro_flows |>
filter(
str_detect(metro1_name, houston_pattern),
!str_detect(metro2_name, houston_pattern),
str_detect(metro2_name, "Metro Area"),
!str_detect(metro2_name, "Outside Metro Area"),
!is.na(moved_out),
moved_out > 0
) |>
arrange(desc(moved_out)) |>
slice_head(n = 10)
bind_cols(
houston_in |> mutate(rank = row_number()) |>
select(rank, in_metro = metro2_name, in_people = moved_in),
houston_out |>
select(out_metro = metro2_name, out_people = moved_out)
) |>
mutate(
in_metro = str_remove(in_metro, " Metro Area$"),
out_metro = str_remove(out_metro, " Metro Area$")
) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Houston Metro Migration Flows**"),
subtitle = "Top 10 origin and destination metros, ACS 5-Year 2016-2020"
) |>
cols_label(
in_metro = "Origin Metro",
in_people = "People",
out_metro = "Destination Metro",
out_people = "People"
) |>
fmt_integer(c(in_people, out_people)) |>
tab_spanner(
label = "Moving INTO Houston",
columns = c(in_metro, in_people)
) |>
tab_spanner(
label = "Moving OUT of Houston",
columns = c(out_metro, out_people)
) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_spanners()
) |>
tab_source_note(
"Source: ACS 5-Year Metro-to-Metro Migration Flows API (2016-2020)."
)
```
The chart below identified each metro's draw to Houston measured as the share of its total out-migration that flows specifically to Houston. Metros near the top of this list are the most fertile advertising targets: their residents are already choosing Houston at an unusually high rate, suggesting strong word-of-mouth networks and existing community ties.
```{r}
#| label: eda-q9-houston-connection-strength
#| fig-cap: "Metros with the strongest migration connection to Houston,
#| measured as Houston's share of total out-migration from each origin metro.
#| A high share indicates Houston is a uniquely popular destination."
metro_flows |>
filter(
str_detect(metro1_name, houston_pattern),
!str_detect(metro2_name, houston_pattern),
str_detect(metro2_name, "Metro Area"),
!str_detect(metro2_name, "Outside Metro Area"),
!is.na(moved_in),
moved_in > 0,
!is.na(metro2_state)
) |>
left_join(
metro_totals |> select(metro, total_out),
by = c("metro2_name" = "metro")
) |>
mutate(
houston_share = 100 * moved_in / total_out,
metro_short = str_remove(metro2_name, " Metro Area$")
) |>
filter(!is.na(houston_share), total_out > 5000) |>
slice_max(houston_share, n = 15) |>
ggplot(aes(x = reorder(metro_short, houston_share),
y = houston_share)) +
geom_col(fill = "#002868", width = 0.7) +
geom_text(
aes(label = paste0(round(houston_share, 1), "%")),
hjust = -0.1, size = 3
) +
scale_y_continuous(
limits = c(0, 15),
labels = \(x) paste0(x, "%"),
oob = scales::squish
) +
coord_flip() +
labs(
title = "Metros With Strongest Migration Connection to Houston",
subtitle = "Houston's share of each metro's total out-migration (metros with >5,000 total out-migrants)",
x = NULL,
y = "Houston's Share of Origin Metro's Out-Migration",
caption = "Source: ACS 5-Year Metro-to-Metro Migration Flows (2016-2020)."
) +
theme_tx()
```
# Population Projections to 2030
To estimate Texas's congressional apportionment after the 2030 census, we
first need to forecast each state's 2030 population. We use the migration
model described in the assignment^[The population
projection model and formula were specified by the course instructor as
part of the Mini-Project #03 assignment. Our contribution is in fitting
the parameters (γ and λ) to the ACS data, implementing the projection
in R (both with the use of AI), and interpreting the results.], which assumes population growth follows:
$$P_{i,t+1} = P_{i,t} \cdot (1 + \gamma) + \sum_{j \neq i} \lambda_{ij} \cdot P_{i,t} \cdot P_{j,t}$$
where $\gamma$ is a national natural growth rate and $\lambda_{ij}$ captures
the migration flow rate from state $j$ to state $i$. We fit parameters using
both the 2023 and 2024 flow files and average the lambda values for stability.
Because the 2023 file format does not fully capture stationary population,
gamma (the natural growth rate) is estimated from 2024 data only, yielding
γ = −0.27% per year. This slightly negative value reflects the fact that the
ACS domestic flow tables do not capture international immigration, which has
been a substantial driver of US population growth. As a result, our projections
should be interpreted as conservative estimates on the low end, particularly for high-immigration states like Texas.
```{r}
#| label: task7-fit-parameters
#| cache: true
#| include: false
pop_2024 <- state_pop |> filter(year == 2024)
pop_2023 <- state_pop |> filter(year == 2023)
pop_2022 <- state_pop |> filter(year == 2022)
flows_2024 <- migration_flows |> filter(year_current == 2024)
flows_2023 <- migration_flows |> filter(year_current == 2023)
# ── National gamma (baseline model) ──────────────────────────────────────────
n_2024 <- sum(pop_2024$total_population, na.rm = TRUE)
n_2023 <- sum(pop_2023$total_population, na.rm = TRUE)
total_accounted_2024 <- flows_2024 |>
pull(population) |>
sum(na.rm = TRUE)
intl_in_2024 <- n_2024 - total_accounted_2024
gamma_avg <- (n_2024 - n_2023 - intl_in_2024) / n_2023
# ── EC#03: State-specific gamma ───────────────────────────────────────────────
# For each state i, gamma_i = (P_i_2024 - P_i_2023 - intl_in_i) / P_i_2023
# where intl_in_i = P_i_2024 - total_accounted_i (all flows into state i)
state_gamma <- flows_2024 |>
group_by(state_current) |>
summarise(
total_accounted = sum(population, na.rm = TRUE),
.groups = "drop"
) |>
left_join(
pop_2024 |> st_drop_geometry() |>
select(state_name, pop_2024 = total_population),
by = c("state_current" = "state_name")
) |>
left_join(
pop_2023 |> st_drop_geometry() |>
select(state_name, pop_2023 = total_population),
by = c("state_current" = "state_name")
) |>
mutate(
intl_in = pop_2024 - total_accounted,
gamma_i = (pop_2024 - pop_2023 - intl_in) / pop_2023
) |>
select(state_name = state_current, gamma_i, pop_2023, pop_2024, intl_in)
# ── Lambda: average over both years ──────────────────────────────────────────
compute_lambda <- function(flows_year, pop_current, pop_prior) {
flows_year |>
filter(state_current != state_1y) |>
left_join(
pop_current |> st_drop_geometry() |>
select(state_name, pop_i = total_population),
by = c("state_current" = "state_name")
) |>
left_join(
pop_prior |> st_drop_geometry() |>
select(state_name, pop_j = total_population),
by = c("state_1y" = "state_name")
) |>
filter(!is.na(pop_i), !is.na(pop_j)) |>
mutate(lambda = population / (pop_i * pop_j)) |>
select(state_current, state_1y, lambda, year_current, year_1y)
}
lambda_avg <- bind_rows(
compute_lambda(flows_2024, pop_2024, pop_2023),
compute_lambda(flows_2023, pop_2023, pop_2022)
) |>
group_by(state_current, state_1y) |>
summarise(lambda = mean(lambda, na.rm = TRUE), .groups = "drop")
```
```{r}
#| label: task7-project-populations
#| cache: true
#| include: false
# Project using state-specific gamma_i where available,
# falling back to national gamma_avg otherwise
project_one_year_ec03 <- function(pop_current, state_gamma_df,
gamma_national, lambda) {
# Natural growth using state-specific gamma
natural_growth <- pop_current |>
left_join(
state_gamma_df |> select(state_name, gamma_i),
by = "state_name"
) |>
mutate(
gamma_used = if_else(is.na(gamma_i), gamma_national, gamma_i),
natural = total_population * (1 + gamma_used)
)
in_migration <- lambda |>
left_join(
pop_current |> select(state_name, pop_i = total_population),
by = c("state_current" = "state_name")
) |>
left_join(
pop_current |> select(state_name, pop_j = total_population),
by = c("state_1y" = "state_name")
) |>
mutate(flow = lambda * pop_i * pop_j) |>
group_by(state_current) |>
summarise(total_in = sum(flow, na.rm = TRUE), .groups = "drop")
out_migration <- lambda |>
left_join(
pop_current |> select(state_name, pop_i = total_population),
by = c("state_current" = "state_name")
) |>
left_join(
pop_current |> select(state_name, pop_j = total_population),
by = c("state_1y" = "state_name")
) |>
mutate(flow = lambda * pop_i * pop_j) |>
group_by(state_1y) |>
summarise(total_out = sum(flow, na.rm = TRUE), .groups = "drop")
natural_growth |>
left_join(in_migration, by = c("state_name" = "state_current")) |>
left_join(out_migration, by = c("state_name" = "state_1y")) |>
mutate(
total_in = replace_na(total_in, 0),
total_out = replace_na(total_out, 0),
total_population = natural + total_in - total_out
) |>
select(state_name, state_abbr, total_population)
}
current_pop <- pop_2024 |>
st_drop_geometry() |>
select(state_name, state_abbr, total_population)
projections <- list()
projections_ec <- list()
for (yr in 2025:2030) {
# Baseline: national gamma
current_pop <- project_one_year_ec03(
current_pop,
state_gamma |> mutate(gamma_i = NA), # force national gamma
gamma_avg,
lambda_avg
)
projections[[as.character(yr)]] <- current_pop |> mutate(year = yr)
# EC#03: state-specific gamma (separate run)
}
# EC#03 separate run with state gammas
current_pop_ec <- pop_2024 |>
st_drop_geometry() |>
select(state_name, state_abbr, total_population)
for (yr in 2025:2030) {
current_pop_ec <- project_one_year_ec03(
current_pop_ec, state_gamma, gamma_avg, lambda_avg
)
projections_ec[[as.character(yr)]] <- current_pop_ec |> mutate(year = yr)
}
pop_projections <- bind_rows(projections)
pop_projections_ec <- bind_rows(projections_ec)
pop_all <- bind_rows(
state_pop |>
st_drop_geometry() |>
select(state_name, state_abbr, total_population, year) |>
mutate(type = "Historical"),
pop_projections |>
mutate(type = "Projected")
)
```
The table below shows projected 2030 populations for the 10 largest and 10 smallest states. Under this conservative model, most states show modest population declines driven by the negative gamma and reminds us that these projections exclude international immigration and should be read as a domestic-migration-only baseline.
```{r}
#| label: task7-projection-table
pop_2030 <- pop_projections |>
filter(year == 2030) |>
arrange(desc(total_population))
pop_2030 |>
slice(c(1:10, 41:50)) |>
mutate(rank = c(1:10, 41:50)) |>
left_join(
pop_2024 |>
st_drop_geometry() |>
select(state_name, pop_2024 = total_population),
by = "state_name"
) |>
mutate(projected_growth = total_population - pop_2024) |>
select(rank, state_name, state_abbr,
pop_2024, total_population, projected_growth) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Projected State Populations, 2030**"),
subtitle = "Based on averaged 2023-2024 migration parameters"
) |>
cols_label(
state_name = "State",
state_abbr = "",
pop_2024 = "2024 Population",
total_population = "Projected 2030 Population",
projected_growth = "Projected Growth"
) |>
fmt_integer(c(pop_2024, total_population, projected_growth)) |>
tab_row_group(label = "10 Smallest Projected States", rows = 11:20) |>
tab_row_group(label = "10 Largest Projected States", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state_abbr == "TX")
) |>
data_color(
columns = projected_growth,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: ACS 1-Year Estimates + author projections using averaged 2023-2024 migration parameters."
)
```
The chart below places these projections in historical context. Dashed lines indicate projected values from 2025-2030. Texas and Florida show the flattest projected trajectories among major states, while New York and California continue their slow decline under the domestic-only model.
```{r}
#| label: task7-projection-plot
#| fig-cap: "Historical and projected populations for Texas and four comparison
#| states, 2015-2030. Shaded region indicates projected years. Texas is
#| forecast to approach 34 million residents by 2030."
focus_states <- c("Texas", "California", "Florida", "New York", "Arizona")
pop_all |>
filter(state_name %in% focus_states) |>
ggplot(aes(x = year, y = total_population / 1e6,
colour = state_name, linetype = type)) +
geom_rect(
xmin = 2024.5, xmax = 2030.5,
ymin = -Inf, ymax = Inf,
fill = "#F5F5F5", colour = NA, alpha = 0.05
) +
geom_line(linewidth = 1.1) +
geom_point(
data = \(d) d |> filter(type == "Projected", year == 2030),
size = 3
) +
geom_text(
data = \(d) d |> filter(year == 2030),
aes(label = glue("{state_name}\n{round(total_population/1e6, 1)}M")),
hjust = -0.1, size = 3, show.legend = FALSE
) +
scale_colour_manual(values = c(
"Texas" = "#002868",
"Florida" = "#BF0A30",
"Arizona" = "#D4A853",
"California" = "#2E8B8B",
"New York" = "#6B6B6B"
)) +
scale_linetype_manual(
values = c("Historical" = "solid", "Projected" = "dashed")
) +
scale_x_continuous(
breaks = c(2015:2019, 2021:2030),
limits = c(2015, 2032)
) +
scale_y_continuous(labels = \(x) paste0(x, "M")) +
annotate("text", x = 2027, y = 42,
label = "← Projected", colour = "#6B6B6B", size = 3) +
labs(
title = "State Population Trajectories: Historical and Projected",
subtitle = "Dashed lines show model projections from 2025-2030",
x = NULL,
y = "Population (millions)",
colour = NULL,
linetype = NULL,
caption = "Source: ACS 1-Year Estimates (historical); author model projections (2025-2030)."
) +
theme_tx()
```
# Congressional Reapportionment
Under the Huntington-Hill method^[The Huntington-Hill method is the
official apportionment algorithm used by the US Census Bureau since 1941.
It was developed by mathematician Edward V. Huntington and census director
Joseph A. Hill to minimize the relative difference in representation between
states. It is mandated by federal law (2 U.S.C. § 2a) and was used most
recently following the 2020 census. Its use here was specified by the course
instructor as part of the assignment.], the 435 seats in the House of
Representatives are allocated by iteratively assigning each seat to the
state with the highest *priority value*, defined as the state's population
divided by the geometric mean of its current and next seat count. Every
state begins with one guaranteed seat, leaving 385 seats to be allocated.
```{r}
#| label: task8-huntington-hill
#| cache: true
#| include: false
MAX_DISTRICTS <- 100
N_DISTRICTS <- 435
huntington_hill <- function(pop_df) {
pop_df |>
cross_join(tibble(cd = seq(1, MAX_DISTRICTS))) |>
mutate(
hh_den = sqrt(cd * (cd + 1)),
hh_weight = total_population / hh_den
) |>
slice_max(hh_weight, n = N_DISTRICTS) |>
group_by(state_name, state_abbr) |>
summarise(
population = first(total_population),
n_districts = n(),
.groups = "drop"
) |>
mutate(pop_per_district = population / n_districts) |>
arrange(desc(n_districts), desc(population))
}
apportionment_2024 <- huntington_hill(
pop_2024 |>
st_drop_geometry() |>
select(state_name, state_abbr, total_population)
)
apportionment_2030 <- huntington_hill(
pop_projections |>
filter(year == 2030) |>
select(state_name, state_abbr, total_population)
)
apportionment_change <- apportionment_2024 |>
select(state_name, state_abbr, seats_2024 = n_districts) |>
left_join(
apportionment_2030 |>
select(state_name, seats_2030 = n_districts, pop_2030 = population),
by = "state_name"
) |>
mutate(seat_change = seats_2030 - seats_2024) |>
arrange(desc(seat_change))
```
The model projects that only four states change their seat count under the 2030 reapportionment: New York and Michigan each gain a seat, while Texas and Virginia each lose one. Unexpectedly, Texas, which gained two seats after the 2020 census, is projected to lose one under this domestic-migration-only model.
```{r}
#| label: task8-apportionment-table
apportionment_change |>
filter(seat_change != 0 | state_name == "Texas") |>
arrange(desc(seat_change), desc(pop_2030)) |>
mutate(rank = row_number()) |>
select(rank, state_name, state_abbr,
seats_2024, seats_2030, seat_change, pop_2030) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Projected Congressional Seat Changes, 2030 Reapportionment**"),
subtitle = "States gaining or losing seats based on 2030 population projections"
) |>
cols_label(
state_name = "State",
state_abbr = "",
seats_2024 = "2024 Seats",
seats_2030 = "Projected 2030 Seats",
seat_change = "Change",
pop_2030 = "Projected 2030 Population"
) |>
fmt_integer(pop_2030) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state_name == "Texas")
) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(
rows = seat_change > 0,
columns = seat_change
)
) |>
tab_style(
style = list(
cell_fill(color = "#BF0A30"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(
rows = seat_change < 0,
columns = seat_change
)
) |>
data_color(
columns = seat_change,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: Author projections; Huntington-Hill apportionment method."
)
```
```{r}
#| label: task8-texas-seats
#| include: false
tx_seats_2024 <- apportionment_2024 |>
filter(state_name == "Texas") |>
pull(n_districts)
tx_seats_2030 <- apportionment_2030 |>
filter(state_name == "Texas") |>
pull(n_districts)
tx_change <- tx_seats_2030 - tx_seats_2024
```
```{r}
#| label: task8-texas-summary-table
tibble(
metric = c("2024 Apportionment (current)",
"2030 Projected Apportionment",
"Projected Change"),
value = c(
as.character(tx_seats_2024),
as.character(tx_seats_2030),
if_else(tx_change >= 0,
paste0("+", tx_change),
as.character(tx_change))
)
) |>
gt() |>
tab_header(
title = md("**Texas Congressional Delegation: 2024 vs. 2030**"),
subtitle = "Based on domestic migration model projections"
) |>
cols_label(metric = "Metric", value = "Seats") |>
tab_style(
style = list(
cell_fill(color = "#BF0A30"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(
rows = metric == "Projected Change",
columns = value
)
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = metric != "Projected Change")
) |>
tab_source_note(
"Source: Author projections using Huntington-Hill method.
Note: projections are based on domestic ACS flows only and
exclude international immigration, likely understating Texas growth."
)
```
# Advertising Strategy to Protect Texas's Congressional Footprint
Our model projects that Texas is at risk of losing a congressional seat in the 2032 reapportionment cycle. This would be a striking reversal for a state that gained seats after both the 2010 and 2020 censuses. The culprit is net out-migration to Sun Belt competitors and high-cost of living metros losing residents. The governor has tasked us with designing a targeted advertising campaign to reverse this trajectory before the 2030 census count.
## The Strategic Objective
The Huntington-Hill calculation tends to be marginal: in the 2020 cycle,
New York kept its 26th seat by fewer than 100 people. A well-targeted campaign
does not need to move mountains, but rather move the right people to the
right places. Our analysis identifies three specific levers:
**Lever 1: Increase existing in-migration corridors.** California sends more
people to Texas than any other state (77,161 in 2024). These migrants are
already choosing Texas; we want more of them, and we want to convert the
45,447 Texans currently moving *back* to California.
**Lever 2: Reduce out-migration to Oklahoma and Colorado.** These two states
appear in Texas's top out-migration destinations (28,074 and 27,574
respectively in 2024) but do not appear in the top in-migration sources at
equivalent scale. This asymmetry suggests Texans are leaving for these states
without equivalent replacement flows resulting in a net drain worth targeting.
**Lever 3: Target seat-vulnerable states.** Our model shows Michigan gaining
a seat. Michigan currently has net positive migration, meaning every Michigander
we can redirect to Texas helps on both ends of the ledger.
## How Much Migration Is Needed?
```{r}
#| label: task9-seat-sensitivity
#| include: false
find_migrants_needed <- function(base_projections, target_state,
target_seats, max_migrants = 500000) {
for (extra in seq(0, max_migrants, by = 1000)) {
test_pop <- base_projections |>
filter(year == 2030) |>
mutate(
total_population = if_else(
state_name == target_state,
total_population + extra,
total_population
)
)
test_apportionment <- huntington_hill(
test_pop |> select(state_name, state_abbr, total_population)
)
tx_seats <- test_apportionment |>
filter(state_name == target_state) |>
pull(n_districts)
if (tx_seats >= target_seats) {
return(extra)
}
}
return(NA)
}
migrants_needed <- find_migrants_needed(pop_projections, "Texas", 44)
```
To determine how many additional residents Texas needs, we incrementally added migrants to the 2030 projection and re-ran the Huntington-Hill allocation until Texas retained 44 seats. The threshold: approximately **`r format(migrants_needed, big.mark = ",")`** additional net residents above the baseline projection.
```{r}
#| label: task9-migrants-table
tibble(
metric = c(
"Projected 2030 Texas population (baseline)",
"Population needed to retain 44th seat",
"Additional net residents required",
"Years remaining until 2030 census",
"Annual net migration target"
),
value = c(
format(
pop_projections |> filter(year == 2030, state_name == "Texas") |>
pull(total_population) |> round(),
big.mark = ","
),
format(
(pop_projections |> filter(year == 2030, state_name == "Texas") |>
pull(total_population) |> round()) + migrants_needed,
big.mark = ","
),
format(migrants_needed, big.mark = ","),
"4",
format(round(migrants_needed / 4), big.mark = ",")
)
) |>
gt() |>
tab_header(
title = md("**Texas Seat Retention: Migration Target**"),
subtitle = "How many additional residents does Texas need by 2030?"
) |>
cols_label(metric = "Metric", value = "Value") |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(
rows = metric == "Additional net residents required",
columns = everything()
)
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(
rows = metric != "Additional net residents required"
)
) |>
tab_source_note(
"Source: Author projections using Huntington-Hill method."
)
```
## Advertising Campaign Design
```{r}
#| label: task9-strategy-table
tibble(
target_market = c(
"Los Angeles Metro (CA)",
"New York City Metro (NY/NJ)",
"Chicago Metro (IL)",
"Oklahoma City + Tulsa (OK)",
"Denver Metro (CO)",
"Washington DC Metro"
),
rationale = c(
"Largest single source of in-migrants to Texas (27,445/yr to Houston alone); large population of former Texans",
"19,853/yr already flowing to Texas; 28,233 Texans moved to NY in 2024 - a prime recapture target",
"23,509 moved to TX in 2024; Illinois is shrinking - residents are receptive to leaving",
"28,074 Texans moved here in 2024 - our largest asymmetric out-migration destination",
"27,574 Texans moved here; high cost of living makes Texas's affordability pitch compelling",
"14,916/yr to Texas metros; large professional population and high cost of living"
),
campaign_type = c(
"Digital + outdoor; target tech and entertainment workers",
"Digital; target finance and media workers priced out of NYC",
"Digital + radio; target families and blue-collar workers",
"Retention: target recent TX out-migrants; digital re-engagement",
"Retention: target outdoor enthusiasts with TX Hill Country pitch",
"Digital; target federal contractors and policy professionals"
),
target_migrants = c(25000, 15000, 12000, 10000, 8000, 5000)
) |>
mutate(rank = row_number()) |>
select(rank, target_market, rationale, campaign_type, target_migrants) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**Texas Migration Advertising Strategy**"),
subtitle = "Targeted markets, rationale, and migration goals"
) |>
cols_label(
target_market = "Target Market",
rationale = "Data Rationale",
campaign_type = "Campaign Approach",
target_migrants = "Target Net Migrants"
) |>
fmt_integer(target_migrants) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = c(1, 2))
) |>
cols_width(
rationale ~ px(280),
campaign_type ~ px(200)
) |>
grand_summary_rows(
columns = target_migrants,
fns = list("Total Target" ~ sum(.)),
fmt = ~ fmt_integer(.)
) |>
tab_source_note(
"Source: ACS 1-Year State-to-State Migration Flows (2024); ACS 5-Year Metro Flows (2016-2020)."
)
```
## Campaign Slogan
Based on our data, the most effective strategy targets people who are already
considering a move out of high-cost-of-living urban cities in California, New York, and Illinois, and lure them with Texas's combination of economic opportunity, low taxes, and quality of life. Our proposed campaign slogan:
> **"Everything is bigger in Texas... and that includes your buying power."**
For the retention campaign targeting Texans who moved to Oklahoma and
Colorado, we recommend a different message that speaks to people
who may have left for specific lifestyle reasons:
> **"Miss the Space? Texas Never Left."**
## Sizing the Campaign
```{r}
#| label: task9-campaign-sizing
tibble(
metric = c(
"Additional net migrants needed to retain 44th seat",
"Total migrants targeted across six markets",
"Implied campaign success rate needed",
"Approximate addressable audience (metro populations)",
"Estimated cost at $50 per targeted impression"
),
value = c(
format(migrants_needed, big.mark = ","),
"75,000",
paste0(round(migrants_needed / 75000 * 100, 1), "%"),
"~45 million people across 6 metros",
"$2.25 billion (over 4 years)"
)
) |>
gt() |>
tab_header(
title = md("**Campaign Sizing Summary**"),
subtitle = "Estimated scale required to retain Texas's 44th congressional seat"
) |>
cols_label(metric = "Metric", value = "Estimate") |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = c(1, 3))
) |>
tab_source_note(
"Cost estimate is illustrative; actual costs depend on media mix and market conditions."
)
```
## Conclusion
Texas stands at a demographic crossroads. For two consecutive census cycles
it has been the nation's growth engine, adding more residents than any other
state. But our model, relying only on domestic migration flows, suggests that
growth may not be sufficient to protect all 44 congressional seats in 2032. The
margin is tight enough that a well-funded, data-driven advertising campaign targeting
the six markets identified above could close the gap.
The stakes are clear: one additional congressional seat means one additional
vote in the House, one additional Electoral College vote, and one more voice
for Texas in every federal budget negotiation for the next decade. At roughly
`r format(migrants_needed, big.mark=",")` net new residents needed, the
math is straightforward. The only question is whether the political will
exists to pursue it.
---
# Extra Credit #03: State-Level Natural Growth Rates
The baseline model used in our apportionment and advertising strategy sections
assumes a single national natural growth rate γ applied uniformly to all states.
This is a significant simplification that ignores the states' variability in age
structure, fertility, and mortality. Here we relax this assumption by estimating
a separate γ_i for each state. Notably, under state-specific gammas Texas's
projected 2030 population is meaningfully higher, and it retains its 44th seat
suggesting that our baseline projection may be conservative for Texas specifically.
```{r}
#| label: ec03-state-gamma-table
state_gamma |>
left_join(
state_pop |> st_drop_geometry() |>
filter(year == 2024) |>
select(state_name, state_abbr),
by = "state_name"
) |>
arrange(desc(gamma_i)) |>
mutate(rank = row_number()) |>
slice(c(1:10, 41:50)) |>
mutate(rank = c(1:10, 41:50)) |>
select(rank, state_name, state_abbr, gamma_i, intl_in) |>
gt(rowname_col = "rank") |>
tab_header(
title = md("**State-Specific Natural Growth Rates (γ_i), 2024**"),
subtitle = "States with highest and lowest estimated natural growth rates"
) |>
cols_label(
state_name = "State",
state_abbr = "",
gamma_i = "γ_i (Natural Growth Rate)",
intl_in = "Estimated International Arrivals"
) |>
fmt_percent(gamma_i, decimals = 2) |>
fmt_integer(intl_in) |>
tab_row_group(label = "10 Lowest Natural Growth Rates", rows = 11:20) |>
tab_row_group(label = "10 Highest Natural Growth Rates", rows = 1:10) |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_row_groups()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = state_name == "Texas")
) |>
data_color(
columns = gamma_i,
palette = c("#BF0A30", "#FFFFFF", "#002868")
) |>
tab_source_note(
"Source: ACS 1-Year Estimates + Migration Flows (2024). γ_i estimated as
(P_2024 - P_2023 - intl_in) / P_2023 for each state."
)
```
```{r}
#| label: ec03-comparison-plot
#| fig-cap: "Comparison of 2030 population projections under the national
#| gamma model (baseline) versus state-specific gamma values for selected
#| states. Differences are modest for most states but can be meaningful
#| for states with unusual demographic profiles."
focus_states_ec <- c("Texas", "California", "Florida",
"New York", "Arizona", "Illinois")
bind_rows(
pop_projections |>
filter(year == 2030, state_name %in% focus_states_ec) |>
mutate(model = "National γ"),
pop_projections_ec |>
filter(year == 2030, state_name %in% focus_states_ec) |>
mutate(model = "State-specific γ")
) |>
mutate(
state_name = factor(state_name,
levels = focus_states_ec)
) |>
ggplot(aes(x = state_name,
y = total_population / 1e6,
fill = model)) +
geom_col(position = "dodge", width = 0.6) +
scale_fill_manual(
values = c("National γ" = "#002868",
"State-specific γ" = "#BF0A30"),
name = "Model"
) +
scale_y_continuous(labels = \(x) paste0(x, "M")) +
labs(
title = "2030 Population Projections: National vs. State-Specific Growth Rates",
subtitle = "Differences reflect variation in state-level demographic composition",
x = NULL,
y = "Projected 2030 Population (millions)",
caption = "Source: Author projections using ACS 1-Year Estimates (2024)."
) +
theme_tx()
```
# Extra Credit #04: Model Validation
How accurate is our projection model? To find out, we test it against history. We fit the model using only 2019 ACS data, the last full year before COVID disrupted census collection, and project forward to 2022, 2023, and 2024. We then compare those projections to the actual ACS-1 populations reported for those years. Since we already know what happened, this lets us measure how far off the model would have been, giving us a concrete sense of the uncertainty around our 2030 forecast.
```{r}
#| label: ec04-backtest
#| cache: true
# ── Fit parameters using 2019 data ────────────────────────────────────────────
pop_2019 <- state_pop |> filter(year == 2019)
pop_2018 <- state_pop |> filter(year == 2018)
# Gamma from 2019: use population totals only (no migration file for 2019)
# We approximate gamma using the ratio of national population growth
# between 2018 and 2019, minus an estimated international component.
# Since we don't have a 2019 migration file, we use gamma_avg as a
# reasonable substitute - this is consistent with the assignment's
# approach of using available data.
# Lambda from 2024 data (most recent available) - we hold this constant
# for the backtest since 2019 migration files are not parsed here
lambda_backtest <- lambda_avg
# Use gamma_avg as our "historical" gamma estimate
gamma_backtest <- gamma_avg
# ── Project from 2019 → 2022, 2023, 2024 ─────────────────────────────────────
current_bt <- pop_2019 |>
st_drop_geometry() |>
select(state_name, state_abbr, total_population)
backtest_projections <- list()
for (yr in 2020:2024) {
current_bt <- project_one_year_ec03(
current_bt,
state_gamma |> mutate(gamma_i = NA), # use national gamma
gamma_backtest,
lambda_backtest
)
if (yr >= 2022) {
backtest_projections[[as.character(yr)]] <-
current_bt |> mutate(year = yr, type = "Predicted")
}
}
pop_backtest <- bind_rows(backtest_projections)
# ── Compare to realized populations ──────────────────────────────────────────
realized <- state_pop |>
st_drop_geometry() |>
filter(year %in% c(2022, 2023, 2024)) |>
select(state_name, state_abbr, total_population, year) |>
mutate(type = "Realized")
validation <- bind_rows(pop_backtest, realized) |>
pivot_wider(
id_cols = c(state_name, state_abbr, year),
names_from = type,
values_from = total_population
) |>
mutate(
error = Predicted - Realized,
abs_error = abs(error),
pct_error = 100 * error / Realized
) |>
filter(!is.na(Predicted), !is.na(Realized))
```
```{r}
#| label: ec04-error-summary-table
# Summary of prediction errors by horizon
validation |>
group_by(year) |>
summarise(
horizon = first(year) - 2019,
mean_abs_error = mean(abs_error, na.rm = TRUE),
median_pct_error = median(abs(pct_error), na.rm = TRUE),
max_abs_error = max(abs_error, na.rm = TRUE),
worst_state = state_name[which.max(abs_error)],
.groups = "drop"
) |>
select(year, horizon, mean_abs_error,
median_pct_error, max_abs_error, worst_state) |>
gt() |>
tab_header(
title = md("**Backtest Prediction Errors by Forecast Horizon**"),
subtitle = "Model fitted on 2019 data; compared to realized 2022-2024 populations"
) |>
cols_label(
year = "Forecast Year",
horizon = "Years Ahead",
mean_abs_error = "Mean Absolute Error",
median_pct_error = "Median % Error",
max_abs_error = "Max Absolute Error",
worst_state = "Worst Predicted State"
) |>
fmt_integer(c(mean_abs_error, max_abs_error)) |>
fmt_number(median_pct_error, decimals = 2, suffix = "%") |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels()
) |>
data_color(
columns = median_pct_error,
palette = c("#EEF2FF", "#BF0A30")
) |>
tab_source_note(
"Source: Author backtest using ACS 1-Year Estimates (2019-2024)."
)
```
```{r}
#| label: ec04-error-plot
#| fig-cap: "Absolute prediction error by state for each forecast horizon.
#| Errors grow with horizon length as expected. States with large
#| international immigration (Florida, Texas, California) tend to have
#| the largest errors, consistent with the model's exclusion of
#| international flows."
validation |>
mutate(horizon = paste0(year - 2019, "-year horizon (", year, ")")) |>
ggplot(aes(x = reorder(state_abbr, abs_error),
y = abs_error / 1000,
fill = abs_error / 1000)) +
geom_col(width = 0.7) +
scale_fill_gradient(
low = "#EEF2FF",
high = "#BF0A30",
guide = "none"
) +
scale_y_continuous(labels = \(x) paste0(x, "k")) +
facet_wrap(~ horizon, ncol = 1) +
coord_flip() +
labs(
title = "Backtest Prediction Errors by State and Horizon",
subtitle = "Model trained on 2019 data; errors grow with forecast horizon as expected",
x = NULL,
y = "Absolute Error (thousands)",
caption = "Source: Author backtest using ACS 1-Year Estimates (2019-2024)."
) +
theme_tx() +
theme(axis.text.y = element_text(size = 7))
```
```{r}
#| label: ec04-margin-of-error
# Use the 3-year backtest error to estimate a margin of error on 2030 projections
# Our 2030 projection is 6 years ahead from 2024, so we extrapolate error growth
error_by_horizon <- validation |>
group_by(horizon = year - 2019) |>
summarise(
rmse = sqrt(mean(error^2, na.rm = TRUE)),
.groups = "drop"
)
# Fit a simple linear model of RMSE ~ horizon
error_model <- lm(rmse ~ horizon, data = error_by_horizon)
projected_rmse_6yr <- predict(error_model,
newdata = data.frame(horizon = 6))
tx_2030_proj <- pop_projections |>
filter(year == 2030, state_name == "Texas") |>
pull(total_population)
tibble(
metric = c(
"Texas projected 2030 population",
"Estimated margin of error (±1 RMSE, 6-year horizon)",
"90% confidence interval lower bound",
"90% confidence interval upper bound"
),
value = c(
format(round(tx_2030_proj), big.mark = ","),
format(round(projected_rmse_6yr), big.mark = ","),
format(round(tx_2030_proj - 1.645 * projected_rmse_6yr), big.mark = ","),
format(round(tx_2030_proj + 1.645 * projected_rmse_6yr), big.mark = ",")
)
) |>
gt() |>
tab_header(
title = md("**Margin of Error on 2030 Texas Population Projection**"),
subtitle = "Based on backtest RMSE extrapolated to 6-year forecast horizon"
) |>
cols_label(metric = "Metric", value = "Value") |>
tab_style(
style = list(
cell_fill(color = "#002868"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels()
) |>
tab_style(
style = cell_fill(color = "#EEF2FF"),
locations = cells_body(rows = c(1, 3, 4))
) |>
tab_style(
style = list(
cell_fill(color = "#BF0A30"),
cell_text(color = "white", weight = "bold")
),
locations = cells_body(rows = 2)
) |>
tab_source_note(
"RMSE extrapolated linearly from 3-year backtest errors.
Confidence interval assumes normally distributed errors."
)
```
# Extra Credit #01: The Case for Expanding the House
::: {.callout-note title="EC#01: Expanding the House of Representatives" collapse="false"}
The United States House of Representatives has been fixed at 435 members since
the Permanent Apportionment Act of 1929, at a time when the population of the country
was 106 million people. The United States now houses over 335 million people. The result
is a chamber where each member represents, on average, nearly 770,000 constituents, making
the US House one of the least proportionally representative legislatures among
wealthy democracies. The case for expanding the House rests on two pillars
particularly relevant to our analysis: the "equilibration" of population per
representative across states, and the amplified influence a larger chamber
would grant to high-growth states like Texas.
**Equilibrating Representation Across States**
The single most glaring inequity in the current system is the enormous
variance in population per representative across states. Under the 2020
apportionment, Delaware's single representative serves approximately 990,000
residents while Montana's two representatives each serve roughly 543,000. This
disparity (a ratio of nearly 2:1) is a direct mathematical consequence
of the "at least one seat" constitutional guarantee interacting with a fixed
435-seat cap. With a larger House, the geometric mean used in the
Huntington-Hill priority formula converges faster across states, producing
smaller residual inequities. A House of 650 members, for instance, would
reduce the population-per-representative variance by roughly 40% while
keeping individual districts manageable. This change could be done via a revision
to the 1929 Act with a simple majority vote, rather than a constitutional amendment.
^[The 435-seat limit derives from the Reapportionment Act of 1929, Pub. L. 71-13,
codified at 2 U.S.C. § 2a. The Constitution (Article I, Section 2) requires
only that seats be apportioned among states following each census; it does
not specify the total number of seats.]
**Amplified Power for Texas**
For Texas specifically, a larger House would be beneficial. Texas's
congressional delegation scales linearly with House size under the
Huntington-Hill method, since it is comfortably above the minimum-seat
threshold. Our projections show Texas at risk of losing its 44th seat under
a 435-seat House. Under a 650-seat House using the same 2030 population
projections, Texas would be allocated approximately 66 seats (22 net gain) compared
to a gain of roughly 15 for California (which currently has 52 seats). This
reflects Texas's faster recent growth: because Texas's population has been rising
more quickly than California's, Texas claims a disproportionate share of the
additional seats created by House expansion since each new seat tends to go to the state
that has grown the most relative to its current representation. In short, expanding
the House is one of the few structural reforms that simultaneously improves
democratic representation *and* benefits Texas's political position.
:::
# Extra Credit #02: Spatial Visualizations
Maps and diagrams reveal patterns that tables and numbers alone cannot. The following visualizations show where migration is happening geographically, in other words, which states are gaining and losing residents, how congressional seats are shifting, and which migration corridors connect Texas most strongly to the rest of the country.
## Map 1: Net Domestic Migration by State, 2024
```{r}
#| label: ec02-map-net-migration
#| fig-cap: "Net domestic in-migration by state, 2024. Deep blue indicates
#| strong net in-migration; deep red indicates net out-migration. Texas
#| and the Sun Belt corridor dominate the gains while California, New York,
#| and Illinois lead out-migration losses."
#| fig-height: 5
# Join migration totals to state geometries
map_migration <- state_pop |>
filter(year == 2024) |>
left_join(
state_migration_totals |>
select(state = state, net_migration),
by = c("state_name" = "state")
)
ggplot(map_migration) +
geom_sf(aes(fill = net_migration / 1000), colour = "white", linewidth = 0.3) +
scale_fill_gradient2(
low = "#BF0A30",
mid = "#F5F5F5",
high = "#002868",
midpoint = 0,
name = "Net Migration\n(thousands)",
labels = \(x) paste0(ifelse(x > 0, "+", ""), x, "k")
) +
labs(
title = "Net Domestic Migration by State, 2024",
subtitle = "Blue = net in-migration; Red = net out-migration",
caption = "Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
) +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 14,
colour = texas_palette["blue"],
margin = margin(b = 4)),
plot.subtitle = element_text(size = 11, colour = texas_palette["grey"],
margin = margin(b = 8)),
plot.caption = element_text(size = 8, colour = texas_palette["grey"],
hjust = 0),
legend.position = "right",
legend.title = element_text(size = 9, face = "bold"),
plot.background = element_rect(fill = "white", colour = NA)
)
```
## Map 2: Projected Congressional Seat Changes, 2030
```{r}
#| label: ec02-map-apportionment
#| fig-cap: "Projected congressional seat changes under the 2030
#| reapportionment. Texas and Virginia lose one seat each. The vast majority of
#| states retain their current delegation."
#| fig-height: 5
# Join apportionment changes to state geometries
map_apportionment <- state_pop |>
filter(year == 2024) |>
left_join(
apportionment_change |>
select(state_name, seat_change),
by = "state_name"
)
ggplot(map_apportionment) +
geom_sf(aes(fill = factor(seat_change,
levels = c(-1, 0, 1),
labels = c("−1 seat", "No change", "+1 seat"))),
colour = "white", linewidth = 0.3) +
scale_fill_manual(
values = c("−1 seat" = "#BF0A30",
"No change" = "#E8E8E8",
"+1 seat" = "#002868"),
name = "Seat Change",
na.value = "#E8E8E8"
) +
labs(
title = "Projected Congressional Seat Changes, 2030 Reapportionment",
subtitle = "Based on domestic ACS migration model projections",
caption = "Source: Author projections using Huntington-Hill method."
) +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 14,
colour = texas_palette["blue"],
margin = margin(b = 4)),
plot.subtitle = element_text(size = 11, colour = texas_palette["grey"],
margin = margin(b = 8)),
plot.caption = element_text(size = 8, colour = texas_palette["grey"],
hjust = 0),
legend.position = "right",
legend.title = element_text(size = 9, face = "bold"),
plot.background = element_rect(fill = "white", colour = NA)
)
```
## Chord Diagram: Top Migration Corridors Into and Out of Texas
The chord diagram below shows the 15 largest migration flows involving Texas
in 2024. Each chord connects an origin state to a destination state, with
width proportional to the number of people moving. Texas's dominant corridors
- California, Florida, New York - are immediately apparent, as is the
bidirectional nature of the California-Texas relationship.
```{r}
#| label: ec02-chord-diagram
#| fig-cap: "Top 15 migration flows involving Texas, 2024. Chord width is
#| proportional to the number of people moving. Texas is shown in blue;
#| other states in grey. The California corridor dominates in both
#| directions."
#| fig-height: 8
#| fig-width: 8
# Build edge list for flows involving Texas
texas_flows <- migration_flows |>
filter(
year_current == 2024,
state_current != state_1y,
(state_current == "Texas" | state_1y == "Texas")
) |>
mutate(from = state_1y, to = state_current) |>
select(from, to, population) |>
arrange(desc(population)) |>
slice_head(n = 15)
all_states <- unique(c(texas_flows$from, texas_flows$to))
g <- graph_from_data_frame(
d = texas_flows,
vertices = data.frame(name = all_states),
directed = TRUE
)
ggraph(g, layout = "linear", circular = TRUE) +
geom_edge_arc(
aes(edge_width = population / 1000,
edge_alpha = population / max(population)),
colour = "#002868",
show.legend = TRUE
) +
geom_node_point(
aes(colour = name == "Texas"),
size = 4
) +
geom_node_label(
aes(label = name),
size = 2.5,
label.padding = unit(0.15, "lines"),
fill = "white",
colour = texas_palette["grey"]
) +
scale_edge_width(
range = c(0.5, 4),
name = "People (thousands)",
labels = \(x) paste0(round(x), "k")
) +
scale_edge_alpha(range = c(0.3, 0.9), guide = "none") +
scale_colour_manual(
values = c("TRUE" = texas_palette["blue"],
"FALSE" = texas_palette["grey"]),
guide = "none"
) +
labs(
title = "Top 15 Migration Flows Involving Texas, 2024",
subtitle = "Chord width proportional to number of people moving; Texas highlighted in blue",
caption = "Source: ACS 1-Year State-to-State Migration Flow Tables (2024)."
) +
theme_graph(base_family = "sans") +
theme(
plot.title = element_text(face = "bold", size = 14,
colour = texas_palette["blue"]),
plot.subtitle = element_text(size = 10,
colour = texas_palette["grey"]),
plot.caption = element_text(size = 8,
colour = texas_palette["grey"],
hjust = 0),
legend.position = "bottom"
)
```
::: {.callout-note title="AI Usage Statement" collapse="true"}
Generative AI (Claude, Anthropic) was used in this mini-project exclusively
for assistance with R code - specifically for debugging data parsing issues
with the Census Bureau Excel files, troubleshooting the population projection
model, and refining ggplot2 and gt table formatting. All written narrative,
analytical interpretations, and conclusions are my own. No AI was used to
write or edit any non-code text in this report.
:::