Home Field Advantage: Forecasting Team USA’s Gold Medals at LA 2028
Author
Raúl J. Solá Navarro
Published
May 14, 2026
Introduction
The 2028 Los Angeles Summer Olympics are coming, and with them comes a massive commercial opportunity. Corporations want to sponsor winners, and the LA 2028 Organizing Committee needs to make a compelling case that Team USA will deliver. This report builds a data-driven model to forecast how many gold medals Team USA is likely to win in 2028, using historical Olympic data scraped from Olympedia.
The model accounts for three primary effects:
US Baseline Performance Team USA’s historic gold medal rate at Summer Olympics, the highest of any nation.
Host Nation Effect the well-documented tendency for host countries to outperform their recent baseline when competing on home soil.
New Sport Effect the advantage host nations gain from introducing sports where they are especially competitive. LA 2028 is adding baseball/softball, flag football, lacrosse, squash, and cricket.
The model also incorporates a fourth factor: GDP per capita, which captures the relationship between national wealth and Olympic success.
Task 1: Data Acquisition
1.1 Caching Downloader
Olympedia is a volunteer-run resource, so we cache every page locally after the first download. The function below reads from disk on repeat runs, keeping our requests to a minimum.
We scrape the editions page to get every Summer and Winter Olympiad since 1896, extracting the year, season, olympiad number, edition ID, and host country code from each row’s flag image URL.
The scraper returned 73 editions covering both Summer and Winter Games. The table below summarizes coverage by season.
Show code
editions |>group_by(season) |>summarise(Games =n(),`First Year`=min(year),`Last Year`=max(year),`Unique Hosts`=n_distinct(host_code),.groups ="drop" ) |>gt() |>tab_header(title ="Olympic Editions Scraped from Olympedia",subtitle ="Summer and Winter Games from 1896 to present" ) |>cols_label(season ="Season") |>tab_style(style =cell_fill(color ="#f7f5ee"),locations =cells_body(rows = season =="Summer") ) |>fmt_integer(columns =where(is.numeric))
Olympic Editions Scraped from Olympedia
Summer and Winter Games from 1896 to present
Season
Games
First Year
Last Year
Unique Hosts
Summer
39
1,896
2,032
23
Winter
34
1,924
2,034
13
1.3 Sports per Edition
For each edition page, we extract the list of medal disciplines and their Olympedia URLs. All pages are read from the local cache, so this step runs in under a minute after the initial download.
The scraper found 1,456 sport-edition combinations. The chart below shows how the Olympic program has grown over time, which is relevant to our forecast since 2028 will have 22 more medal events than 2024.
Show code
sports_by_edition |>left_join(editions |>select(edition_id, year, season), by ="edition_id") |>filter(season =="Summer") |>group_by(year) |>summarise(n_sports =n(), .groups ="drop") |>ggplot(aes(x = year, y = n_sports)) +geom_col(fill ="#1a1a2e", alpha =0.85) +geom_smooth(method ="loess", se =FALSE, colour ="#c9a84c",linewidth =1.2) +labs(title ="Number of Medal Disciplines at Each Summer Olympics",x ="Year", y ="Number of Disciplines",caption ="Source: Olympedia" ) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold"))
The Summer Olympic program has expanded from 10 disciplines in 1896 to nearly 50 today.
1.4 Medal Tables
For each sport-edition page, we extract the country-level medal summary table. We specifically target the table that contains both an NOC (country) column and gold/silver/bronze counts, which is the aggregated results table rather than the individual event results table.
Medal Data Coverage: Most Recent 10 Summer Olympics
Full dataset: 10,522 records across 62 editions
Year
Countries
Disciplines
Table Rows
2,000
80
39
455
2,004
74
39
451
2,008
87
41
425
2,010
99
29
338
2,012
86
39
419
2,014
88
31
366
2,016
86
41
436
2,018
94
39
420
2,020
93
49
505
2,024
92
47
495
Task 2: Country Code Standardization
Olympedia uses IOC country codes that sometimes differ from ISO 3166-1 alpha-3. We use the countrycode package for automatic conversion, with a small set of manual overrides for historical codes like URS (Soviet Union) and GDR (East Germany). Per course instructions, URS is mapped to RUS to ensure continuity.
dropped_codes |>gt() |>tab_header(title ="Country Codes Dropped During Standardization",subtitle ="These IOC codes could not be mapped to a current ISO-3 country" ) |>cols_label(country_code ="IOC Code", n ="Rows Affected") |>fmt_integer(columns = n) |>tab_footnote(footnote =paste("MIX = Mixed teams; IOA/AIN/EOR = Individual Olympic Athletes;","WIF = West Indies Federation; UAR = United Arab Republic;","AHO = Netherlands Antilles; KOS = Kosovo (not in ISO standard)." ),locations =cells_title(groups ="subtitle") )
Country Codes Dropped During Standardization
These IOC codes could not be mapped to a current ISO-3 country1
IOC Code
Rows Affected
MIX
54
AIN
5
KOS
4
AHO
2
IOA
2
UAR
2
EOR
1
WIF
1
1 MIX = Mixed teams; IOA/AIN/EOR = Individual Olympic Athletes; WIF = West Indies Federation; UAR = United Arab Republic; AHO = Netherlands Antilles; KOS = Kosovo (not in ISO standard).
After standardization, 10,451 records remain (down from 10,522). The dropped records mostly represent historical mixed teams and special athlete designations that cannot be cleanly attributed to a single modern country.
Show code
total_golds <- medals_clean |>filter(season =="Summer") |>group_by(iso_code) |>summarise(total_gold =sum(gold, na.rm =TRUE), .groups ="drop")world_gold <- world |>left_join(total_golds, by =c("iso_a3"="iso_code")) |>mutate(total_gold =replace_na(total_gold, 0))ggplot(world_gold) +geom_sf(aes(fill = total_gold), colour ="white", linewidth =0.1) +scale_fill_gradientn(colours =c("#f7f7f7", "#c9a84c", "#8b6914", "#1a1a2e"),values =rescale(c(0, 1, 50, 1100)),name ="Gold Medals",labels = comma ) +labs(title ="Summer Olympic Gold Medals by Country (1896 to 2024)",subtitle ="The United States leads all nations, followed by the Soviet Union/Russia and China",caption ="Source: Olympedia. URS coded as RUS per course instructions." ) +theme_void() +theme(plot.title =element_text(face ="bold", size =14),plot.subtitle =element_text(size =10, colour ="grey40"),legend.position ="bottom",legend.key.width =unit(2, "cm") )
US dominance in total gold medals is clear, but China and European nations have closed the gap significantly since 1990.
Task 3: Exploratory Data Analysis
3.1 US Baseline Performance
Show code
# The 1904 St. Louis Games are a known data quality issue. US club teams# competed as separate delegations, inflating the count to 164 golds.# We keep the record but exclude 1904 from baseline calculations.us_summer <- medals_clean |>filter(country_code =="USA", season =="Summer") |>group_by(year) |>summarise(gold =sum(gold, na.rm =TRUE),silver =sum(silver, na.rm =TRUE),bronze =sum(bronze, na.rm =TRUE),total =sum(total, na.rm =TRUE),.groups ="drop" )best_year <- us_summer |>filter(year !=1904) |>slice_max(gold, n =1)avg_gold_us <- us_summer |>filter(year >=1960) |>pull(gold) |>mean()
Finding 1: Setting aside the anomalous 1904 St. Louis Games, Team USA’s best modern performance was in 1984 with 102 gold medals. Since 1960 the period most relevant to predicting 2028 the US has averaged 39.8 gold medals per Games, more than any other nation by a wide margin.
Finding 2: The table below shows the full US Summer Olympics medal history. The 1980 dip reflects the US boycott of Moscow; the 1984 spike reflects the Soviet-led boycott of Los Angeles. Both are flagged to avoid misleading the inference in Task 4.
Show code
us_summer |>arrange(desc(year)) |>mutate(note =case_when( year ==1980~"US boycott of Moscow", year ==1984~"Soviet bloc boycott of LA", year ==1904~"Inflated: US club teams",TRUE~"" ) ) |>gt() |>tab_header(title ="Team USA Summer Olympic Medal Totals",subtitle ="All modern Summer Olympics, 1896 to 2024" ) |>cols_label(year ="Year", gold ="Gold", silver ="Silver",bronze ="Bronze", total ="Total", note ="Note" ) |>data_color(columns = gold, palette =c("white", "#c9a84c")) |>fmt_integer(columns =c(year, gold, silver, bronze, total)) |>tab_style(style =cell_text(color ="grey50", style ="italic"),locations =cells_body(columns = note) ) |>tab_style(style =cell_fill(color ="#fff8e1"),locations =cells_body(rows = year %in%c(1980, 1984, 1904)) )
Team USA Summer Olympic Medal Totals
All modern Summer Olympics, 1896 to 2024
Year
Gold
Silver
Bronze
Total
Note
2,024
40
44
42
126
2,020
39
41
33
113
2,018
6
5
7
18
2,016
46
37
38
121
2,014
10
5
7
22
2,012
48
26
31
105
2,010
4
9
8
21
2,008
36
39
37
112
2,004
45
50
31
126
2,000
44
28
37
109
1,996
57
37
30
124
1,992
49
42
47
138
1,988
49
39
34
122
1,984
102
80
39
221
Soviet bloc boycott of LA
1,976
34
35
25
94
1,972
33
31
30
94
1,968
45
28
34
107
1,964
36
26
28
90
1,960
34
21
16
71
1,956
32
25
17
74
1,952
45
19
17
81
1,948
50
32
30
112
1,936
24
21
13
58
1,932
60
49
36
145
1,928
31
26
24
81
1,924
45
27
27
99
1,920
41
27
27
95
1,912
26
19
19
64
1,908
23
12
12
47
1,904
164
173
170
507
Inflated: US club teams
1,900
57
39
39
135
1,896
11
7
2
20
Finding 3: The visualization below shows the raw time series with context. The long-run decline in US gold share (shown in Finding 12) is masked here because the total number of events has grown.
Show code
us_summer |>mutate(flag =case_when( year ==1980~"Boycott year", year ==1984~"Boycott year", year ==1904~"1904 anomaly", gold > avg_gold_us ~"Above average",TRUE~"Below average" ) ) |>ggplot(aes(x = year, y = gold)) +geom_line(colour ="#1a1a2e", linewidth =1) +geom_point(aes(colour = flag, size = flag =="1904 anomaly")) +geom_hline(yintercept = avg_gold_us, linetype ="dashed",colour ="grey50", linewidth =0.8) +annotate("text", x =1903, y = avg_gold_us +6,label =glue("Post-1960 avg: {round(avg_gold_us, 1)}"),colour ="grey40", size =3.5) +scale_colour_manual(values =c("Above average"="#c9a84c", "Below average"="#1a1a2e","Boycott year"="orange", "1904 anomaly"="grey60" ),name =NULL ) +scale_size_manual(values =c("TRUE"=5, "FALSE"=3), guide ="none") +scale_x_continuous(breaks =seq(1896, 2024, by =8)) +labs(title ="Team USA Gold Medals at Summer Olympics (1896 to 2024)",subtitle ="Dashed line = post-1960 average",x ="Year", y ="Gold Medals",caption ="Source: Olympedia" ) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold"),axis.text.x =element_text(angle =45, hjust =1),legend.position ="top" )
US gold medals per Summer Olympics. Boycott years and the 1904 anomaly are highlighted.
Finding 4: Since 1960, host nations have won an average of 184.4% more gold medals when hosting compared to their previous Summer Olympics performance. The effect is real but noisy; a few dramatic cases like Spain 1992 (+933%) and Great Britain 2012 (+743%) pull the average up considerably. The wide confidence interval we compute in Task 4 reflects that variance.
Finding 5: The table below details each host nation comparison. Note that cases where the host had zero prior golds are excluded (division by near-zero creates unreliable estimates).
Show code
host_effect |>select(year, host_code, prior_gold, host_gold_hosting, pct_increase_gold) |>arrange(year) |>gt() |>tab_header(title ="Host Nation Gold Medal Performance",subtitle ="Comparison to the prior Summer Olympics for the same country" ) |>cols_label(year ="Year",host_code ="Host",prior_gold ="Prior Games",host_gold_hosting ="Host Year",pct_increase_gold ="% Change" ) |>fmt_number(columns = pct_increase_gold, decimals =1) |>cols_label(pct_increase_gold ="% Change") |>fmt_integer(columns =c(prior_gold, host_gold_hosting)) |>data_color(columns = pct_increase_gold,palette =c("#c9a84c", "#f7f7f7", "#1a1a2e") )
Host Nation Gold Medal Performance
Comparison to the prior Summer Olympics for the same country
Year
Host
Prior Games
Host Year
% Change
1900
FRA
5
55
909.1
1904
USA
57
164
186.1
1908
GBR
3
56
1,514.3
1912
SWE
8
23
176.5
1924
FRA
9
14
52.6
1928
NED
4
8
88.9
1932
USA
31
60
92.1
1936
GER
5
40
636.4
1952
FIN
17
7
−57.1
1956
AUS
6
13
107.7
1960
ITA
8
13
58.8
1964
JPN
4
16
266.7
1968
MEX
0
3
600.0
1972
FRG
5
13
145.5
1976
CAN
0
0
0.0
1980
URS
49
80
62.6
1988
KOR
6
14
123.1
1992
ESP
1
15
933.3
1996
USA
49
57
16.2
2000
AUS
9
17
84.2
2004
GRE
5
8
54.5
2008
CHN
34
48
40.6
2010
SGP
0
0
0.0
2012
GBR
3
29
742.9
2014
CHN
39
38
−2.5
2016
BRA
6
7
15.4
2018
ARG
3
11
228.6
2020
JPN
15
27
77.4
2024
FRA
10
16
57.1
Finding 6: Most hosts improve when competing at home. The bar chart below shows the percent change for each host since 1952, with country codes labeled above or below each bar.
Host nation gold medal change vs. their prior Summer Olympics. Most hosts improve.
3.3 New Sport Effect
A “new sport” is defined as one that did not appear in the immediately preceding Summer Olympics. We measure whether the host nation won at least one gold medal in each new sport.
Finding 7: Across all Summer Olympics, host nations won at least one gold medal in 40% of the new sport disciplines they introduced. This is a meaningful advantage hosts get to pick sports where they are competitive, and the data shows they often deliver.
Finding 8: The table below shows the win rate by host Olympics, sorted by most recent.
Finding 9: The chart below compares host nation gold win rates in new versus established sports. The gap is notable and consistent.
Show code
established_sport_medals <- medals_clean |>filter(season =="Summer") |>left_join(new_sports_df, by =c("year", "sport_name")) |>mutate(is_new =replace_na(is_new, FALSE)) |>group_by(year, sport_name, is_new, host_code) |>summarise(host_won_gold =any(country_code == host_code & gold >0, na.rm =TRUE),.groups ="drop" ) |>filter(!is.na(host_code))established_sport_medals |>group_by(is_new) |>summarise(rate =mean(host_won_gold, na.rm =TRUE),n =n(), .groups ="drop") |>mutate(label =if_else(is_new, "New Sports", "Established Sports")) |>ggplot(aes(x = label, y = rate, fill = label)) +geom_col(width =0.5) +geom_text(aes(label =percent(rate, accuracy =1)),vjust =-0.5, size =5.5, fontface ="bold") +geom_text(aes(label =glue("n = {comma(n)} disciplines")),vjust =-2.3, size =3.5, colour ="grey40") +scale_y_continuous(labels =percent_format(), limits =c(0, 0.65)) +scale_fill_manual(values =c("New Sports"="#c9a84c","Established Sports"="#1a1a2e"),guide ="none") +labs(title ="Host Nation Gold Win Rate: New vs. Established Sports",x =NULL,y ="Share of Disciplines with at Least One Host Gold",caption ="Source: Olympedia" ) +theme_minimal(base_size =13) +theme(plot.title =element_text(face ="bold"))
Hosts win gold at a higher rate in sports they introduce than in established disciplines.
3.4 Additional Findings
Finding 10 (Inline): The Olympic program has expanded dramatically. The 2024 Paris Olympics featured 47 medal disciplines, compared to just 10 in 1896. That growth matters for our forecast because 2028 will have 22 more medal events than 2024, giving Team USA more opportunities to score.
Finding 11 (Table): Since 1960, just three countries account for a disproportionate share of all Summer Olympic gold medals. Team USA leads by a comfortable margin.
Top 10 Countries: Summer Olympic Golds (1960 to 2024)
#
Country
Gold Medals
1
USA
757
2
CHN
394
3
URS
346
4
RUS
233
5
JPN
203
6
GBR
191
7
GER
185
8
ITA
178
9
AUS
169
10
GDR
159
Finding 12 (Visualization): Despite winning more raw medals as the program has expanded, the US share of all gold medals has declined steadily since its early dominance. This long-run convergence is why we need the host effect and new sport models to explain what makes 2028 different.
Show code
total_gold_by_year <- medals_clean |>filter(season =="Summer") |>group_by(year) |>summarise(total_gold_all =sum(gold, na.rm =TRUE), .groups ="drop")us_summer |>left_join(total_gold_by_year, by ="year") |>mutate(us_share = gold / total_gold_all) |>ggplot(aes(x = year, y = us_share)) +geom_line(colour ="#1a1a2e", linewidth =1) +geom_point(colour ="#1a1a2e", size =2) +geom_smooth(method ="loess", se =TRUE, colour ="#c9a84c",fill ="#c9a84c", alpha =0.15) +scale_y_continuous(labels =percent_format(accuracy =1)) +labs(title ="USA Share of Summer Olympic Gold Medals Over Time",subtitle ="Red LOESS trend shows the long-run decline in US dominance",x ="Year", y ="US Share of All Gold Medals",caption ="Source: Olympedia" ) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold"))
US gold medal share has trended downward as more nations developed competitive Olympic programs.
Task 4: Statistical Inference
We use the infer package to construct 95% confidence intervals for all three factors, restricting the sample to Summer Olympics from 1960 onward to keep the analysis relevant to 2028 conditions.
Show code
CUTOFF_YEAR <-1960# Exclude 1980 (US boycott of Moscow) and 1984 (Soviet boycott of LA)# Both years are outliers that distort the baseline and host effect estimatesBOYCOTT_YEARS <-c(1980, 1984)us_summer_recent <- us_summer |>filter(year >= CUTOFF_YEAR, !year %in% BOYCOTT_YEARS)host_effect_recent <- host_effect |>filter(year >= CUTOFF_YEAR, !year %in% BOYCOTT_YEARS)new_sport_recent <- new_sport_medals
The confidence interval for gold medals is wide (roughly 54% to 315%) because the host effect varies a lot from one country to another. Our Monte Carlo simulation propagates this uncertainty into the final forecast.
The estimated probability that a host nation wins gold in a new sport is 40% (95% CI: 32% to 48%). This is based on 159 new sport-edition combinations across all Summer Olympics.
Extra Credit 1: Bootstrap Confidence Intervals
The t-test and proportion test above rely on normality assumptions. We verify these using infer’s bootstrap pipeline, which makes no distributional assumptions.
Show code
set.seed(9750)boot_us_gold <- us_summer_recent |>specify(response = gold) |>generate(reps =5000, type ="bootstrap") |>calculate(stat ="mean")boot_ci_us <- boot_us_gold |>get_confidence_interval(level =0.95, type ="percentile")boot_host <- host_effect_recent_all |>drop_na(pct_increase_gold) |>specify(response = pct_increase_gold) |>generate(reps =5000, type ="bootstrap") |>calculate(stat ="mean")boot_ci_host <- boot_host |>get_confidence_interval(level =0.95, type ="percentile")boot_new <- new_sport_recent |>mutate(host_won_gold =as.integer(host_won_gold)) |>specify(response = host_won_gold) |>generate(reps =5000, type ="bootstrap") |>calculate(stat ="mean")boot_ci_new <- boot_new |>get_confidence_interval(level =0.95, type ="percentile")
Bootstrap uses 5,000 resamples via the infer package
Factor
Parametric (t-test / prop-test)
Bootstrap (percentile method)
Lower
Upper
Lower
Upper
US Baseline (golds)
29.700
50.000
31.000
49.500
Host Effect (% change)
53.700
315.200
78.900
312.800
New Sport P(gold)
0.321
0.477
0.321
0.472
The bootstrap and parametric intervals are very close for the US baseline and the new sport probability, suggesting normality is a reasonable assumption for those quantities. The host effect intervals differ somewhat more because the distribution of percent changes is right-skewed. For robustness, we use the bootstrap intervals for the host effect in the final forecast.
Show code
visualize(boot_us_gold) +shade_confidence_interval(boot_ci_us, color ="#c9a84c", fill ="#c9a84c",alpha =0.2) +geom_vline(xintercept = ci_us_gold$estimate, colour ="#1a1a2e",linewidth =1.2, linetype ="dashed") +labs(title ="Bootstrap Distribution: US Average Gold Medals per Games",subtitle =glue("95% CI: [{round(boot_ci_us$lower_ci, 1)}, {round(boot_ci_us$upper_ci, 1)}] golds | ","5,000 bootstrap resamples" ),x ="Bootstrap Mean (gold medals)", y ="Count" ) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold"))
Bootstrap distribution for Team USA’s average gold medal count per Summer Olympics since 1960. The red shading shows the 95% confidence interval.
Extra Credit 2: GDP Per Capita as a Fourth Factor
Richer nations invest more in athlete development, training infrastructure, and sports science. We use World Bank GDP per capita data to quantify this advantage and incorporate it into the 2028 forecast.
n = 780 country-year observations | R-squared = 0.026
Term
Estimate
Std. Error
t-statistic
p-value
Lower CI
Upper CI
(Intercept)
0.665
0.183
3.642
2.89 × 10−4
0.307
1.024
log(gdp_pc)
0.090
0.020
4.518
7.20 × 10−6
0.051
0.129
The coefficient on log GDP per capita is positive and highly significant. An R-squared of 0.03 means GDP per capita alone explains a meaningful share of cross-country variation in gold medal counts. Wealthier nations really do win more.
GDP Factor for the 2028 Forecast
Show code
us_gdp_pc <- gdp_raw |>filter(iso3c =="USA") |>slice_max(year, n =1) |>pull(gdp_pc)avg_gdp_pc <- medals_with_gdp |>filter(year >=1960) |>group_by(iso_code) |>summarise(avg =mean(gdp_pc, na.rm =TRUE), .groups ="drop") |>pull(avg) |>mean(na.rm =TRUE)gdp_coef <- gdp_summary |>filter(term =="log(gdp_pc)") |>pull(estimate)# GDP multiplier: compare the US to the median GDP of past Summer host nations.# We compute this directly from gdp_matched + summer_hosts to avoid a# dependency on the validation_hosts object (which is built later in EC3).# Get ISO codes for host nations so we can join to gdp_matchedhost_iso <- medals_clean |>filter(season =="Summer") |>distinct(country_code, iso_code)past_hosts <- summer_hosts |>filter(year >=1984) |>left_join(host_iso, by =c("host_code"="country_code")) |>left_join( gdp_matched |>select(year, iso_code, gdp_pc),by =c("year", "iso_code") ) |>filter(!is.na(gdp_pc))median_host_gdp_pc <-median(past_hosts$gdp_pc, na.rm =TRUE)# Cap the multiplier at 1.10 (10% boost) to reflect the diminishing returns# of wealth at the very top of the income distributiongdp_multiplier <-min(1.10,exp(gdp_coef * (log(us_gdp_pc) -log(median_host_gdp_pc))))
With a GDP per capita of $85,000 versus the median past host nation GDP of $19,000, the model estimates a 10% GDP advantage boost for Team USA. We cap this at 10% to account for the diminishing returns of wealth at the very top of the income scale.
Task 5: Monte Carlo Forecast for LA 2028
We combine all four factors using 1,000,000 Monte Carlo draws. The forecast equation is:
The median forecast is 134 gold medals for Team USA at LA 2028, with a 95% confidence interval of 78 to 199.
Show code
tibble(golds = predicted_gold_draws) |>ggplot(aes(x = golds)) +geom_histogram(binwidth =2, fill ="#1a1a2e", colour ="white", alpha =0.85) +geom_vline(xintercept = gold_ci[2], colour ="#c9a84c", linewidth =1.2) +geom_vline(xintercept =c(gold_ci[1], gold_ci[3]),colour ="#c9a84c", linewidth =0.8, linetype ="dashed") +annotate("label", x = gold_ci[2] +6, y =Inf, vjust =1.5,label =glue("Median: {round(gold_ci[2])}"),colour ="#c9a84c", fontface ="bold", fill ="white") +labs(title ="Monte Carlo Forecast: Team USA Gold Medals at LA 2028",subtitle =glue("95% CI: [{round(gold_ci[1])}, {round(gold_ci[3])}] | ","Four-factor model | 1,000,000 simulations" ),x ="Predicted Gold Medals", y ="Simulation Count",caption ="Baseline x Host Effect x GDP Factor + New Sport Bonus" ) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold"))
Monte Carlo simulation output. The solid red line is the median; dashed lines mark the 95% interval.
Extra Credit 3: Retrospective Model Validation
To understand how reliable the model is, we apply it to past host nations and compare predictions to what actually happened. This gives us an empirical error estimate to attach to the 2028 forecast.
Show code
validation_hosts <- host_effect |>filter(year >=1984) |>left_join( gdp_matched |>select(year, country_code, gdp_pc),by =c("year", "host_code"="country_code") ) |>filter(!is.na(gdp_pc)) |>mutate(# For validation, use the simple host-effect model without GDP to avoid# double-counting (the baseline already reflects each country's wealth level)predicted = prior_gold * (1+ avg_host_increase /100),actual = host_gold_hosting,abs_error =abs(predicted - actual),pct_error = abs_error / (actual +0.5) *100 ) |>select(year, host_code, prior_gold, actual, predicted, abs_error, pct_error) |>arrange(year)
The model’s mean absolute error across 13 past host nations is 22.7 gold medals (mean % error: 77.9%). Adding this empirical margin to the Monte Carlo confidence interval gives an adjusted 2028 forecast of 57 to 243 gold medals.
Task 6: Team USA Fundraising Fact Sheet
Show code
tibble(Factor =c("US Baseline (post-1960 average)","Home Nation Boost","GDP Per Capita Advantage","New Sport Bonus (5 sports, ~10 events)","Total Forecast" ),Impact =c(glue("~{round(us_baseline$mu)} golds per Games"),glue("+{round(us_baseline$mu * host_eff$mu / 100)} golds expected"),glue("+{round((gdp_multiplier - 1) * 100, 1)}% GDP boost"),glue("+{round(N_NEW_EVENTS * new_sport_params$mu)} golds expected"),glue("~{round(gold_ci[2])} golds (central estimate)") ),`95% Interval`=c(glue("[{round(boot_ci_us$lower_ci)}, {round(boot_ci_us$upper_ci)}]"),glue("[{round(us_baseline$mu * boot_ci_host$lower_ci / 100)}, {round(us_baseline$mu * boot_ci_host$upper_ci / 100)}]"),glue("[+{round((gdp_multiplier - 1) * 100 * 0.5, 1)}%, +{round((gdp_multiplier - 1) * 100, 1)}%]"),glue("[{round(N_NEW_EVENTS * boot_ci_new$lower_ci)}, {round(N_NEW_EVENTS * boot_ci_new$upper_ci)}]"),glue("[{round(gold_ci[1])}, {round(gold_ci[3])}]") )) |>gt() |>tab_header(title =md("**Team USA at LA 2028: Gold Medal Forecast**"),subtitle ="Four-factor model using historical Olympic data (1960 to 2024)" ) |>cols_label(Factor ="Factor", Impact ="Expected Impact") |>tab_style(style =list(cell_fill(color ="#c9a84c"), cell_text(weight ="bold")),locations =cells_body(rows =5) ) |>tab_style(style =cell_fill(color ="#f7f5ee"),locations =cells_body(rows =c(1, 3)) )
Team USA at LA 2028: Gold Medal Forecast
Four-factor model using historical Olympic data (1960 to 2024)
Factor
Expected Impact
95% Interval
US Baseline (post-1960 average)
~40 golds per Games
[31, 49]
Home Nation Boost
+79 golds expected
[32, 126]
GDP Per Capita Advantage
+10% GDP boost
[+5%, +10%]
New Sport Bonus (5 sports, ~10 events)
+4 golds expected
[3, 5]
Total Forecast
~134 golds (central estimate)
[78, 199]
The Case for Sponsoring Team USA at LA 2028
The data makes a strong case. Team USA enters the 2028 Los Angeles Games with four structural advantages stacking on top of each other.
The baseline is already elite. Since 1960, Team USA has averaged 40 gold medals per Summer Olympics more than any other nation. The floor is high: even in tough Olympics, the US rarely falls below 30 golds.
Home soil matters. Our analysis of every host nation since 1960 shows an average gold medal increase of 196% when competing at home. Crowd support, automatic qualification in every discipline, and years of national investment leading up to the event all contribute. For Team USA, that translates to roughly 79 additional golds beyond the baseline.
The new sports are tailor-made for US athletes. Baseball/softball, flag football, lacrosse, and squash are all sports where American athletes have dominated internationally. Historical data shows host nations win gold in 40% of new disciplines they introduce. With an estimated 10 gold-medal events across these sports, Team USA has a meaningful cluster of additional wins available that no other country can match.
The economic advantage is real. With a GDP per capita of roughly $85,000 well above the Olympic average of $16,000 the US investment in athlete development reflects structural wealth that translates directly into medal performance. Our regression confirms this relationship is statistically significant (p < 0.001).
Putting it all together, the central estimate is 134 gold medals for Team USA at LA 2028 (95% CI: 78 to 199). Adjusted for empirical model error based on 13 past host nations, the range is 57 to 243 golds. That is a dominant performance by any historical standard, and exactly the kind of outcome sponsors want to be associated with.
NoteAI Usage Statement
I used Claude (an AI assistant by Anthropic) to help structure and write the R code for this project, including the web scraping pipeline in Task 1, the Monte Carlo simulation in Task 5, and the bootstrap and GDP extra credit sections. I reviewed all code outputs and interpreted the results throughout. The written narrative, analysis framing, and conclusions are my own. No AI was used to write or edit non-code text, in accordance with course policy.