01 Sources Three open datasets, fused
Temperature, precipitation, snowpack and radiation come from
two reanalysis-grade grids; park shapes come from USGS. Every
source is free and publicly re-downloadable.
DAYMET v4 1 km gridded daily observations
NASA/ORNL, 1980–present. Daily values are interpolated from
ground-station observations onto a 1 km grid — not a
reanalysis or model output. Covers CONUS, Hawaii, Puerto
Rico. Bands: prcp (mm), tmax/tmin (°C),
srad (W/m²), swe (kg/m²), vp (Pa).
ERA5-Land ~11 km global reanalysis
ECMWF/Copernicus, 1950–present. Fills gaps DAYMET can't:
parks in Alaska, American Samoa, and Virgin Islands,
plus evaporation, wind, and snow depth.
USGS PAD-US 4.1 Official park boundaries
Matched by Unit_Nm with alias handling for
Alaskan "National Park and Preserve" units, dissolved per
park, simplified to 50 m in Albers, then reprojected to WGS84.
02 Pipeline Reduced over every pixel in the boundary
For each park we load the boundary, split disjoint sub-units
(Saguaro's Rincon and Tucson districts; the five islands of
Channel Islands), and reduce each dataset over the geometry
with ee.Reducer.mean() at native grid scale —
1 km DAYMET, 11 km ERA5-Land — using bestEffort=True
and maxPixels=1e13.
Exports run as serverless Earth Engine batch tasks
(ee.batch.Export.table.toDrive) with an explicit
selectors list so the CSV writer emits one column
per band. End dates are passed to filterDate
exclusively, the EE convention.
Flow bands keep their sum. Variables ending
in _sum (precipitation, snowfall, snowmelt,
evapotranspiration) are preserved as daily totals per
ECMWF/Copernicus convention; everything else is a daily mean.
03 Units From SI to human-readable
All conversions happen in the display layer — raw pipeline
outputs stay strictly in SI units (the international
scientific system: Kelvin for temperature, metres for depth,
kg/m² for mass) so Parquet/CSV consumers get unambiguous values.
- ERA5 temperaturesK − 273.15 → °C
- ERA5 precipitation, snowfall, snowmeltm × 1000 → mm
- ERA5 evaporation fluxessign-flip, × 1000 → positive mm/day
- DAYMET SWEkg/m² ≡ mm liquid water
- Site display (temperature only)°C → °F
04 Statistics Non-parametric trend tests
The methodology favours tests that don't assume normality or
homoscedastic residuals — a conservative choice for
short-to-medium climate series, which often have skewed
distributions (precipitation, snow) and mild autocorrelation.
Mann–Kendall5,6
Non-parametric rank test for monotonic trend. Uses tie
correction and the normal-approximation p-value, which is
standard for n ≥ 10. Applied to the annual series. Does
not correct for serial autocorrelation; for
series longer than a few decades, the Hamed–Rao
variant7 is the
recommended substitute (see Limitations below).
Theil–Sen8,9
Slope estimator: the median of all pairwise year-over-year
slopes. Robust to outliers and to non-normal residuals. We
prefer it to OLS for skewed climate variables (precip,
snow). The reported slope tends to be more conservative
than an OLS fit on the same series.
Annual aggregation
Calendar year. Sums for flow variables
(precipitation, snow, ET); means for state
variables (temperature, radiation, snow depth, vapour
pressure). DAYMET v4 and ERA5-Land are gap-free historical
records, so this aggregation drops only the year currently
in progress at export time — the < 330-day cutoff
prevents a partial calendar year from contributing a
biased annual aggregate.
Seasonal aggregation
Standard meteorological seasons: DJF / MAM / JJA / SON.
December is assigned to the following year's DJF so that
each winter is contiguous in calendar time.
Anomalies
Reference-period mean and σ computed over 1981–2010 (WMO
30-year normal). z-scores drive the climate stripes on
the home and park pages.
Monthly decomposition
Classical additive decomposition at monthly resolution:
12-month centred rolling mean as the long-term trend, plus
per-period month-of-year climatologies (1981–1995, 1996–
2010, 2011–2025) with a 10th–90th percentile inter-annual
envelope. Classical (rather than STL or X-13) keeps the
method dependency-free and reproducible from pandas alone;
the trade-off is that the seasonal component is fixed
within each period rather than allowed to evolve smoothly.
Why 1981–2010?
The WMO defines "climate normals" as 30-year averages updated
each decade; 1981–2010 is the most recent normal that fits
entirely inside the 1980-onward DAYMET record, so every park
gets a baseline computed from the same period regardless of
dataset coverage. A more recent normal (1991–2020) would
shorten the comparison window for trends starting in 1980,
and an older one (1971–2000) would precede the start of the
DAYMET record. The chosen window is slightly back-loaded
inside a 1980–present series, which compresses early-period
anomalies — a deliberate trade for keeping every park on the
same yardstick.
Longer-horizon inference needs autocorrelation handling.
For series much longer than a few decades, apply the Hamed–Rao
Mann–Kendall variant
4
or an equivalent correction.
05 Edge cases Disjoint park geometries
A few parks span geographically separate polygons. For these
we compute one series per polygon (using
ee.Geometry.geometries()) and a union-level
park-wide series. Both are exported; per-part series live in
the downloadable summary JSON.
06 QC Quality control
Pipeline outputs are audited against four classes of check —
unit / range bands per variable, annual-aggregation arithmetic
recompute, Mann–Kendall ↔ Theil–Sen consistency, and an
external warming benchmark for 20 parks across climate zones
drawn from Gonzalez et al. (2018), the Greater Yellowstone
Climate Assessment, NPS climate-change pages, and the NOAA /
NCEI State Climate Summaries. Each finding is tagged
PASS / WARN / FAIL with a documented rubric.
The audit lives at
docs/DATA_QC.md
on GitHub and is regenerated by
scripts/qc_pass.py against
site/public/data/parks/.
07 Limitations Known limitations and biases
Tagged by category — dataset (upstream
DAYMET / ERA5-Land properties), statistics
(trend-test caveats), method (choices in
this pipeline), and coverage (where the
grid doesn't reach).
Dataset Complex-terrain bias
DAYMET v4 has cold-bias at high elevation; ERA5-Land's
~11 km grid smooths elevation-driven temperature
gradients. The trend signal is more robust than the
absolute level.
Dataset Not station observations
Polygon means are reanalysis / interpolated-grid values
reduced over the entire park boundary. They will not
match a specific weather station inside the park.
Dataset ERA5-Land PET runs high
The dataset's bare-soil assumption inflates
pet_mm by ~1.5–2× vs FAO Penman–Monteith.
Trend direction is informative; absolute level is biased.
aet_mm is unaffected.
Statistics No autocorrelation correction
Lag-1 autocorrelation depresses MK p-values. The deployed
normal-approximation is fine for n ≈ 45 with mild
dependence; longer-horizon work needs the Hamed–Rao
variant7.
Statistics No multiple-comparisons correction
~14 variables × 63 parks ≈ 880 tests; ~5% will be
false-positive at p < 0.05 under the null.
Apply Benjamini–Hochberg FDR10
or Bonferroni for inference.
Method Multipart parks are area-naive
The union series for disjoint-geometry parks (Saguaro,
Channel Islands, Kings Canyon, Redwood, American Samoa)
averages per-polygon values without area weighting.
Per-part series are exposed in the summary JSON.
Method 1981–2010 baseline is back-loaded
WMO-standard, but back-loaded inside a 1980–present
record, which compresses early-period anomalies. The
1991–2020 normal would shorten the comparison window
further.
Method EE bestEffort sub-sampling
The EE export uses bestEffort=True with
maxPixels=1e13. For very large polygons
(e.g. Wrangell-St. Elias), it may sub-sample rather than
aggregate every pixel; the spatial mean still converges.
Method Spatial averaging smooths gradients
Park-wide means smooth within-park heterogeneity
(coastal vs interior, north vs south slope). For
pixel-level series, use Earth Engine directly with the
package's helpers.
Coverage Land-only grid coverage
Three small island parks (American Samoa, Dry Tortugas,
Virgin Islands) fall inside sea-masked ERA5-Land pixels
and aren't covered by DAYMET. The site labels them
"outside land-grid coverage" and renders no charts.
08 Citations For reproducibility
-
Thornton, M. M., Shrestha, R., Wei, Y., Thornton, P. E., Kao,
S.-C., & Wilson, B. E. (2022). Daymet: Daily Surface
Weather Data on a 1-km Grid for North America, Version 4 R1.
ORNL DAAC.
https://doi.org/10.3334/ORNLDAAC/2129
-
Muñoz Sabater, J. (2019). ERA5-Land hourly data from
1950 to present. Copernicus Climate Change Service
(C3S) Climate Data Store.
https://doi.org/10.24381/cds.e2161bac
-
U.S. Geological Survey, Gap Analysis Project. PAD-US 4.1:
Protected Areas Database of the United States. https://doi.org/10.5066/P96WBCHS
-
World Meteorological Organization. (2017). WMO
Guidelines on the Calculation of Climate Normals (WMO-No.
1203). Geneva: WMO.
-
Mann, H. B. (1945). Nonparametric tests against trend.
Econometrica, 13(3), 245–259.
https://doi.org/10.2307/1907187
-
Kendall, M. G. (1975). Rank Correlation Methods (4th ed.).
London: Charles Griffin.
-
Hamed, K. H., & Rao, A. R. (1998). A modified Mann–Kendall
trend test for autocorrelated data. Journal of Hydrology,
204(1–4), 182–196.
https://doi.org/10.1016/S0022-1694(97)00125-X
-
Theil, H. (1950). A rank-invariant method of linear and
polynomial regression analysis, I, II, III.
Proceedings of the Koninklijke Nederlandse Akademie van
Wetenschappen, Series A 53, 386–392, 521–525, 1397–1412.
-
Sen, P. K. (1968). Estimates of the regression coefficient
based on Kendall's tau. Journal of the American
Statistical Association, 63(324), 1379–1389.
https://doi.org/10.1080/01621459.1968.10480934
-
Benjamini, Y., & Hochberg, Y. (1995). Controlling the
false discovery rate: A practical and powerful approach to
multiple testing. Journal of the Royal Statistical
Society B, 57(1), 289–300.
https://doi.org/10.1111/j.2517-6161.1995.tb02031.x