Data products, code, and reproducibility

Data

Open downloads

The canonical distribution is the versioned Zenodo deposit (DOI 10.5281/zenodo.19823584). Four download options are listed below; pick the subset you need. All files are MIT-licensed.

Everything (single zip) · 178 MB
nps-open-climate-data-v1.0.0-all.zip

The three archives below bundled in one download. Use this for offline analysis.

Daily CSVs (gzipped) · 150 MB
nps-open-climate-data-v1.0.0-daily.zip

Raw 1980–2025 daily series per park in DAYMET / ERA5 native units (K, m, kg/m², W/m², Pa); a unit-conversion sketch follows the Per-format usage list below. Multipart parks ship one .csv.gz per polygon.

Park summaries (JSON) · 24 MB
nps-open-climate-data-v1.0.0-summary.zip

Per-park summary JSONs — annual + seasonal aggregates, Mann–Kendall / Theil–Sen trends, monthly decomposition, climate stripes — plus parks.json index. Temperatures in °C.

Park boundaries (GeoJSON) · 3 MB
nps-open-climate-data-v1.0.0-boundaries.zip

PAD-US 4.1 proclamation polygons dissolved per park, simplified to 50 m, in WGS84. Includes all_parks.geojson FeatureCollection.

Per-format usage

  • Unzip the archive — every zip extracts to a single nps-open-climate-data-v1.0.0/ folder:
    unzip nps-open-climate-data-v1.0.0-summary.zip
  • Daily CSVs are gzipped — read directly with pandas (auto-detects the .gz extension) or decompress first:
    # in Python
    import pandas as pd
    df = pd.read_csv("daily/yellowstone/yellowstone.csv.gz")
    
    # from the shell
    gunzip daily/yellowstone/yellowstone.csv.gz
  • Summary JSONs are plain JSON; load with any standard parser. Schema lives on the Methodology page.
    import json
    with open("summary/yellowstone.json") as f:
        summary = json.load(f)
    print(summary["headline_trends"]["tmean_c"])
  • GeoJSON boundaries open natively in QGIS, geopandas, Mapbox, Leaflet, etc.
    import geopandas as gpd
    gdf = gpd.read_file("boundaries/yellowstone.geojson")
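
If you work in conventional units, convert the daily columns on load. A minimal sketch, assuming ERA5-Land-style column names such as temperature_2m (K) and total_precipitation_sum (m); inspect the actual headers in your file first:

python
# Hedged example: the column names below are illustrative, not guaranteed --
# check df.columns for the real headers in your download.
import pandas as pd

df = pd.read_csv("daily/yellowstone/yellowstone.csv.gz")

df["tmean_c"]   = df["temperature_2m"] - 273.15           # K -> deg C
df["precip_mm"] = df["total_precipitation_sum"] * 1000.0  # m -> mm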

Programmatic helper (Python)

The Python package ships a small Zenodo download helper. It pulls the archive on first use, caches it under ~/.cache/nps_climate_data/, and returns ready-to-use objects:

python
# Pure-stdlib helper that pulls + caches the Zenodo archives, then
# returns a pandas DataFrame for any park's daily series:
import nps_climate_data as nps

df  = nps.fetch_daily("yellowstone")     # raw daily CSV → DataFrame
sj  = nps.fetch_summary("yellowstone")   # summary JSON → dict
arc = nps.fetch_archive("boundaries")    # downloads + extracts the zip

# Caches under ~/.cache/nps_climate_data so subsequent calls are local.

For agents

Point an AI assistant at /llms.txt for a structured pointer to the dataset, download URLs, Python helpers, schema, and limitations. The Python helper (nps.fetch_daily(slug), etc.) is the simplest path for an agent to actually fetch a park's series — it handles the Zenodo download, gunzip, and parsing in one call. The same information is also exposed as schema.org/Dataset JSON-LD on this page and the home page so structured-data crawlers pick it up too.

Single-park lookups (live site)

The site also serves individual files at predictable URLs — handy for fetching one park without downloading the whole Zenodo archive.

Per-park summary JSON
/NPS-Open-Climate-Data/data/parks/<slug>.json

Single park, served live. Same shape as the Zenodo summary archive.

Per-park boundary GeoJSON
/NPS-Open-Climate-Data/data/boundaries/<slug>.geojson

PAD-US 4.1 polygon for one park, in WGS84.

All-parks boundary FeatureCollection
/NPS-Open-Climate-Data/data/boundaries/all_parks.geojson

All 63 parks in one FeatureCollection, with each park's headline warming slope exposed as feature properties.
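
A minimal live-fetch sketch; the base URL below is an assumption about the deployment host, so substitute the actual site origin:

python
# Hedged example: BASE is an assumed origin -- replace it with the host
# actually serving this site.
import json
from urllib.request import urlopen

BASE = "https://example.github.io/NPS-Open-Climate-Data"  # assumption

with urlopen(f"{BASE}/data/parks/yellowstone.json") as resp:
    summary = json.load(resp)
print(summary["headline_trends"]["tmean_c"])

# The all-parks FeatureCollection reads straight from its URL:
import geopandas as gpd
gdf = gpd.read_file(f"{BASE}/data/boundaries/all_parks.geojson")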

Use responsibly

Verify before you cite

These products are MIT-licensed and audited internally against literature anchors and arithmetic / range / consistency checks (see Methodology → QC and docs/DATA_QC.md). They are derived products — polygon-averaged ERA5-Land and DAYMET fields aggregated over PAD-US 4.1 boundaries, not station observations. Before you cite a number:

  1. Re-derive the value. Re-run the pipeline from raw, or independently compute the variable of interest from DAYMET / ERA5-Land directly, and confirm it matches (see the first sketch below).
  2. Cross-check stations. Compare against NOAA NCEI, NWS, or NPS monitoring records for the specific park and variable.
  3. Correct your statistics. Apply autocorrelation handling (Hamed–Rao MK) and multiple-comparisons correction (Benjamini–Hochberg FDR or Bonferroni). The deployed significance flags do neither (see the second sketch below).
  4. Read the variable-specific limitations. See Methodology → Limitations, especially pet_mm (ERA5-Land overestimate vs FAO Penman–Monteith), high-elevation cold bias, and area-naive multipart averaging.
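
For step 1, a minimal re-derivation sketch using the Earth Engine Python API. The DAYMET asset ID is the public EE catalog entry, but the band, scale, and boundary handling here are illustrative assumptions rather than the pipeline's exact settings:

python
# Hedged sketch, not the pipeline's exact configuration: recompute a
# polygon-mean DAYMET series to compare against the published daily CSV.
import ee, json

ee.Initialize()

# Boundary from the boundaries archive; handle Feature or FeatureCollection.
gj = json.load(open("boundaries/yellowstone.geojson"))
feat = gj["features"][0] if gj.get("type") == "FeatureCollection" else gj
geom = ee.Geometry(feat["geometry"])

daymet = (ee.ImageCollection("NASA/ORNL/DAYMET_V4")  # public EE catalog ID
          .filterDate("2020-01-01", "2020-02-01")
          .select("tmax"))                           # DAYMET tmax is in deg C

def park_mean(img):
    # Polygon-average one daily image; 1 km roughly matches DAYMET's grid.
    val = img.reduceRegion(ee.Reducer.mean(), geom, scale=1000).get("tmax")
    return ee.Feature(None, {"date": img.date().format("YYYY-MM-dd"),
                             "tmax_c": val})

sample = ee.FeatureCollection(daymet.map(park_mean)).getInfo()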

Cite the methodology page, run your own QC for your use case, and treat the published numbers as a starting point.
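
For step 3, a minimal correction sketch, assuming the pymannkendall and statsmodels packages; the "annual" / "tmean_c" keys used to build the input series are assumed schema details, so check them against the Methodology page:

python
# Hedged sketch: Hamed-Rao modified Mann-Kendall per park, then
# Benjamini-Hochberg FDR correction across the family of tests.
import json
import pymannkendall as mk
from statsmodels.stats.multitest import multipletests

slugs = ["yellowstone", "glacier", "saguaro"]  # illustrative subset
series = {}
for s in slugs:
    with open(f"summary/{s}.json") as f:
        summary = json.load(f)
    series[s] = [year["tmean_c"] for year in summary["annual"]]  # assumed keys

# Autocorrelation-aware trend test (Hamed & Rao 1998 variance correction).
results = {s: mk.hamed_rao_modification_test(v) for s, v in series.items()}

# Multiple-comparisons correction across parks.
reject, p_adj, _, _ = multipletests([r.p for r in results.values()],
                                    alpha=0.05, method="fdr_bh")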

Reproduce

End-to-end pipeline

The full auth → EE export → Drive pull → analysis → site build flow lives in pipeline.ipynb. This is the condensed terminal version.

bash
git clone https://github.com/anniebritton/NPS-Open-Climate-Data
cd NPS-Open-Climate-Data
pip install -e .

# 1. Submit serverless EE export tasks (needs a GCP project with EE enabled).
#    Runs on Google's servers; close the laptop after submit.
python -c "import ee; ee.Authenticate(auth_mode='localhost'); ee.Initialize(project='YOUR_PROJECT')"
python scripts/01_export_all_parks.py --start 1980-01-01

# 2. When the tasks show COMPLETED at code.earthengine.google.com/tasks,
#    pull the CSVs from Drive (needs gcloud ADC with Drive scope).
python scripts/07_download_from_drive.py --drive-folder NPS_Climate_Data

# 3. Aggregate, run trend tests, write site JSON.
python scripts/02_build_site_data.py

# 4. Extract real PAD-US boundaries from a local v4.1 GDB (optional —
#    committed boundaries are already in the repo).
python scripts/06_extract_padus_from_gdb.py
python scripts/05_generate_boundaries.py

# 5. Build and preview the static site locally.
cd site && npm install && npm run dev

API

Programmatic access

Direct Python access is available via the nps_climate_data package after pip install -e .; EE credentials must already be initialized.

python
import ee, nps_climate_data as nps
ee.Initialize()

# One park, one date range:
df = nps.get_data("Glacier National Park", "2020-01-01", "2025-01-01")

# Full-history fetch with multipart handling (dict of sub-unit -> DataFrame):
per_unit = nps.get_park_data("saguaro")

Cite

Suggested citation

Britton, A., & Pritchard, I. (2026). NPS Open Climate Data v1.0.0: Pre-processed climate trends for all 63 US National Parks [Data set]. Zenodo. https://doi.org/10.5281/zenodo.19823584