09 Exercises 0: ERA5 Data Download¶

UW Geospatial Data Analysis
CEE498/CEWA599
David Shean

Climate reanalysis¶

Nice introduction: https://climate.copernicus.eu/climate-reanalysis

“Climate reanalyses combine past observations with models to generate consistent time series of multiple climate variables. Reanalyses are among the most-used datasets in the geophysical sciences. They provide a comprehensive description of the observed climate as it has evolved during recent decades, on 3D grids at sub-daily intervals.”

ERA5¶

ERA5 = “ECMWF ReAnalysis 5”
ECMWF = “European Centre for Medium-Range Weather Forecasts”

https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5

“ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the Earth on a 30km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km.”

“ERA5 combines vast amounts of historical observations into global estimates using advanced modelling and data assimilation systems.”

Variables¶

Hundreds of output variables for each hourly timestep. See a list of all of the available variables:

https://apps.ecmwf.int/codes/grib/param-db

Resolution¶

The ERA5 HRES (High Resolution) data have a native resolution of 0.28125 degrees (31km)

https://confluence.ecmwf.int/display/CKB/ERA5:+What+is+the+spatial+reference

The ERA5-Land data have a native resolution of 9 km (~0.08°)

https://confluence.ecmwf.int/display/CKB/ERA5-Land:+data+documentation

How many grid cells are required to store one variable (like temperature) for the full 72-year record at hourly resolution?¶

#Space: 0.25° grid (4 cells per degree in both lon and lat) x 137 vertical levels
s = 360*180*4*4*137
s

142041600

#Time: 72 years of hourly timesteps
t = 72*365.25*24
t

631152.0

s*t

89649839923200.0

f'{s*t:e}'

'8.964984e+13'
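To put that count in perspective, here is a rough sketch that converts it to a storage size. It repeats the grid arithmetic from the cells above and assumes each value is stored as an uncompressed 4-byte float. That assumption is for illustration only; the actual archive uses packing and compression, so real file sizes will differ.

```python
# Same grid assumptions as the cells above:
# 0.25 degree spacing (4 cells per degree) and 137 vertical levels
n_lon = 360 * 4
n_lat = 180 * 4
n_lev = 137

# 72 years of hourly timesteps
n_hours = 72 * 365.25 * 24

n_values = n_lon * n_lat * n_lev * n_hours

# Assumption (not from the ERA5 docs): 4 bytes per value, no compression
bytes_per_value = 4
size_tb = n_values * bytes_per_value / 1e12

print(f'{n_values:e} values, ~{size_tb:.0f} TB uncompressed')
```

Under these assumptions, a single 3D variable comes to several hundred terabytes, which is why we work with spatial and temporal subsets.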

Data Availability¶

From CDS (Climate Data Store)¶

For future reference, you can access the ERA5 data directly! The CDS API allows you to request subsets of ERA5 products for a desired spatial extent, time period, time interval, etc.
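As a rough illustration of what such a request looks like, here is a minimal sketch using the `cdsapi` Python client. The helper names `build_era5_request` and `submit_request` are hypothetical. The dataset name and the request keys are assumptions based on commonly used CDS request forms, and they have changed between CDS versions, so copy the exact request from the dataset's download page before submitting anything.

```python
def build_era5_request(variable, year, months, bbox):
    """Build a request dictionary for a spatial and temporal subset.

    bbox is (north, west, south, east) in degrees.
    The key names below are assumptions; check the CDS download form.
    """
    return {
        'product_type': 'reanalysis',
        'variable': variable,
        'year': str(year),
        'month': [f'{m:02d}' for m in months],
        'day': [f'{d:02d}' for d in range(1, 32)],
        # 6-hourly time interval: 00:00, 06:00, 12:00, 18:00
        'time': [f'{h:02d}:00' for h in range(0, 24, 6)],
        'area': list(bbox),
        'format': 'netcdf',
    }


def submit_request(dataset, request, target):
    """Submit the request and download the result to target.

    Requires a CDS account and an API key stored in ~/.cdsapirc.
    """
    import cdsapi  # imported here so the builder works without cdsapi installed
    client = cdsapi.Client()
    client.retrieve(dataset, request, target)


# Example: 2 m temperature over an approximate Washington state bounding box
request = build_era5_request('2m_temperature', 2021, [1], (49.5, -125.0, 45.0, -116.0))
print(request)

# Not run here, because it needs credentials and server-side processing time:
# submit_request('reanalysis-era5-single-levels', request, 'era5_sample.nc')
```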

Some commonly used products are also available on Amazon S3¶

https://registry.opendata.aws/ecmwf-era5/

Shortcut: download sample datasets¶

We could submit requests directly through the CDS API, but you would need to create an account and use a unique API key. The server-side processing and download can take 5-40 minutes or more per dataset.

For this lab, I submitted some requests to prepare sample ERA5 datasets, and then processed them. Please run the 09_NDarrays_xarray_ERA5_Part0_download.ipynb notebook to download and prepare these files!

Zenodo¶

Zenodo is a great, free, permanent data archiving solution: https://about.zenodo.org/

Lab09 Zenodo record¶

https://zenodo.org/record/6302343
Three main files are needed for the Lab09 notebooks. The original datasets from CDS are also archived.
- Notebook 1: ‘climatology_0.25g_ea_2t.nc’, ‘1month_anomaly_Global_ea_2t.nc’
- Notebook 2: ‘WA_ERA5-Land_hourly_1950-2022_6hr.nc’

Check disk space!¶

Before running, open a terminal on the hub and run the following command: df -h ~
It should report something like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf         50G   41G  9.7G  81% /home/jovyan

You will need ~4.5 GB available for these data products.
If you don’t have that, you can go back and delete some of the products from previous labs that are no longer needed or that can easily be downloaded again.

!df -h ~

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf         50G   37G   14G  74% /home/jovyan
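The same check can be done from Python with the standard library. This is a minimal sketch: the `free_gb` helper is hypothetical, and the 4.5 GB threshold is taken from the note above.

```python
import shutil
from pathlib import Path


def free_gb(path=Path.home()):
    """Return free space (GiB, matching the units of df -h) on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


required_gb = 4.5
available_gb = free_gb()

if available_gb < required_gb:
    print(f'Only {available_gb:.1f} GB free, need ~{required_gb} GB. Delete some old products first.')
else:
    print(f'{available_gb:.1f} GB free, OK to download.')
```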

import os

outdir = 'era5_data'

#Create the output directory if it doesn't already exist
os.makedirs(outdir, exist_ok=True)

base_url = 'https://zenodo.org/record/6302343/files/'

fn_list = ['climatology_0.25g_ea_2t.nc', '1month_anomaly_Global_ea_2t.nc', 'WA_ERA5-Land_hourly_1950-2022_6hr.nc']

url_list = [base_url+fn for fn in fn_list]
#For parallel download from command line:
#url_list_str = ' '.join(url_list)
url_list

#Download each file into outdir (-P), skipping files that already exist (-nc)
for url in url_list:
    !wget -nc -P {outdir} {url}
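If wget is not available (for example, outside the hub), a pure-Python equivalent of the loop above might look like the following sketch. The `download_no_clobber` helper is hypothetical. Like `wget -nc`, it skips any file that already exists in the output directory, which also means a partial file left by an interrupted download will not be retried.

```python
import os
import urllib.request


def download_no_clobber(url, outdir):
    """Download url into outdir unless a file with the same name is already there."""
    os.makedirs(outdir, exist_ok=True)
    out_fn = os.path.join(outdir, os.path.basename(url))
    if os.path.exists(out_fn):
        print(f'Found existing file: {out_fn}')
    else:
        print(f'Downloading: {url}')
        urllib.request.urlretrieve(url, out_fn)
    return out_fn


# Usage with the url_list and outdir defined above:
# for url in url_list:
#     download_no_clobber(url, outdir)
```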
