09 Exercises 0: ERA5 Data Download

UW Geospatial Data Analysis
CEE498/CEWA599
David Shean

Climate reanalysis

Nice introduction: https://climate.copernicus.eu/climate-reanalysis

“Climate reanalyses combine past observations with models to generate consistent time series of multiple climate variables. Reanalyses are among the most-used datasets in the geophysical sciences. They provide a comprehensive description of the observed climate as it has evolved during recent decades, on 3D grids at sub-daily intervals.”

ERA5

ERA5 = “ECMWF ReAnalysis 5”
ECMWF = “European Centre for Medium-Range Weather Forecasts”

https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5

“ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the Earth on a 30km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km.”

“ERA5 combines vast amounts of historical observations into global estimates using advanced modelling and data assimilation systems.”

Variables

Hundreds of output variables for each hourly timestep. See a list of all of the available variables:

Resolution

The ERA5 HRES (High Resolution) data have a native resolution of 0.28125 degrees (31km)

The ERA5-Land data have a native resolution of 9 km (~0.08°)

How many grid cells are required to store one variable (like temperature) for the full 72-year record at hourly resolution, assuming a 0.25° global grid and all 137 vertical levels?

#Space: 360° x 180° at 0.25° spacing (4 cells per degree in each direction) x 137 vertical levels
s = 360*180*4*4*137
s
142041600
#Time: number of hourly timesteps in 72 years
t = 72*365.25*24
t
631152.0
s*t
89649839923200.0
f'{s*t:e}'
'8.964984e+13'
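
To put that cell count in terms of disk space, here is a rough sketch that converts it to bytes. It assumes each value is stored as an uncompressed 4-byte float (float32); the actual ERA5 archive uses packed and compressed formats, so treat this as an order-of-magnitude estimate only.

```python
# Rough, uncompressed storage estimate for one ERA5 variable.
# Assumption (not from the ERA5 docs): each value is a 4-byte float32.
n_lon = 360 * 4              # 0.25 degree cells in longitude
n_lat = 180 * 4              # 0.25 degree cells in latitude
n_lev = 137                  # vertical levels
n_hours = 72 * 365.25 * 24   # hourly timesteps in 72 years

n_values = n_lon * n_lat * n_lev * n_hours
bytes_per_value = 4
total_bytes = n_values * bytes_per_value

print(f'{n_values:.3e} values')
print(f'{total_bytes / 1e12:.1f} TB for all {n_lev} levels')
print(f'{total_bytes / n_lev / 1e12:.2f} TB for a single level')
```

Even a single-level variable (like 2 m temperature) is a few terabytes at this resolution, which is why we work with spatial and temporal subsets in this lab.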

Data Availability

From CDS (Climate Data Store)

For future reference, you can access the ERA5 data directly! The CDS API allows you to request subsets of ERA5 products for desired spatial extent, time periods, time intervals, etc.:
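
As a minimal sketch of what a CDS API request can look like, the cell below builds a request dictionary for one month of hourly 2 m temperature over a bounding box. The helper names (`build_era5_request`, `download_era5`), the bounding box, and the output filename are hypothetical. The request keys follow the style the CDS web form generated at the time of writing and may differ in newer CDS versions, so check the "Show API request" button on the dataset page. Actually submitting the request requires the `cdsapi` package plus a CDS account and API key, so that line is commented out.

```python
import calendar

def build_era5_request(year, month, area):
    """Build a request dict for one month of hourly 2 m temperature.

    area is [north, west, south, east] in degrees.
    Key names are an assumption based on the CDS web form output.
    """
    n_days = calendar.monthrange(year, month)[1]
    return {
        'product_type': 'reanalysis',
        'variable': '2m_temperature',
        'year': str(year),
        'month': f'{month:02d}',
        'day': [f'{d:02d}' for d in range(1, n_days + 1)],
        'time': [f'{h:02d}:00' for h in range(24)],
        'area': area,
        'format': 'netcdf',
    }

def download_era5(request, target, dataset='reanalysis-era5-single-levels'):
    """Submit the request to CDS and save the result to target."""
    import cdsapi  # imported here so the rest of the cell runs without it
    client = cdsapi.Client()  # reads the URL and API key from ~/.cdsapirc
    client.retrieve(dataset, request, target)
    return target

# Hypothetical example: June 2021, rough bounding box around Washington state
request = build_era5_request(2021, 6, [49.5, -125.0, 45.0, -116.5])
print(request['year'], request['month'], len(request['day']), 'days')
# download_era5(request, 'era5_2t_WA_202106.nc')  # uncomment to submit
```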

Some commonly used products are also available on Amazon S3

Shortcut: download sample datasets

We could submit requests directly to the CDS API, but that requires creating an account and using a unique API key. The server-side processing and download can take anywhere from 5 to 40 minutes (or more) per dataset.

For this lab, I submitted some requests and processed the results to prepare sample ERA5 datasets. Please run the 09_NDarrays_xarray_ERA5_Part0_download.ipynb notebook to download and prepare these files!

Zenodo

Zenodo is a great, free, permanent data archiving solution: https://about.zenodo.org/

Lab09 Zenodo record

  • https://zenodo.org/record/6302343

  • Three main files needed for the Lab09 notebooks. Original datasets from CDS are also archived.

    • Notebook 1: ‘climatology_0.25g_ea_2t.nc’, ‘1month_anomaly_Global_ea_2t.nc’

    • Notebook 2: ‘WA_ERA5-Land_hourly_1950-2022_6hr.nc’

Check disk space!

  • Before running, open a terminal on the hub and run the following command: df -h ~. It should report something like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf         50G   41G  9.7G  81% /home/jovyan
  • You will need ~4.5 GB available for these data products

  • If you don’t have that, you can go back and delete some of the products from previous labs that are no longer needed or that can be easily downloaded again

!df -h ~
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf         50G   37G   14G  74% /home/jovyan
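
The same check can be done from Python. This is a small sketch using `shutil.disk_usage` from the standard library; the 4.5 GB threshold comes from the note above, and sizes are converted with powers of 1024 to match what `df -h` reports.

```python
import os
import shutil

required_gb = 4.5  # approximate space needed for the Lab09 files
home = os.path.expanduser('~')

# disk_usage returns total, used and free space in bytes
usage = shutil.disk_usage(home)
free_gb = usage.free / 1024**3

print(f'{free_gb:.1f} GB free on the filesystem containing {home}')
if free_gb < required_gb:
    print('Not enough free space: delete some older files before downloading.')
```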
import os
outdir = 'era5_data'
#Create the output directory if it doesn't already exist
os.makedirs(outdir, exist_ok=True)
base_url = 'https://zenodo.org/record/6302343/files/'
fn_list = ['climatology_0.25g_ea_2t.nc', '1month_anomaly_Global_ea_2t.nc', 'WA_ERA5-Land_hourly_1950-2022_6hr.nc']
url_list = [base_url+fn for fn in fn_list]
#For parallel download from command line:
#url_list_str = ' '.join(url_list)
url_list
for url in url_list:
    !wget -nc -P {outdir} {url}
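
If `wget` is not available (for example, when running outside of the hub), a pure-Python alternative is sketched below. The helper `download_if_missing` is a hypothetical name, not part of any library. Like `wget -nc`, it skips any file that already exists in the output directory, and like `wget -nc`, it does not detect partial downloads.

```python
import os
import urllib.request

def download_if_missing(url, outdir):
    """Download url into outdir unless a file with that name already exists."""
    fn = os.path.join(outdir, os.path.basename(url))
    if os.path.exists(fn):
        print(f'Found existing file: {fn}')
    else:
        print(f'Downloading: {url}')
        os.makedirs(outdir, exist_ok=True)
        urllib.request.urlretrieve(url, fn)
    return fn

# Usage with the variables defined above (uncomment to run):
# for url in url_list:
#     download_if_missing(url, outdir)
```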