09 Demo: Multidimensional (N-d) arrays, xarray, ERA5 climate reanalysis data#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

Introduction#

This week we are going to do some basic analysis of climate reanalysis data. This could be useful for some of your projects, especially if considering time series data.

We will use a few different products from the state-of-the-art global ERA5 reanalysis, which currently span 1950-present with hourly timestep at up to a 9 km resolution.

We will use xarray to open, combine, analyze and plot the data.

xarray#

Take a moment to review this high-level introduction:

“xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.

Xarray is inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing.”

Remember back to Lab03, when you tried to extract a column of elevation values from a 2D NumPy array, with something like myarray[:,4]? And then how much easier it was to do the same thing with a labeled pandas DataFrame mydf['glas_z']? Same deal here, just extended beyond 2D.

Why?

Why not?

  • General means complicated

  • Steep learning curve, esp for new users unfamiliar with Pandas

  • Sometimes overkill for simple problems

xarray data model overview#

So, what’s an nD array? (https://numpy.org/doc/stable/reference/arrays.ndarray.html). You’ve been using them all quarter, but mostly 1D and 2D NumPy arrays.

As with many of the packages we’ve covered this quarter, vocabulary can be one of the biggest blocks to learning. Let’s discuss.

xarray image with labels (http://xarray.pydata.org/en/latest/data-structures.html#dataset)

Comparison with Pandas#

Pandas is very good at handling 2D tabular datasets (e.g., csv with columns and rows, time series of met station variables from a single station [T, precip, etc]) or a single variable across multiple stations.

xarray extends the Pandas functionality to support 3+ dimensions (e.g., time series of 2D rasters).

xarray is to Pandas…#

  • xarray DataArray : Pandas DataSeries

  • xarray DataSet : Pandas DataFrame

Terminology#

DataArray#

Four essential pieces:

  • values: a numpy.ndarrays with actual data values (e.g., (‘2-m air temperature (t2m)’, ‘Total precipitation (tp)’)

  • dims: dimension names for each axis (e.g., (‘lon’, ‘lat’, ‘time’))

  • coords: a dict-like container of arrays that label each value in each dimension (e.g., 1-D array of x coordinates, 1-D array of y coordiantes, datetime objects or strings)

  • attrs: an OrderedDict containing additional metadata (attributes)

Why do we need both dims and coords?#

“Dimensions provide names that xarray uses instead of the axis argument found in many numpy functions. Coordinates enable fast label based indexing and alignment, building on the functionality of the index found on a pandas DataFrame or Series.”

Dataset#

Notes:

  • One value in one of the contained arrays (say a single temperature measurement) usually has multiple coordinates (‘lon’, ‘lat’, ‘time’)

Useful xarray examples and references#

netCDF format#

Much of the xarray design and functionality is derived from the NetCDF project:

“NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. It is also a community standard for sharing scientific data.” https://www.unidata.ucar.edu/software/netcdf/

“Data in netCDF format is:

  • Self-Describing. A netCDF file includes information about the data it contains.

  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.

  • Scalable. Small subsets of large datasets in various formats may be accessed efficiently through netCDF interfaces, even from remote servers.

  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.

  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.

  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.”

“Commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and GIS applications.” https://en.wikipedia.org/wiki/NetCDF

import xarray as xr
import os
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Demo: Create simple DataArray#

from numpy.random import RandomState
prng = RandomState(1234567890)
#data = np.random.rand(4, 3)
#Use fixed instance of random state to reduce diff as notebook is modified in future years
data = prng.rand(4, 3)
data
array([[0.61879477, 0.59162363, 0.88868359],
       [0.8916548 , 0.45756748, 0.77818808],
       [0.26706377, 0.99610621, 0.54009489],
       [0.53752161, 0.40099938, 0.70540579]])
#xr.DataArray?
xr.DataArray(data)
<xarray.DataArray (dim_0: 4, dim_1: 3)>
array([[0.61879477, 0.59162363, 0.88868359],
       [0.8916548 , 0.45756748, 0.77818808],
       [0.26706377, 0.99610621, 0.54009489],
       [0.53752161, 0.40099938, 0.70540579]])
Dimensions without coordinates: dim_0, dim_1
locs = ["WA", "OR", "CA"]
times = pd.date_range("2000-01-01", periods=4)
times
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='D')
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])
foo
<xarray.DataArray (time: 4, space: 3)>
array([[0.61879477, 0.59162363, 0.88868359],
       [0.8916548 , 0.45756748, 0.77818808],
       [0.26706377, 0.99610621, 0.54009489],
       [0.53752161, 0.40099938, 0.70540579]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'WA' 'OR' 'CA'

Indexing#

foo.values
array([[0.61879477, 0.59162363, 0.88868359],
       [0.8916548 , 0.45756748, 0.77818808],
       [0.26706377, 0.99610621, 0.54009489],
       [0.53752161, 0.40099938, 0.70540579]])
foo.shape
(4, 3)
foo[0,0]
<xarray.DataArray ()>
array(0.61879477)
Coordinates:
    time     datetime64[ns] 2000-01-01
    space    <U2 'WA'
foo[0, :]
<xarray.DataArray (space: 3)>
array([0.61879477, 0.59162363, 0.88868359])
Coordinates:
    time     datetime64[ns] 2000-01-01
  * space    (space) <U2 'WA' 'OR' 'CA'
foo.loc['2000-01-03']
<xarray.DataArray (space: 3)>
array([0.26706377, 0.99610621, 0.54009489])
Coordinates:
    time     datetime64[ns] 2000-01-03
  * space    (space) <U2 'WA' 'OR' 'CA'
foo.loc[:,'WA']
<xarray.DataArray (time: 4)>
array([0.61879477, 0.8916548 , 0.26706377, 0.53752161])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
    space    <U2 'WA'
#This doesn't work
#foo.loc['WA', '2000-01-03']
#Axis order matters!
#foo.loc['2000-01-03', 'WA']
foo.sel(time='2000-01-03', space='WA')
<xarray.DataArray ()>
array(0.26706377)
Coordinates:
    time     datetime64[ns] 2000-01-03
    space    <U2 'WA'
foo.sel(space='WA', time='2000-01-03')
<xarray.DataArray ()>
array(0.26706377)
Coordinates:
    time     datetime64[ns] 2000-01-03
    space    <U2 'WA'
foo.sel(space=['WA','OR'], time='2000-01-03')
<xarray.DataArray (space: 2)>
array([0.26706377, 0.99610621])
Coordinates:
    time     datetime64[ns] 2000-01-03
  * space    (space) <U2 'WA' 'OR'
#No iloc, but can select based on integer index
foo.isel(time=3, space=1)
<xarray.DataArray ()>
array(0.40099938)
Coordinates:
    time     datetime64[ns] 2000-01-04
    space    <U2 'OR'

Demo: Create xarray DataSet from SNOTEL data#

snotel_datadir = '../08_Vector_TimeSeries_SNOTEL/snotel_data'
#SNOTEL sites (with lat/lon coordiantes) for CONUS
sites_fn = os.path.join(snotel_datadir, 'snotel_conus_sites.json')
#singlesite_pkl_fn = 'SNOTEL-SNWD_D_679_WA_SNTL.pkl'
#Snow depth DataFrame for CONUS sites
allsites_SNWD_pkl_fn = os.path.join(snotel_datadir, 'SNOTEL-SNWD_D_CONUS_all.pkl')
#Snow water equivalent DataFrame for WA
allsites_WTEQ_pkl_fn = os.path.join(snotel_datadir, 'SNOTEL-WTEQ_D_WA.pkl')
sites_gdf_all = gpd.read_file(sites_fn).set_index('index')
allsites_snwd_df = pd.read_pickle(allsites_SNWD_pkl_fn).dropna(axis=0, how='all')
allsites_wteq_df = pd.read_pickle(allsites_WTEQ_pkl_fn).dropna(axis=0, how='all')
sites_gdf_all.head()
code name network elevation_m county state pos_accuracy_m beginDate endDate HUC HUD TimeZone actonId shefId stationTriplet isActive HUC2 HUC6 geometry
index
SNOTEL:301_CA_SNTL 301_CA_SNTL Adin Mtn SNOTEL 1886.712036 Modoc California 0 10/1/1983 12:00:00 AM 1/1/2100 12:00:00 AM 180200021403 18020002 -8.0 20H13S ADMC1 301:CA:SNTL True 18 180200 POINT (-120.79192 41.23583)
SNOTEL:907_UT_SNTL 907_UT_SNTL Agua Canyon SNOTEL 2712.719971 Kane Utah 0 10/1/1994 12:00:00 AM 1/1/2100 12:00:00 AM 160300020301 16030002 -8.0 12M26S AGUU1 907:UT:SNTL True 16 160300 POINT (-112.27118 37.52217)
SNOTEL:916_MT_SNTL 916_MT_SNTL Albro Lake SNOTEL 2529.840088 Madison Montana 0 9/1/1996 12:00:00 AM 1/1/2100 12:00:00 AM 100200050701 10020005 -8.0 11D28S ABRM8 916:MT:SNTL True 10 100200 POINT (-111.95902 45.59723)
SNOTEL:908_WA_SNTL 908_WA_SNTL Alpine Meadows SNOTEL 1066.800049 King Washington 0 9/1/1994 12:00:00 AM 1/1/2100 12:00:00 AM 171100100501 17110010 -8.0 21B48S APSW1 908:WA:SNTL True 17 171100 POINT (-121.69847 47.77957)
SNOTEL:302_OR_SNTL 302_OR_SNTL Aneroid Lake #2 SNOTEL 2255.520020 Wallowa Oregon 0 10/1/1980 12:00:00 AM 1/1/2100 12:00:00 AM 170601050101 17060105 -8.0 17D02S ANRO3 302:OR:SNTL True 17 170601 POINT (-117.19258 45.21328)
sites_gdf_all.plot();
../../_images/09_NDarrays_xarray_ERA5_demo_35_0.png
allsites_snwd_df.tail()
SNOTEL:301_CA_SNTL SNOTEL:907_UT_SNTL SNOTEL:916_MT_SNTL SNOTEL:908_WA_SNTL SNOTEL:302_OR_SNTL SNOTEL:1000_OR_SNTL SNOTEL:303_CO_SNTL SNOTEL:1030_CO_SNTL SNOTEL:304_OR_SNTL SNOTEL:306_ID_SNTL ... SNOTEL:872_WY_SNTL SNOTEL:873_OR_SNTL SNOTEL:874_CO_SNTL SNOTEL:875_WY_SNTL SNOTEL:876_MT_SNTL SNOTEL:877_AZ_SNTL SNOTEL:1228_UT_SNTL SNOTEL:1197_UT_SNTL SNOTEL:878_WY_SNTL SNOTEL:1033_CO_SNTL
datetime
2023-02-19 00:00:00+00:00 42.0 33.0 48.0 78.0 55.0 76.0 8.0 60.0 44.0 68.0 ... 27.0 48.0 88.0 29.0 25.0 26.0 39.0 37.0 35.0 63.0
2023-02-20 00:00:00+00:00 41.0 32.0 47.0 84.0 56.0 75.0 8.0 62.0 44.0 70.0 ... 28.0 49.0 88.0 33.0 26.0 24.0 39.0 36.0 37.0 68.0
2023-02-21 00:00:00+00:00 40.0 31.0 46.0 86.0 55.0 73.0 10.0 61.0 43.0 71.0 ... 29.0 48.0 86.0 42.0 33.0 21.0 39.0 35.0 42.0 68.0
2023-02-22 00:00:00+00:00 40.0 32.0 53.0 NaN 56.0 83.0 6.0 NaN 51.0 76.0 ... 39.0 52.0 86.0 45.0 35.0 19.0 44.0 34.0 46.0 70.0
2023-02-23 00:00:00+00:00 44.0 38.0 52.0 91.0 56.0 90.0 8.0 67.0 52.0 78.0 ... NaN 57.0 98.0 44.0 33.0 23.0 45.0 41.0 45.0 73.0

5 rows × 806 columns

allsites_wteq_df.tail()
SNOTEL:908_WA_SNTL SNOTEL:990_WA_SNTL SNOTEL:352_WA_SNTL SNOTEL:1080_WA_SNTL SNOTEL:1107_WA_SNTL SNOTEL:375_WA_SNTL SNOTEL:376_WA_SNTL SNOTEL:942_WA_SNTL SNOTEL:1109_WA_SNTL SNOTEL:1085_WA_SNTL ... SNOTEL:1012_WA_SNTL SNOTEL:817_WA_SNTL SNOTEL:899_WA_SNTL SNOTEL:824_WA_SNTL SNOTEL:1171_WA_SNTL SNOTEL:832_WA_SNTL SNOTEL:841_WA_SNTL SNOTEL:974_WA_SNTL SNOTEL:909_WA_SNTL SNOTEL:863_WA_SNTL
datetime
2023-02-21 00:00:00+00:00 38.1 24.6 9.8 40.0 33.3 16.4 16.3 16.6 0.2 36.9 ... 61.5 22.2 22.8 24.1 24.2 12.1 9.2 20.1 24.7 17.2
2023-02-22 00:00:00+00:00 37.7 24.8 10.2 40.5 34.2 16.7 16.9 16.4 1.1 37.2 ... 62.9 22.2 23.6 24.4 24.3 12.1 9.4 20.9 25.3 17.8
2023-02-23 00:00:00+00:00 37.5 24.6 10.4 40.4 34.3 16.7 16.9 16.7 1.5 37.0 ... 63.6 22.2 23.5 24.2 24.2 12.4 10.1 21.0 25.3 17.9
2023-02-24 00:00:00+00:00 37.6 24.4 10.4 40.2 34.2 16.7 16.7 16.3 1.6 37.0 ... 63.5 22.0 23.5 24.1 24.1 12.4 10.2 20.9 25.2 17.8
2023-02-25 00:00:00+00:00 37.8 24.5 10.4 40.3 34.3 16.7 16.7 16.5 1.6 37.2 ... 63.5 22.1 23.4 24.3 24.1 12.6 10.4 21.0 25.4 17.8

5 rows × 74 columns

Note difference in number of records and columns#

  • Some stations have one but not the other

  • WTEQ extends farther back in time

Convert to xarray DataArrays#

#Doesn't quite work for our sites
#sites_gdf_all.to_xarray()
#Doesn't quite work for our time series
#allsites_snwd_df.to_xarray()
#Function to prepare xarray DataArray for SNOTEL sites and time series
def get_DataArray(df, sites, name="SNWD_D"):
    valid_sites = sites.loc[df.columns]
    
    site_id = valid_sites.index.values
    lon = valid_sites.geometry.x.values
    lat = valid_sites.geometry.y.values
    elev = valid_sites.elevation_m.values
    site_name = valid_sites.name.values
    
    #da = xr.DataArray(df, dims=("time", "site_id"), name=name)
    da = xr.DataArray(df, dims=("time", "site_id"), coords=(df.index.values, site_id), name=name)
    #For some reason, the times are not read as datetime64 objects, so reassign
    da["time"] = df.index.values
    da = da.assign_coords(lon=("site_id", lon), lat=("site_id", lat), elev=("site_id", elev))
    return da
snwd_da = get_DataArray(allsites_snwd_df, sites_gdf_all, name="SNWD_D")
snwd_da
<xarray.DataArray 'SNWD_D' (time: 10807, site_id: 806)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [40., 31., 46., ..., 35., 42., 68.],
       [40., 32., 53., ..., 34., 46., 70.],
       [44., 38., 52., ..., 41., 45., 73.]])
Coordinates:
  * time     (time) datetime64[ns] 1984-10-01 1984-10-02 ... 2023-02-23
  * site_id  (site_id) object 'SNOTEL:301_CA_SNTL' ... 'SNOTEL:1033_CO_SNTL'
    lon      (site_id) float64 -120.8 -112.3 -112.0 ... -112.8 -109.8 -106.6
    lat      (site_id) float64 41.24 37.52 45.6 47.78 ... 37.75 43.93 40.79
    elev     (site_id) float64 1.887e+03 2.713e+03 ... 2.545e+03 2.847e+03

Add some attributes#

snwd_da.attrs['description'] = 'SNOTEL snow depth measurements (SNWD_D)'
snwd_da.attrs['units'] = 'inches'
snwd_da
<xarray.DataArray 'SNWD_D' (time: 10807, site_id: 806)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [40., 31., 46., ..., 35., 42., 68.],
       [40., 32., 53., ..., 34., 46., 70.],
       [44., 38., 52., ..., 41., 45., 73.]])
Coordinates:
  * time     (time) datetime64[ns] 1984-10-01 1984-10-02 ... 2023-02-23
  * site_id  (site_id) object 'SNOTEL:301_CA_SNTL' ... 'SNOTEL:1033_CO_SNTL'
    lon      (site_id) float64 -120.8 -112.3 -112.0 ... -112.8 -109.8 -106.6
    lat      (site_id) float64 41.24 37.52 45.6 47.78 ... 37.75 43.93 40.79
    elev     (site_id) float64 1.887e+03 2.713e+03 ... 2.545e+03 2.847e+03
Attributes:
    description:  SNOTEL snow depth measurements (SNWD_D)
    units:        inches
#Prepare a separate Data Array for the WTEQ_D data
wteq_da = get_DataArray(allsites_wteq_df, sites_gdf_all, name="WTEQ_D")
wteq_da
<xarray.DataArray 'WTEQ_D' (time: 16219, site_id: 74)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ...,
       [37.5, 24.6, 10.4, ..., 21. , 25.3, 17.9],
       [37.6, 24.4, 10.4, ..., 20.9, 25.2, 17.8],
       [37.8, 24.5, 10.4, ..., 21. , 25.4, 17.8]])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:908_WA_SNTL' ... 'SNOTEL:863_WA_SNTL'
    lon      (site_id) float64 -121.7 -121.3 -120.7 ... -123.4 -121.8 -121.4
    lat      (site_id) float64 47.78 48.88 47.35 48.93 ... 47.94 48.87 46.64
    elev     (site_id) float64 1.067e+03 1.106e+03 ... 1.228e+03 1.353e+03
wteq_da.attrs['description'] = 'SNOTEL snow water equivalent measurements (WTEQ_D)'
wteq_da.attrs['units'] = 'inches w.e.'
wteq_da
<xarray.DataArray 'WTEQ_D' (time: 16219, site_id: 74)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ...,
       [37.5, 24.6, 10.4, ..., 21. , 25.3, 17.9],
       [37.6, 24.4, 10.4, ..., 20.9, 25.2, 17.8],
       [37.8, 24.5, 10.4, ..., 21. , 25.4, 17.8]])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:908_WA_SNTL' ... 'SNOTEL:863_WA_SNTL'
    lon      (site_id) float64 -121.7 -121.3 -120.7 ... -123.4 -121.8 -121.4
    lat      (site_id) float64 47.78 48.88 47.35 48.93 ... 47.94 48.87 46.64
    elev     (site_id) float64 1.067e+03 1.106e+03 ... 1.228e+03 1.353e+03
Attributes:
    description:  SNOTEL snow water equivalent measurements (WTEQ_D)
    units:        inches w.e.

Merge the two DataArrays into a single DataSet#

ds = xr.merge([snwd_da, wteq_da])
ds
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 806)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Data variables:
    SNWD_D   (time, site_id) float64 nan nan nan nan nan ... nan nan nan nan nan
    WTEQ_D   (time, site_id) float64 nan nan nan nan nan ... nan nan 61.9 39.1
Attributes:
    description:  SNOTEL snow depth measurements (SNWD_D)
    units:        inches
#Clear the attributes for the DataSet (preserved in the DataArrays)
ds.attrs = {}
ds
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 806)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Data variables:
    SNWD_D   (time, site_id) float64 nan nan nan nan nan ... nan nan nan nan nan
    WTEQ_D   (time, site_id) float64 nan nan nan nan nan ... nan nan 61.9 39.1
#Access one variable (note dimensiond and coordinates)
ds['SNWD_D']
<xarray.DataArray 'SNWD_D' (time: 16219, site_id: 806)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ...,
       [ 90.,  25.,  56., ...,  43., 127., 101.],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Attributes:
    description:  SNOTEL snow depth measurements (SNWD_D)
    units:        inches
ds['WTEQ_D']
<xarray.DataArray 'WTEQ_D' (time: 16219, site_id: 806)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ...,
       [ nan,  nan,  nan, ...,  nan, 63.2, 39.1],
       [ nan,  nan,  nan, ...,  nan, 61.7, 39. ],
       [ nan,  nan,  nan, ...,  nan, 61.9, 39.1]])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Attributes:
    description:  SNOTEL snow water equivalent measurements (WTEQ_D)
    units:        inches w.e.

Write out as NetCDF file#

  • Better than random pickle file

out_fn = os.path.join(snotel_datadir, 'SNOTEL_CONUS_all.nc')
#ds.to_netcdf?
ds.to_netcdf(out_fn)
reopened = xr.open_dataset(out_fn)
reopened
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 806)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 ...
    lat      (site_id) float64 ...
    elev     (site_id) float64 ...
Data variables:
    SNWD_D   (time, site_id) float64 ...
    WTEQ_D   (time, site_id) float64 ...
reopened.close()

Selection on DataSet#

#Paradise
sitecode = 'SNOTEL:679_WA_SNTL'
ds.sel(site_id=sitecode)
<xarray.Dataset>
Dimensions:  (time: 16219)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
    site_id  <U18 'SNOTEL:679_WA_SNTL'
    lon      float64 -121.7
    lat      float64 46.78
    elev     float64 1.564e+03
Data variables:
    SNWD_D   (time) float64 nan nan nan nan nan nan ... 118.0 123.0 nan nan nan
    WTEQ_D   (time) float64 nan nan nan nan nan nan ... 47.8 48.8 49.1 49.8 49.9
#List of sites (Paradise and Easy Pass)
sitecode_list = ['SNOTEL:679_WA_SNTL', 'SNOTEL:998_WA_SNTL']
ds.sel(site_id=sitecode_list)
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 2)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:679_WA_SNTL' 'SNOTEL:998_WA_SNTL'
    lon      (site_id) float64 -121.7 -121.4
    lat      (site_id) float64 46.78 48.86
    elev     (site_id) float64 1.564e+03 1.606e+03
Data variables:
    SNWD_D   (time, site_id) float64 nan nan nan nan nan ... nan nan nan nan
    WTEQ_D   (time, site_id) float64 nan nan nan nan nan ... 49.8 61.7 49.9 61.9
ds.sel(time='2023-02-23')
<xarray.Dataset>
Dimensions:  (site_id: 806)
Coordinates:
    time     datetime64[ns] 2023-02-23
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Data variables:
    SNWD_D   (site_id) float64 90.0 25.0 56.0 61.0 54.0 ... 0.0 43.0 127.0 101.0
    WTEQ_D   (site_id) float64 nan nan nan nan nan ... 24.6 nan nan 63.2 39.1

Plotting#

#Doesn't work
#ds.sel(site_id=sitecode).plot()
ds.sel(site_id=sitecode)['WTEQ_D']
<xarray.DataArray 'WTEQ_D' (time: 16219)>
array([ nan,  nan,  nan, ..., 49.1, 49.8, 49.9])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
    site_id  <U18 'SNOTEL:679_WA_SNTL'
    lon      float64 -121.7
    lat      float64 46.78
    elev     float64 1.564e+03
Attributes:
    description:  SNOTEL snow water equivalent measurements (WTEQ_D)
    units:        inches w.e.
ds.sel(site_id=sitecode)['WTEQ_D'].plot();
../../_images/09_NDarrays_xarray_ERA5_demo_75_0.png
f, ax = plt.subplots()
ds.sel(site_id=sitecode)['WTEQ_D'].plot(ax=ax, label='WTEQ_D')
ds.sel(site_id=sitecode)['SNWD_D'].plot(ax=ax, label='SNWD_D')
ax.legend();
../../_images/09_NDarrays_xarray_ERA5_demo_76_0.png
ds.sel(site_id=sitecode_list)['SNWD_D']
<xarray.DataArray 'SNWD_D' (time: 16219, site_id: 2)>
array([[ nan,  nan],
       [ nan,  nan],
       [ nan,  nan],
       ...,
       [ nan, 127.],
       [ nan,  nan],
       [ nan,  nan]])
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:679_WA_SNTL' 'SNOTEL:998_WA_SNTL'
    lon      (site_id) float64 -121.7 -121.4
    lat      (site_id) float64 46.78 48.86
    elev     (site_id) float64 1.564e+03 1.606e+03
Attributes:
    description:  SNOTEL snow depth measurements (SNWD_D)
    units:        inches
ds.sel(site_id=sitecode_list)['SNWD_D'].plot.line(x="time");
../../_images/09_NDarrays_xarray_ERA5_demo_78_0.png
#Remember we used hvplot for pandas
#import hvplot.pandas
import hvplot.xarray
#ds.sel(site_id=sitecode).hvplot()
ds.sel(site_id=sitecode).plot.scatter(x="SNWD_D", y="WTEQ_D", s=10, c=ds.isel(site_id=0)['time']);
../../_images/09_NDarrays_xarray_ERA5_demo_82_0.png
#ds.hvplot(kind='scatter', x='lon', y='lat', c='elev', aspect='equal')

Interactive plot with time slider#

#ds.hvplot(kind='scatter', x='lon', y='lat', c='SNWD_D', aspect='equal', clim=(0,150), cmap='inferno')

Add projected coordinates#

from pyproj import CRS, Transformer
crs_4326 = CRS.from_epsg(4326)
#Use our equal area projection from previous labs
crs_aea = CRS.from_proj4('+proj=aea +lat_1=37.00 +lat_2=47.00 +lat_0=42.00 +lon_0=-114.27')
transformer = Transformer.from_crs(crs_4326, crs_aea)
transformer
<Concatenated Operation Transformer: pipeline>
Description: axis order change (2D) + unknown
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
x, y = transformer.transform(ds['lat'], ds['lon'])
x[0:10], y[0:10]
(array([-641830.34227843,  783444.60403791, -219670.85812494,
         260151.57870442,  256411.14413521, -545201.83052402,
        -563757.73421726, -609610.91738397,  225520.62714738,
         702809.29230722]),
 array([ 126514.02974903, -447629.27923716, -179747.45522562,
         552612.93311311,  553958.49455869,   92343.21913656,
         784714.06105929,  492131.94845037,  -19505.12211235,
        -211932.9662484 ]))

Assign as new alternative coordinates (preserving original dimensions)#

ds
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 806)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
Data variables:
    SNWD_D   (time, site_id) float64 nan nan nan nan nan ... nan nan nan nan nan
    WTEQ_D   (time, site_id) float64 nan nan nan nan nan ... nan nan 61.9 39.1
ds = ds.assign_coords(x=('site_id', x), y=('site_id', y))
ds
<xarray.Dataset>
Dimensions:  (time: 16219, site_id: 806)
Coordinates:
  * time     (time) datetime64[ns] 1978-10-01 1978-10-02 ... 2023-02-25
  * site_id  (site_id) object 'SNOTEL:1000_OR_SNTL' ... 'SNOTEL:999_WA_SNTL'
    lon      (site_id) float64 -122.2 -105.4 -116.9 ... -110.8 -121.4 -121.7
    lat      (site_id) float64 42.87 37.61 40.36 46.91 ... 40.89 48.86 48.76
    elev     (site_id) float64 1.832e+03 3.246e+03 ... 1.606e+03 1.073e+03
    x        (site_id) float64 -6.418e+05 7.834e+05 ... -5.274e+05 -5.474e+05
    y        (site_id) float64 1.265e+05 -4.476e+05 ... 7.854e+05 7.764e+05
Data variables:
    SNWD_D   (time, site_id) float64 nan nan nan nan nan ... nan nan nan nan nan
    WTEQ_D   (time, site_id) float64 nan nan nan nan nan ... nan nan 61.9 39.1
#ds.hvplot(kind='scatter', x='x', y='y', c='SNWD_D', aspect='equal', cmap='inferno', s=5, clim=(0,150))