{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d0ce6893-8fdb-414f-9b88-69e06edaf0e7",
   "metadata": {},
   "source": [
    "# 08 Demo: SNOTEL Query and Download\n",
    "\n",
    "UW Geospatial Data Analysis  \n",
    "CEE498/CEWA599  \n",
    "David Shean  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d562c771-0ada-4d1d-8f84-b4b0edd4dca3",
   "metadata": {},
   "source": [
    "## SNOTEL Introduction\n",
    "Read a bit about SNOTEL data for the Western U.S.:\n",
    "\n",
    "https://www.wcc.nrcs.usda.gov/snow/\n",
    "\n",
    "This is actually a nice web interface, with some advanced querying and interactive visualization.  You can also download formatted ASCII files (csv) for analysis.  This is great for one-time projects, but it's nice to have reproducible code that can be updated as new data appear, without manual steps.  That's what we're going to do here.\n",
    "\n",
    "### About SNOTEL sites and data:\n",
    "* https://www.nrcs.usda.gov/wps/portal/wcc/home/aboutUs/monitoringPrograms/automatedSnowMonitoring\n",
    "* https://www.wcc.nrcs.usda.gov/snotel/snotel_sensors.html\n",
    "* https://directives.sc.egov.usda.gov/OpenNonWebContent.aspx?content=27630.wba\n",
    "\n",
    "### Sample plots for SNOTEL site at Paradise, WA (south side of Mt. Rainier)\n",
    "* https://www.nwrfc.noaa.gov/snow/snowplot.cgi?AFSW1\n",
    "* We will reproduce some of these plots/metrics during this lab\n",
    "\n",
    "### Interactive dashboard\n",
    "* https://climate.washington.edu/climate-data/snowdepth/\n",
    "\n",
    "### Snow today\n",
    "* https://nsidc.org/reports/snow-today"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef18fd57-a06c-4ebd-9c11-c566f52905e7",
   "metadata": {},
   "source": [
    "## CUAHSI WOF server and automated Python data queries\n",
    "We are going to use a server set up by CUAHSI to serve the SNOTEL data, using a standardized database storage format and query structure.  You don't need to worry about this, but please quickly review the following:\n",
    "* http://hiscentral.cuahsi.org/pub_network.aspx?n=241 \n",
    "* http://his.cuahsi.org/wofws.html\n",
    "\n",
    "### Acronym soup\n",
    "* SNOTEL = Snow Telemetry\n",
    "* CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science, Inc\n",
    "* WOF = WaterOneFlow\n",
    "* WSDL = Web Services Description Language\n",
    "* USDA = United States Department of Agriculture\n",
    "* NRCS = National Resources Conservation Service\n",
    "* AWDB = Air-Water Database"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06f29327-824d-47a1-b661-d376c82f93e3",
   "metadata": {},
   "source": [
    "### Python options\n",
    "There are a few packages out there that offer convenience functions to query the online SNOTEL databases and unpack the results.  \n",
    "* climata (https://pypi.org/project/climata/) - last commit Sept 2017 (not a good sign)\n",
    "* ulmo (https://github.com/ulmo-dev/ulmo) - last commit Sept 2021 (will be superseded by a package called Quest, but still maintained by [Emilio Mayorga](https://apl.uw.edu/people/profile.php?last_name=Mayorga&first_name=Emilio) over at UW APL)\n",
    "\n",
    "You can also write your own queries using the Python `requests` module and some built-in XML parsing libraries.\n",
    "\n",
    "Hopefully not overwhelming amount of information - let's just go with ulmo for now.  I've done most of the work to prepare functions for querying and processing the data.  Once you wrap your head around all of the acronyms, it's pretty simple, basically running a few functions here: https://ulmo.readthedocs.io/en/latest/api.html#module-ulmo.cuahsi.wof\n",
    "\n",
    "We will use ulmo with daily data for this exercise, but please feel free to experiment with hourly data, other variables or other approaches to fetch SNOTEL data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81e1950e-fe66-41ad-aec3-dee9ea63b206",
   "metadata": {},
   "source": [
    "### ulmo installation\n",
    "* Note that `ulmo` is not part of the default Hub environment\n",
    "* We will cover more on conda and environment management in coming weeks, but know that you can temporarily install packages from terminal or even in Jupyterhub notebook\n",
    "    * When running conda install from notebook, make sure to use the `-y` flag to avoid interactive prompt\n",
    "* Note that packages installed this way won't persist when your server is shut down, so you will need to reinstall to use again\n",
    "    * Fortunately, once we've used ulmo for data access and download, we won't need it again"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9483a2a3-eceb-4002-b211-e81658f63818",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Doesn't work until ulmo is installed\n",
    "import ulmo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49dd1100-9007-4064-8d4c-40bd2fc4175c",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Install directly from github repo main branch\n",
    "#%pip install git+https://github.com/ulmo-dev/ulmo.git\n",
    "#!python -m pip install git+https://github.com/ulmo-dev/ulmo.git"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04b845fa-41b5-47c5-a7fd-7fecb8d45b27",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!conda install -q -u ulmo\n",
    "!mamba install -q -y ulmo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d56a8bba-d78e-47f0-b892-7b5ee65dfb6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ulmo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0df55d28-27d4-4e63-bc27-73c2102853e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import geopandas as gpd\n",
    "from shapely.geometry import Point"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11843e6c-1977-4b7d-ac2d-835deacb6716",
   "metadata": {},
   "source": [
    "## Load state polygons for later use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6862365-93b7-4391-9b72-7a2333c64e7f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json'\n",
    "#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json'\n",
    "states_gdf = gpd.read_file(states_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09146392-d324-430f-8b80-55a929c19821",
   "metadata": {},
   "source": [
    "## CUAHSI WOF server information\n",
    "* Try typing this in a browser, note what you get back (xml)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2fcd818-599d-4f1d-b905-bb0aafdd9349",
   "metadata": {},
   "outputs": [],
   "source": [
    "#http://his.cuahsi.org/wofws.html\n",
    "wsdlurl = 'https://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f7edee8-163e-4e4f-a0ac-4c14d746ff3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "ulmo.cuahsi.wof.get_sites?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff5fdd07-3eb5-40e8-8626-a55f992cd753",
   "metadata": {},
   "source": [
    "## Run the query for available sites"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1efbf490-d0f5-4321-a6a4-b4f50e1696de",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites = ulmo.cuahsi.wof.get_sites(wsdlurl)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8286799f-eb3a-4eeb-9c74-563c6241ee0d",
   "metadata": {},
   "source": [
    "If you get an error (`AttributeError: module 'ulmo' has no attribute 'cuahsi'`), you need to restart your kernel and \"Run All Above Selected Cell\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "084831b1-c212-4794-bc9b-6b12e57cac52",
   "metadata": {},
   "source": [
    "### Take a moment to inspect the output\n",
    "\n",
    "* What is the `type`?\n",
    "* Note data structure and different key/value pairs for each site "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90afcb97-7c74-48bd-a6f5-c91bf38309f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "type(sites)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b4256da-1ccc-45db-868a-f14847c70e59",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Preview first item in dictionary\n",
    "next(iter(sites.items()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90202efb-3dcd-4150-99cb-576011fb50ae",
   "metadata": {},
   "source": [
    "## Store as a Pandas DataFrame called `sites_df`\n",
    "* See the `from_dict` function\n",
    "* Use `orient` option so the sites comprise the DataFrame index, with columns for 'name', 'elevation_m', etc\n",
    "* Use the `dropna` method to remove any empty records"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a717652-068d-46ad-81cc-0f4e23e54e5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df = pd.DataFrame.from_dict(sites, orient='index').dropna()\n",
    "sites_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cb13232-0ce8-4efe-b6f7-6c1037a4bb59",
   "metadata": {},
   "source": [
    "### Clean up the DataFrame and prepare Point geometry objects\n",
    "* We covered this with the GLAS data conversion to GeoPandas\n",
    "* Convert `'location'` column (contains dictionary with `'latitude'` and `'longitude'` values) to Shapely `Point` objects\n",
    "* Store as a new `'geometry'` column\n",
    "* Drop the `'location'` column, as this is no longer needed\n",
    "* Update the `dtype` of the `'elevation_m'` column to float"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab011409-cfc9-40a9-93f8-25a81db224cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df['location']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "633fc897-afa0-4045-a953-ad9e661dbe0f",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df['geometry'] = [Point(float(l['longitude']), float(l['latitude'])) for l in sites_df['location']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44a9f8c7-f64d-4c1e-b624-6a0bd50ece32",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df = sites_df.drop(columns='location')\n",
    "sites_df = sites_df.astype({\"elevation_m\":float})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c89b6e8-ff7e-4154-bf76-fdcfa75e67ba",
   "metadata": {},
   "source": [
    "### Review output\n",
    "* Take a moment to familiarize yourself with the DataFrame structure and different columns.\n",
    "* Note that the index is a set of strings with format 'SNOTEL:1000_OR_SNTL'\n",
    "* Extract the first record with `loc`\n",
    "    * Review the `'site_property'` dictionary - could parse this and store as separate fields in the DataFrame if desired"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6e9c8a6-37da-4b99-b3cb-d5701ada65ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e812d6d-f766-4729-ad73-121f864ea884",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df.loc['SNOTEL:301_CA_SNTL']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec263354-6dd2-49d9-ad50-a593c2bad1cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_df.loc['SNOTEL:301_CA_SNTL']['site_property']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d2f5196-a520-40aa-85f6-36721c8967a7",
   "metadata": {},
   "source": [
    "### Convert to a Geopandas GeoDataFrame\n",
    "* We already have `'geometry'` column, but still need to define the `crs` \n",
    "* Note the number of records"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "325e51d6-4758-4c91-8e79-afb0f0883cca",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_gdf_all = gpd.GeoDataFrame(sites_df, crs='EPSG:4326')\n",
    "sites_gdf_all.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06c30b1d-59f8-4808-baaa-f068f9352f9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_gdf_all.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbd2d900-b435-4c7e-b4b9-9972ccddf1c9",
   "metadata": {},
   "source": [
    "### Create a scatterplot showing elevation values for all sites"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b3f95c3-3a24-4da5-a2b8-8bea8729f567",
   "metadata": {},
   "outputs": [],
   "source": [
    "f, ax = plt.subplots(figsize=(10,6))\n",
    "sites_gdf_all.plot(ax=ax, column='elevation_m', markersize=3, cmap='inferno', legend=True)\n",
    "ax.autoscale(False)\n",
    "states_gdf.plot(ax=ax, facecolor='none', edgecolor='k', alpha=0.3);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9230936-6f9b-484f-b7b1-02d3f1eba579",
   "metadata": {},
   "source": [
    "### Exclude the Alaska (AK) points to isolate points over Western U.S.\n",
    "* Can remove points where the site name contains 'AK' or can use a spatial filter (see GeoPandas cx indexer functionality)\n",
    "* Note the number of records"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f743c1-b28c-4cb8-ac8e-38689bac4aac",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_gdf_all = sites_gdf_all[~(sites_gdf_all.index.str.contains('AK'))]\n",
    "#xmin, xmax, ymin, ymax = [-126, 102, 30, 50]\n",
    "#sites_gdf_all = sites_gdf_all.cx[xmin:xmax, ymin:ymax]\n",
    "sites_gdf_all.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe0bb51b-eb34-46d5-ba84-4c233686ae02",
   "metadata": {},
   "source": [
    "### Update scatterplot as sanity check\n",
    "* Should look something like the Western U.S."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b026ec0-7c8d-4eee-a8c9-d1239b0fae09",
   "metadata": {},
   "outputs": [],
   "source": [
    "f, ax = plt.subplots(figsize=(10,6))\n",
    "sites_gdf_all.plot(ax=ax, column='elevation_m', markersize=3, cmap='inferno', legend=True)\n",
    "ax.autoscale(False)\n",
    "states_gdf.plot(ax=ax, facecolor='none', edgecolor='k', alpha=0.3);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e053433a-9e09-4fc5-b928-da19e516c988",
   "metadata": {},
   "source": [
    "### Export SNOTEL site GeoDataFrame as a geojson\n",
    "* Maybe useful for later lab or other analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd9fc09f-8980-42e3-a41d-d708c1a40e19",
   "metadata": {},
   "outputs": [],
   "source": [
    "sites_fn = 'snotel_conus_sites.json'\n",
    "if not os.path.exists(sites_fn):\n",
    "    sites_gdf_all.to_file(sites_fn, driver='GeoJSON')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2d67d77-df5a-406c-b96d-16519958adb6",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Part 2: Time series analysis for one station\n",
    "Now that we've identified sites of interest, let's query the API to obtain the time series data for variables of interest (snow!)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27c26424-af2a-447e-8289-90d5abef7f95",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Interactive basemap - useful to select a site of interest\n",
    "#sites_gdf_all.explore(column='elevation_m')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdb801e5-e49a-45f8-8766-a6856cee467e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Hart's Pass\n",
    "#sitecode = 'SNOTEL:515_WA_SNTL'\n",
    "#Paradise\n",
    "sitecode = 'SNOTEL:679_WA_SNTL'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1552567d-d777-453a-95a0-ca9a6a29aeff",
   "metadata": {},
   "source": [
    "### Query the server for information about this site\n",
    "* Use the ulmo cuahsi `get_site_info` method here\n",
    "* Lots of output here, try to skim and get a sense of the different parameters and associated metadata\n",
    "* Note that there are many standard meteorological variables that can be downloaded for this site, in addition to the snow depth and snow water equivalent.\n",
    "    * Careful about using some variables - documented bias in measurements\n",
    "    * https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2014GL062803"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da148d50-6442-4c10-801e-1842508645fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "site_info = ulmo.cuahsi.wof.get_site_info(wsdlurl, sitecode)\n",
    "#site_info"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d74918aa-5307-42e2-af91-edbfb34e07f9",
   "metadata": {},
   "source": [
    "## Inspect the returned information\n",
    "* Note number of available variables in the time series data!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8dd37032-a0dd-4be7-b214-f8176b8cf44e",
   "metadata": {},
   "outputs": [],
   "source": [
    "dict_keys = site_info['series'].keys()\n",
    "dict_keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf580b6b-6b94-4295-b270-ac3ca02d23cf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "len(dict_keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "400a019a-b4df-4b35-aa7e-c25bce15756e",
   "metadata": {},
   "source": [
    "Note on units:\n",
    "* _H = \"hourly\"\n",
    "* _D = \"daily\"\n",
    "* _sm, _m = \"monthly\"\n",
    "* _y = \"yearly\"\n",
    "* _wy = \"water year\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44c0532d-c87c-4392-93fd-6d87b01f5b53",
   "metadata": {},
   "source": [
    "## Let's consider the 'SNOTEL:SNWD_D' variable (Daily Snow Depth)\n",
    "* Assign 'SNOTEL:SNWD_D' to a variable named `variablecode`\n",
    "* Get some information about the variable using `get_variable_info` method\n",
    "* Note the units, nodata value, etc.\n",
    "* **Note: The snow depth records are almost always shorter/noisier than the SWE records for SNOTEL sites**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2164d0f6-2d66-479e-b2bb-bfa09d7d3df4",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Daily SWE\n",
    "#variablecode = 'SNOTEL:WTEQ_D'\n",
    "#Daily snow depth\n",
    "variablecode = 'SNOTEL:SNWD_D'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11763645-cba7-418e-9567-6a22f276e38e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Hourly SWE\n",
    "#variablecode = 'SNOTEL:WTEQ_H'\n",
    "#Hourly snow depth\n",
    "#variablecode = 'SNOTEL:SNWD_H'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55b65d7a-5ea5-437e-b573-eb8a7b59fe49",
   "metadata": {},
   "outputs": [],
   "source": [
    "ulmo.cuahsi.wof.get_variable_info(wsdlurl, variablecode)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e4426aa-de05-4c7f-a3f6-f2c1fda227df",
   "metadata": {},
   "source": [
    "## Define a function to fetch time series data\n",
    "* Can define specific start and end dates, or get full record (default)\n",
    "* I've done this for you, but please review the comments and steps to see what is going on under the hood\n",
    "* You'll probably have to do similar data wrangling for another project at some point in the future"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e66d1a6-31ec-4e7d-afdd-d8e42621da6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "today = pd.to_datetime(\"today\")\n",
    "today.strftime('%Y-%m-%d')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7aaf4032-cee5-4824-a16d-0acfbcb4da27",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Get current datetime\n",
    "today = pd.to_datetime(\"today\").strftime('%Y-%m-%d')\n",
    "\n",
    "def snotel_fetch(sitecode, variablecode='SNOTEL:SNWD_D', start_date='1950-10-01', end_date=today):\n",
    "    #print(sitecode, variablecode, start_date, end_date)\n",
    "    values_df = None\n",
    "    try:\n",
    "        #Request data from the server\n",
    "        site_values = ulmo.cuahsi.wof.get_values(wsdlurl, sitecode, variablecode, start=start_date, end=end_date)\n",
    "    except:\n",
    "        print(f\"Unable to fetch {variablecode} for {sitecode}\")\n",
    "    else:\n",
    "        #Convert to a Pandas DataFrame   \n",
    "        values_df = pd.DataFrame.from_dict(site_values['values'])\n",
    "        #Parse the datetime values to Pandas Timestamp objects\n",
    "        values_df['datetime'] = pd.to_datetime(values_df['datetime'], utc=True)\n",
    "        #Set the DataFrame index to the Timestamps\n",
    "        values_df = values_df.set_index('datetime')\n",
    "        #Convert values to float and replace -9999 nodata values with NaN\n",
    "        values_df['value'] = pd.to_numeric(values_df['value']).replace(-9999, np.nan)\n",
    "        #Remove any records flagged with lower quality\n",
    "        values_df = values_df[values_df['quality_control_level_code'] == '1']\n",
    "\n",
    "    return values_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93f8be68-98ef-4780-a93e-859bf63e86e3",
   "metadata": {},
   "source": [
    "### Use this function to get the full 'SNOTEL:SNWD_D' record for one station\n",
    "* Inspect the results\n",
    "* We used a dummy start date of Jan 1, 1950.  What is the actual the first date returned?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68b6574a-9e5c-4dd2-bf51-133825ff14f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "print(sitecode)\n",
    "#values_df = snotel_fetch(sitecode, variablecode, start_date, end_date)\n",
    "values_df = snotel_fetch(sitecode, variablecode)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50a3fe02-b750-4d42-a933-6956d44c0523",
   "metadata": {},
   "outputs": [],
   "source": [
    "values_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12362c47-0023-4c03-a17e-98903eda9f90",
   "metadata": {},
   "outputs": [],
   "source": [
    "values_df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb007673-e392-45a4-ab4e-5558d5c8e681",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Create a quick plot to view the time series\n",
    "* Take a moment to inspect the `value` column, which is where the `SNWD_D` values are stored\n",
    "* Sanity check thought question: *What are the units again?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79fdd6a1-0fa8-4464-aa93-c2eb3be95002",
   "metadata": {},
   "outputs": [],
   "source": [
    "values_df.plot();"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9028fce-c8f3-4674-999e-87f1021c0fd5",
   "metadata": {},
   "source": [
    "### Write out the dataframe to disk\n",
    "* Can use a number of different formats here, easiest to use a \"pickle\": https://www.pythoncentral.io/how-to-pickle-unpickle-tutorial/\n",
    "* Define a unique filename for the dataset (e.g., `pkl_fn='snotel_wa_snwd_d.pkl'`)\n",
    "* Write the DataFrame to disk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a784e5e-ca21-4c14-a0d6-c3c1a99779f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "pkl_fn = f\"{variablecode.replace(':','-')}_{sitecode.split(':')[-1]}.pkl\"\n",
    "pkl_fn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "046df7ae-79b9-4cf7-8229-f619146c1deb",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Writing out: {pkl_fn}\")\n",
    "values_df.to_pickle(pkl_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12abab7f-fcd7-45ce-815e-11d897eff157",
   "metadata": {},
   "source": [
    "### Read it back in to check"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86a41d19-f1e9-4a1d-8ff4-d053846ee977",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.read_pickle(pkl_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77fb8e27-9a7c-4813-82eb-7fb3f24b9e79",
   "metadata": {},
   "source": [
    "## Part 3: Retrieve time series records for **All** SNOTEL Sites\n",
    "* Now that we've explored one site, let's look at them all!\n",
    "* Probably going to be some interesting spatial/temporal variability in these metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3da389c-ac1a-4f1b-8078-ed4d55deb798",
   "metadata": {},
   "source": [
    "#### Notes:\n",
    "* I've providing the following code to do this for you. Please review so you understand what's going on:\n",
    "    * Loop through all sites names in the sites GeoDataFrame and run the above `snotel_fetch` function\n",
    "    * Store output in a dictionary\n",
    "    * Convert the dictionary to a Pandas Dataframe\n",
    "    * Write final output to disk\n",
    "* Note that this could take some time to run for all SNOTEL sites (~10-40 min, depending on server load)\n",
    "    * Progress will be printed out incrementally\n",
    "    * While you wait, explore some of the above links, or review remainder of lab\n",
    "* Several sites may return an error (e.g., `<suds.sax.document.Document object at 0x7f0813040730>`)\n",
    "    * Fortunately, this is handled by the `try-except` block in the `snotel_fetch` function above\n",
    "    * This warning could arise for a number of reasons:\n",
    "        * The site doesn't have an ultrasonic snow depth sensor: https://doi.org/10.1175/2007JTECHA947.1\n",
    "        * The site was decomissioned or never produced valid data\n",
    "        * The data are not available on the CUAHSI server\n",
    "        * There was an issue connecting with the CUAHSI server\n",
    "    * Most likely the first issue.  Some of these sites should have valid records for SWE, but not SD"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dda93160-27e0-4880-b8a7-3b354f2756fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "#All SNOTEL sites\n",
    "pkl_fn = f\"{variablecode.replace(':','-')}_CONUS_all.pkl\"\n",
    "print(pkl_fn)\n",
    "#Can include start and end dates in filename, but will trigger re-download every day\n",
    "#pkl_fn = variablecode.replace(':','_')+'_'+start_date+'_'+end_date+'.pkl'\n",
    "\n",
    "#Define variable for the site geodataframe\n",
    "gdf = sites_gdf_all\n",
    "\n",
    "#Isolated to WA SNOTEL sites\n",
    "#pkl_fn = 'snotel_snwd_d_wa.pkl'\n",
    "#gdf = sites_gdf_wa"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bedb0fcf-9f96-4b6e-93b3-fd82ffe11dd5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "if os.path.exists(pkl_fn):\n",
    "    snwd_df = pd.read_pickle(pkl_fn)\n",
    "else:\n",
    "    #Define an empty dictionary to store returns for each site\n",
    "    value_dict = {}\n",
    "    #Loop through each site and add record to dictionary\n",
    "    for i, sitecode in enumerate(gdf.index):\n",
    "        print('%i of %i sites: %s' % (i+1, len(gdf.index), sitecode))\n",
    "        #out = snotel_fetch(sitecode, variablecode, start_date, end_date)\n",
    "        out = snotel_fetch(sitecode, variablecode)\n",
    "        if out is not None:\n",
    "            value_dict[sitecode] = out['value']\n",
    "    \n",
    "    #Convert the dictionary to a DataFrame, automatically handles different datetime ranges (nice!)\n",
    "    snwd_df = pd.DataFrame.from_dict(value_dict)\n",
    "    #Write out\n",
    "    print(f\"Writing out: {pkl_fn}\")\n",
    "    snwd_df.to_pickle(pkl_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0464adf8-7a7e-44a1-a8d8-3fcac97fe440",
   "metadata": {},
   "source": [
    "## Inspect the DataFrame\n",
    "* Note structure, number of timestamps, NaNs\n",
    "* What is the date of the first snow depth measurement in the network?\n",
    "    * Note that the water equivalent (WTEQ_D) measurements from snow pillows extend much farther back in time, back to the early 1980s, before the ultrasonic snow depth instruments were incorporated across the network.  These are better to use for long-term studies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a22d2117-2a65-42ff-88a1-606d2b5bd388",
   "metadata": {},
   "outputs": [],
   "source": [
    "snwd_df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5507e69b-276e-4aa6-8ff1-84928f0dc1a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "snwd_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2386ba74-e880-4816-84db-e62c9da67fb1",
   "metadata": {},
   "outputs": [],
   "source": [
    "snwd_df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98584234-8f8b-4651-803e-0b98ada07e23",
   "metadata": {},
   "outputs": [],
   "source": [
    "snwd_df.describe()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}