{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 08 Demo: Python Time\n",
    "UW Geospatial Data Analysis  \n",
    "CEE498/CEWA599  \n",
    "David Shean  \n",
    "\n",
    "## Introduction\n",
    "* https://csit.kutztown.edu/~schwesin/fall20/csc223/lectures/Pandas_Time_Series.html\n",
    "* Multiple options to represent datetime objects - easy to convert\n",
    "* https://en.wikipedia.org/wiki/Second\n",
    "\n",
    "### Python `datetime`\n",
    "* Built-in module called `datetime` which contains classes for `datetime` object (and `timedelta` object) - can be confusing\n",
    "* https://docs.python.org/3/library/datetime.html\n",
    "\n",
    "### NumPy `datetime64`\n",
    "* https://numpy.org/doc/stable/reference/arrays.datetime.html\n",
    "\n",
    "### Pandas `Timestamp`\n",
    "* https://pandas.pydata.org/docs/user_guide/timeseries.html\n",
    "* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html\n",
    "* https://pandas.pydata.org/docs/user_guide/timeseries.html#overview\n",
    "* `DatetimeIndex`\n",
    "* `pd.to_datetime()`\n",
    "    * Accepts \"int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like\"\n",
    "\n",
    "### xarray\n",
    "* https://xarray.pydata.org/en/stable/user-guide/time-series.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Day of calendar year\n",
    "* January 1 = 1\n",
    "* January 2 = 2\n",
    "* December 31 = 365 \n",
    "\n",
    "### Water year\n",
    "* Starts October 1, ends September\n",
    "* Southern hemisphere?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Time zones\n",
    "* Let Pandas handle this\n",
    "* You will inevitably get a warning about timezone aware vs. naive Timestamp objects\n",
    "    * Add time zone: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.tz_localize.html\n",
    "    * Remove time zone: https://stackoverflow.com/a/34687479\n",
    "* General advice (time and timestamps are messy): https://www.youtube.com/watch?v=-5wpm-gesOY&amp;ab_channel=Computerphile "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion\n",
    "* (t,x,y,z) records for one or more variables\n",
    "* Pandas Timestamp vs. Python DateTime vs. Numpy.DateTime64\n",
    "    * Some functions across different modules play nicely with one and not the other\n",
    "* Dealing with missing values in DataFrame\n",
    "    * Sometimes sensors fail or datalogger fails, sometimes values are flagged as erroneous\n",
    "    * Pandas has excellent support for missing values: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html\n",
    "    * `dropna()` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna\n",
    "* Trajectories\n",
    "    * Argo floats (https://argo.ucsd.edu/)\n",
    "    * Weather balloons\n",
    "    * GNSS tracks - vehicles, pedestrians, aircraft\n",
    "        * Spatial and temporal derivatives\n",
    "* Permanent stations\n",
    "    * Stream gage\n",
    "    * SNOTEL sites\n",
    "* How big is too big for Pandas/GeoPandas?\n",
    "    * https://github.com/toddwschneider/nyc-taxi-data\n",
    "    * PostgreSQL/PostGIS\n",
    "        * SQL - Structured Query Language, used for managing data in a relational database\n",
    "* What to do with multiple variables for each timestamp?\n",
    "    * xarray works well for multiple variables (e.g., snow depth and SWE for same site) for each station for each time\n",
    "        * https://docs.xarray.dev/en/stable/\n",
    "    * Separate 2D dataframes\n",
    "        * One storing locations of all sites\n",
    "        * One storing time series of some variable for all sites\n",
    "        * Common station ID as key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mInit signature:\u001b[0m \u001b[0mdatetime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
       "\u001b[0;31mDocstring:\u001b[0m     \n",
       "datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])\n",
       "\n",
       "The year, month and day arguments are required. tzinfo may be None, or an\n",
       "instance of a tzinfo subclass. The remaining arguments may be ints.\n",
       "\u001b[0;31mFile:\u001b[0m           /srv/conda/envs/notebook/lib/python3.9/datetime.py\n",
       "\u001b[0;31mType:\u001b[0m           type\n",
       "\u001b[0;31mSubclasses:\u001b[0m     ABCTimestamp, _NaT\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "datetime?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt1 = datetime(2022, 2, 22)\n",
    "dt2 = datetime.now()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2022-02-26 01:55:25.626477\n"
     ]
    }
   ],
   "source": [
    "print(dt2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime.datetime(2022, 2, 26, 1, 55, 25, 626477)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt2.year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mDocstring:\u001b[0m format -> strftime() style string.\n",
       "\u001b[0;31mType:\u001b[0m      builtin_function_or_method\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "dt2.strftime?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: formatting timestamp strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'20220226'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#YYYYMMDD is best\n",
    "dt2.strftime('%Y%m%d')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'02262022'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#This won't sort alphanumerically\n",
    "dt2.strftime('%m%d%Y')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt_diff = dt2 - dt1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime.timedelta(days=4, seconds=6925, microseconds=626477)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt_diff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "352525.626477"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt_diff.total_seconds()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### How many seconds in a day?  In a year?\n",
    "* approximately `pi * 10^7`\n",
    "* What is a second anyway?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "86400"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "60*60*24"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime.datetime(2022, 2, 26, 1, 55, 25, 626477)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2022-02-26 01:55:25.626477')"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.to_datetime(dt2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts1 = pd.Timestamp('2019-02-01 12:00:00')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts2 = pd.Timestamp('2019-02-06 00:00:00')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-02-01 12:00:00')"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-02-06 00:00:00')"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt = ts2 - ts1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timedelta('4 days 12:00:00')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-02-01 12:00:00')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-02-06 00:00:00')"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts1 + dt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-02-10 12:00:00')"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts2 + dt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2019-01-31 12:00:00')"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts1 - pd.Timedelta(days=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "388800.0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt.total_seconds()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
