# 08 Demo: Python Time
UW Geospatial Data Analysis  
CEE467/CEWA567  
David Shean  

## Introduction
* https://csit.kutztown.edu/~schwesin/fall20/csc223/lectures/Pandas_Time_Series.html
* Multiple options to represent datetime objects - easy to convert
* https://en.wikipedia.org/wiki/Second

### Python `datetime`
* The built-in `datetime` module contains a class that is also named `datetime` (plus a `timedelta` class), so the module and class names are easy to confuse
* https://docs.python.org/3/library/datetime.html

### NumPy `datetime64`
* https://numpy.org/doc/stable/reference/arrays.datetime.html
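
A minimal sketch of `datetime64` and `timedelta64` arithmetic, using only calls described in the NumPy documentation linked above. The specific dates are illustrative values, not from a dataset.

```python
import numpy as np

# Parse ISO 8601 strings; the time unit (seconds here) is inferred from the string
t0 = np.datetime64('2023-02-22T00:00:00')
t1 = np.datetime64('2023-02-25T21:28:50')

# Subtracting two datetime64 values yields a timedelta64
delta = t1 - t0

# Dividing by a unit timedelta converts the difference to a plain float (days)
delta_days = delta / np.timedelta64(1, 'D')

# Regularly spaced array of daily values (end date is excluded)
days = np.arange('2023-02-22', '2023-02-26', dtype='datetime64[D]')

print(delta, delta_days, days)
```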

### Pandas `Timestamp`
* https://pandas.pydata.org/docs/user_guide/timeseries.html
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
* https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
* `DatetimeIndex`
* `pd.to_datetime()`
    * Accepts "int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like"
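
The conversions between the three representations can be sketched as follows. The example date is arbitrary, and the variable names are only for illustration.

```python
from datetime import datetime

import numpy as np
import pandas as pd

py_dt = datetime(2023, 2, 22, 12, 30)

# Python datetime -> pandas Timestamp
ts = pd.Timestamp(py_dt)

# pandas Timestamp -> NumPy datetime64, and back again
np_dt = ts.to_datetime64()
ts_from_np = pd.Timestamp(np_dt)

# pandas Timestamp -> Python datetime
py_dt_roundtrip = ts.to_pydatetime()

# A list of strings becomes a DatetimeIndex
idx = pd.to_datetime(['2023-02-22', '2023-02-23', '2023-02-24'])

print(type(ts), type(np_dt), type(py_dt_roundtrip), type(idx))
```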

### xarray
* https://xarray.pydata.org/en/stable/user-guide/time-series.html

### Day of calendar year
* January 1 = 1
* January 2 = 2
* December 31 = 365 (366 in a leap year)
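
As a quick check of the numbering above, day of year is available from both the standard library and Pandas. The dates below are chosen only to show the leap-year case.

```python
from datetime import datetime

import pandas as pd

# Python datetime: day of year from the time tuple (strftime('%j') gives it as a string)
doy_jan1 = datetime(2023, 1, 1).timetuple().tm_yday
doy_dec31 = datetime(2023, 12, 31).timetuple().tm_yday
doy_dec31_leap = datetime(2024, 12, 31).timetuple().tm_yday  # 2024 is a leap year

# pandas Timestamp exposes the same value as an attribute
doy_pd = pd.Timestamp('2023-02-25').dayofyear

print(doy_jan1, doy_dec31, doy_dec31_leap, doy_pd)
```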

### Water year
* Starts October 1, ends September 30
* Southern hemisphere?
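
One way to assign a water year to a timestamp is sketched below. It assumes the USGS convention that a water year is labeled by the calendar year in which it ends. The function name `water_year` and the `start_month` parameter are hypothetical, introduced here only so a different start month could be tried; the notes do not specify a Southern Hemisphere convention.

```python
import pandas as pd

def water_year(ts, start_month=10):
    """Return the water year for a timestamp.

    Assumes the water year is labeled by the calendar year in which it ends
    (USGS convention). `start_month` is a hypothetical knob for other regions.
    """
    ts = pd.Timestamp(ts)
    # Months on or after the start month belong to the next labeled year
    if start_month > 1 and ts.month >= start_month:
        return ts.year + 1
    return ts.year

for date in ['2022-09-30', '2022-10-01', '2023-09-30', '2023-10-01']:
    print(date, water_year(date))
```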

### Time zones
* Let Pandas handle this
* You will inevitably get a warning about timezone aware vs. naive Timestamp objects
    * Add time zone: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.tz_localize.html
    * Remove time zone: https://stackoverflow.com/a/34687479
* General advice (time and timestamps are messy): https://www.youtube.com/watch?v=-5wpm-gesOY&ab_channel=Computerphile
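
A minimal sketch of adding, converting, and removing a time zone on a Pandas `Timestamp`. The choice of UTC and `America/Los_Angeles` is an assumption for illustration; use whatever zone the data were actually recorded in.

```python
import pandas as pd

# A naive timestamp carries no time zone information
naive = pd.Timestamp('2023-02-25 21:28:50')

# Add a time zone: declare that the naive time is in UTC (clock time unchanged)
utc = naive.tz_localize('UTC')

# Convert to another zone: same instant, different clock time
pacific = utc.tz_convert('America/Los_Angeles')

# Remove the time zone, keeping the local clock time
naive_local = pacific.tz_localize(None)

# Remove the time zone after converting back to UTC
naive_utc = pacific.tz_convert(None)

print(naive, utc, pacific, naive_local, naive_utc, sep='\n')
```

Note the difference between the last two calls: `tz_localize(None)` keeps the local clock time, while `tz_convert(None)` converts to UTC first.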

## Discussion
* (t,x,y,z) records for one or more variables
* Pandas `Timestamp` vs. Python `datetime` vs. NumPy `datetime64`
    * Functions in different modules often work with one type but not the others
* Dealing with missing values in DataFrame
    * Sometimes sensors fail or datalogger fails, sometimes values are flagged as erroneous
    * Pandas has excellent support for missing values: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
    * `dropna()` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna
* Trajectories
    * Argo floats (https://argo.ucsd.edu/)
    * Weather balloons
    * GNSS tracks - vehicles, pedestrians, aircraft
        * Spatial and temporal derivatives
* Permanent stations
    * Stream gage
    * SNOTEL sites
* How big is too big for Pandas/GeoPandas?
    * https://github.com/toddwschneider/nyc-taxi-data
    * PostgreSQL/PostGIS
        * SQL - Structured Query Language, used for managing data in a relational database
* What to do with multiple variables for each timestamp?
    * xarray works well when each station has multiple variables at each time (e.g., snow depth and SWE at the same site)
        * https://docs.xarray.dev/en/stable/
    * Separate 2D dataframes
        * One storing locations of all sites
        * One storing time series of some variable for all sites
        * Common station ID as key
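
The two-table layout and the missing-value handling discussed above can be sketched as follows. All station IDs, coordinates, and snow depth values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical site table: one row per station, keyed by station ID
sites = pd.DataFrame(
    {'station_id': ['A', 'B'],
     'lat': [47.0, 48.0],
     'lon': [-121.0, -120.5]}
).set_index('station_id')

# Hypothetical time series table: rows are timestamps, columns are station IDs
times = pd.date_range('2023-02-22', periods=4, freq='D')
snow_depth = pd.DataFrame(
    {'A': [1.0, 1.2, np.nan, 1.5],
     'B': [0.5, np.nan, np.nan, 0.9]},
    index=times,
)

# Keep only timestamps where every station reported a value
complete = snow_depth.dropna()

# Drop only timestamps where all stations are missing
any_valid = snow_depth.dropna(how='all')

# Use the shared station ID to attach coordinates to a per-station summary
# (mean() skips NaN by default)
summary = sites.join(snow_depth.mean().rename('mean_depth'))

print(complete, any_valid, summary, sep='\n\n')
```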

In [None]:
from datetime import datetime
import pandas as pd
import numpy as np

In [None]:
datetime?

Init signature: datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           /srv/conda/envs/notebook/lib/python3.10/datetime.py
Type:           type
Subclasses:     ABCTimestamp, _NaT

In [None]:
dt1 = datetime(2023, 2, 22)
dt2 = datetime.now()

In [None]:
print(dt2)

2023-02-25 21:28:50.308138


In [None]:
dt2

datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)

In [None]:
dt2.year

2023

In [None]:
dt2.strftime?

Docstring: format -> strftime() style string.
Type:      builtin_function_or_method

#### Side note: formatting timestamp strings

In [None]:
#Typical U.S. date format
dt2.strftime('%m/%d/%y')

'02/25/23'

In [None]:
#MMDDYYYY strings won't be in chronological order when sorted alphanumerically
dt2.strftime('%m%d%Y')

'02252023'

In [None]:
#YYYYMMDD is better: alphanumeric sort order matches chronological order
dt2.strftime('%Y%m%d')

'20230225'
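
To see why the format matters, the sketch below sorts the same three dates as strings in both formats. The dates are arbitrary examples.

```python
from datetime import datetime

dates = [datetime(2023, 2, 25), datetime(2022, 12, 1), datetime(2023, 1, 15)]

# Month-first strings sort by month, so the 2022 date ends up last
mdy_sorted = sorted(d.strftime('%m%d%Y') for d in dates)

# Year-first strings sort in true chronological order
ymd_sorted = sorted(d.strftime('%Y%m%d') for d in dates)

print(mdy_sorted)
print(ymd_sorted)
```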

In [None]:
dt1

datetime.datetime(2023, 2, 22, 0, 0)

In [None]:
dt2

datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)

In [None]:
dt_diff = dt2 - dt1

In [None]:
dt_diff

datetime.timedelta(days=3, seconds=77330, microseconds=308138)

In [None]:
dt_diff.total_seconds()

336530.308138

#### How many seconds in a day?  In a year?
* A day is 86,400 s; a year is approximately `pi * 10^7` s
* What is a second anyway?
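
A quick check of how close the `pi * 10^7` rule of thumb is, assuming a mean year of 365.25 days:

```python
import math

seconds_per_day = 60 * 60 * 24
seconds_per_year = seconds_per_day * 365.25  # assumes a 365.25-day year

# Rule-of-thumb approximation
approx = math.pi * 1e7

# Relative error of the approximation (negative means the approximation is low)
rel_error = (approx - seconds_per_year) / seconds_per_year

print(seconds_per_day, seconds_per_year, approx, rel_error)
```

The approximation comes out slightly low, by less than half a percent.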

In [None]:
dt_diff.total_seconds()/(60*60*24*365.25)

0.010664001956359167

In [None]:
60*60*24*365.25

31557600.0

In [None]:
dt2

datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)

In [None]:
pd.to_datetime(dt2)

Timestamp('2023-02-25 21:28:50.308138')

In [None]:
ts1 = pd.Timestamp('2019-02-01 12:00:00')

In [None]:
ts2 = pd.Timestamp('2019-02-06 00:00:00')

In [None]:
ts1

Timestamp('2019-02-01 12:00:00')

In [None]:
ts2

Timestamp('2019-02-06 00:00:00')

In [None]:
dt = ts2 - ts1

In [None]:
dt

Timedelta('4 days 12:00:00')

In [None]:
ts1

Timestamp('2019-02-01 12:00:00')

In [None]:
ts1 + dt

Timestamp('2019-02-06 00:00:00')

In [None]:
ts2 + dt

Timestamp('2019-02-10 12:00:00')

In [None]:
ts1 - pd.Timedelta(days=1)

Timestamp('2019-01-31 12:00:00')

In [None]:
dt.total_seconds()

388800.0
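
Besides `total_seconds()`, a `Timedelta` can be expressed in other units by dividing by a unit `Timedelta`. A minimal sketch using the same two timestamps as above:

```python
import pandas as pd

ts1 = pd.Timestamp('2019-02-01 12:00:00')
ts2 = pd.Timestamp('2019-02-06 00:00:00')
dt = ts2 - ts1

# Integer day component only (the 12 hours are not included)
whole_days = dt.days

# Full duration as floating point days and hours
frac_days = dt / pd.Timedelta(days=1)
hours = dt / pd.Timedelta(hours=1)

print(whole_days, frac_days, hours)
```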