08 Demo: Python Time#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
Introduction#
https://csit.kutztown.edu/~schwesin/fall20/csc223/lectures/Pandas_Time_Series.html
Multiple options to represent datetime objects - easy to convert
Python datetime#
Built-in module called datetime, which contains classes for the datetime object (and timedelta object) - can be confusing
NumPy datetime64#
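A minimal sketch of NumPy's datetime64 type: each value is stored as an integer count of a fixed time unit, and arithmetic yields timedelta64 values in the finer of the two units involved.

```python
import numpy as np

# np.datetime64 stores a timestamp as an integer count of a fixed unit
d1 = np.datetime64('2023-02-22')        # day precision
d2 = np.datetime64('2023-02-25T21:28')  # minute precision

# Subtraction returns np.timedelta64 in the finer unit (minutes here)
dt = d2 - d1
print(dt)                            # 5608 minutes
print(dt / np.timedelta64(1, 'D'))   # convert to fractional days
```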
Pandas Timestamp#
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
DatetimeIndex
pd.to_datetime()
Accepts “int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like”
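A quick sketch of that flexibility: pd.to_datetime returns a single Timestamp for scalar input and a DatetimeIndex for list-like input, and the errors='coerce' option turns unparseable values into NaT instead of raising.

```python
import pandas as pd

# pd.to_datetime converts many input types to Timestamp/DatetimeIndex
ts = pd.to_datetime('2023-02-25 21:28:50')           # str -> Timestamp
idx = pd.to_datetime(['2023-01-01', '2023-01-02'])   # list -> DatetimeIndex
print(type(ts).__name__, type(idx).__name__)

# Strings that don't parse raise by default; errors='coerce' yields NaT
bad = pd.to_datetime('not a date', errors='coerce')
print(bad)  # NaT
```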
xarray#
Day of calendar year#
January 1 = 1
January 2 = 2
December 31 = 365 (or 366 in a leap year)
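Pandas exposes this directly as the dayofyear attribute on a Timestamp:

```python
import pandas as pd

# Day of calendar year ("ordinal day") from a Timestamp
print(pd.Timestamp('2023-01-01').dayofyear)  # 1
print(pd.Timestamp('2023-12-31').dayofyear)  # 365
print(pd.Timestamp('2024-12-31').dayofyear)  # 366 (leap year)
```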
Water year#
Starts October 1, ends September 30 of the following calendar year
Southern hemisphere?
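The Northern-hemisphere (USGS) convention can be sketched in a few lines; the helper name water_year is hypothetical, not a pandas function:

```python
import pandas as pd

def water_year(ts):
    # USGS convention: water year N runs Oct 1 of year N-1 through Sep 30 of year N
    return ts.year + 1 if ts.month >= 10 else ts.year

print(water_year(pd.Timestamp('2022-10-01')))  # 2023
print(water_year(pd.Timestamp('2023-09-30')))  # 2023
```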
Time zones#
Let Pandas handle this
You will inevitably get a warning about timezone aware vs. naive Timestamp objects
Add time zone: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.tz_localize.html
Remove time zone: https://stackoverflow.com/a/34687479
General advice (time and timestamps are messy): https://www.youtube.com/watch?v=-5wpm-gesOY&ab_channel=Computerphile
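A minimal sketch of the aware vs. naive round trip using the pandas methods linked above: tz_localize attaches a zone to a naive Timestamp (no clock change), tz_convert shifts an aware one to another zone, and tz_localize(None) strips the zone again.

```python
import pandas as pd

naive = pd.Timestamp('2023-02-25 21:28:50')
print(naive.tz)  # None -> "naive" Timestamp

# Attach a time zone (no clock change), then convert to UTC
aware = naive.tz_localize('US/Pacific')
utc = aware.tz_convert('UTC')
print(utc)  # 2023-02-26 05:28:50+00:00 (PST is UTC-8 in February)

# Strip the time zone to get back a naive Timestamp
print(utc.tz_localize(None))
```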
Discussion#
(t,x,y,z) records for one or more variables
Pandas Timestamp vs. Python datetime vs. NumPy datetime64
Some functions across different modules play nicely with one and not the other
Dealing with missing values in DataFrame
Sometimes sensors fail or datalogger fails, sometimes values are flagged as erroneous
Pandas has excellent support for missing values: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
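A small illustration with a hypothetical sensor record: NaN marks the missing samples, and pandas can count, drop, or fill them (here with time-weighted interpolation).

```python
import numpy as np
import pandas as pd

# Hypothetical daily sensor record with two gaps (NaN = missing sample)
idx = pd.date_range('2023-02-01', periods=5, freq='D')
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], index=idx)

print(s.isna().sum())                   # count missing values
print(s.dropna())                       # drop them
print(s.interpolate(method='time'))     # or fill by time-weighted interpolation
```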
Trajectories
Argo floats (https://argo.ucsd.edu/)
Weather balloons
GNSS tracks - vehicles, pedestrians, aircraft
Spatial and temporal derivatives
Permanent stations
Stream gage
SNOTEL sites
How big is too big for Pandas/GeoPandas?
PostgreSQL/PostGIS
SQL - Structured Query Language, used for managing data in a relational database
What to do with multiple variables for each timestamp?
xarray works well for multiple variables (e.g., snow depth and SWE for the same site) indexed by station and time
Separate 2D dataframes
One storing locations of all sites
One storing time series of some variable for all sites
Common station ID as key
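A sketch of that two-DataFrame layout with made-up station IDs and values: one table of site locations, one table of time-series observations, joined on the shared key.

```python
import pandas as pd

# Hypothetical site table and observation table sharing a station ID key
sites = pd.DataFrame({'sta_id': ['A', 'B'],
                      'lat': [47.6, 46.8],
                      'lon': [-122.3, -121.7]})
obs = pd.DataFrame({'sta_id': ['A', 'A', 'B'],
                    'time': pd.to_datetime(['2023-01-01', '2023-01-02',
                                            '2023-01-01']),
                    'snow_depth_m': [0.5, 0.6, 1.2]})

# Join observations to site locations on the common key
merged = obs.merge(sites, on='sta_id')
print(merged)
```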
from datetime import datetime
import pandas as pd
import numpy as np
datetime?
Init signature: datetime(self, /, *args, **kwargs)
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File: /srv/conda/envs/notebook/lib/python3.10/datetime.py
Type: type
Subclasses: ABCTimestamp, _NaT
dt1 = datetime(2023, 2, 22)
dt2 = datetime.now()
print(dt2)
2023-02-25 21:28:50.308138
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
dt2.year
2023
dt2.strftime?
Docstring: format -> strftime() style string.
Type: builtin_function_or_method
Side note: formatting timestamp strings#
#Typical U.S. date format
dt2.strftime('%m/%d/%y')
'02/25/23'
#This won't sort alphanumerically
dt2.strftime('%m%d%Y')
'02252023'
#YYYYMMDD is better and will sort alphanumerically
dt2.strftime('%Y%m%d')
'20230225'
dt1
datetime.datetime(2023, 2, 22, 0, 0)
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
dt_diff = dt2 - dt1
dt_diff
datetime.timedelta(days=3, seconds=77330, microseconds=308138)
dt_diff.total_seconds()
336530.308138
How many seconds in a day? In a year?#
Approximately pi * 10^7 seconds in a year
What is a second anyway?
dt_diff.total_seconds()/(60*60*24*365.25)
0.010664001956359167
60*60*24*365.25
31557600.0
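Quick check on that mnemonic: pi * 10^7 seconds is within about half a percent of a Julian year.

```python
import math

seconds_per_year = 60 * 60 * 24 * 365.25
# "pi times ten-to-the-seventh seconds" mnemonic, good to ~0.5%
print(math.pi * 1e7)
print(seconds_per_year / (math.pi * 1e7))
```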
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
pd.to_datetime(dt2)
Timestamp('2023-02-25 21:28:50.308138')
ts1 = pd.Timestamp('2019-02-01 12:00:00')
ts2 = pd.Timestamp('2019-02-06 00:00:00')
ts1
Timestamp('2019-02-01 12:00:00')
ts2
Timestamp('2019-02-06 00:00:00')
dt = ts2 - ts1
dt
Timedelta('4 days 12:00:00')
ts1
Timestamp('2019-02-01 12:00:00')
ts1 + dt
Timestamp('2019-02-06 00:00:00')
ts2 + dt
Timestamp('2019-02-10 12:00:00')
ts1 - pd.Timedelta(days=1)
Timestamp('2019-01-31 12:00:00')
dt.total_seconds()
388800.0