Demo: NumPy, Pandas, Matplotlib
UW Geospatial Data Analysis
CEE498/CEWA599
David Shean
NumPy
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code. For examples, see the sample plots and thumbnail gallery.
For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
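The two interfaces mentioned above can be compared side by side. This is a minimal sketch, not part of the original demo: the sine/cosine data and the labels are made up, and the `Agg` backend line is only there so the snippet runs without a display (omit it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)  # illustrative data

# pyplot style: functions act on the "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot interface")

# Object-oriented style: keep explicit handles to the figure and axes
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), linestyle="--", color="k")
ax.set_xlabel("x (radians)")
ax.set_ylabel("cos(x)")
ax.set_title("object-oriented interface")
```

The explicit `fig`/`ax` handles tend to be easier to manage once a figure has more than one panel.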
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#np.ndarray?
a = np.random.randint(0,10,10)
a
array([4, 1, 8, 6, 5, 4, 6, 5, 7, 5])
type(a)
numpy.ndarray
a.shape
(10,)
a.size
10
a.dtype
dtype('int64')
a.astype('int8')
array([4, 1, 8, 6, 5, 4, 6, 5, 7, 5], dtype=int8)
a.dtype
dtype('int64')
2**64
18446744073709551616
2**8
256
a.itemsize
8
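The cells above show that `astype` returns a converted copy (the dtype of `a` is still `int64` afterwards) and that the item size follows from the dtype. A small sketch of the same idea, using a short made-up array with an explicit dtype so the result does not depend on the platform default:

```python
import numpy as np

a = np.array([4, 1, 8, 6, 5], dtype=np.int64)  # illustrative values

# astype returns a converted copy; the original array keeps its dtype
b = a.astype(np.int8)
print(a.dtype, a.itemsize, a.nbytes)  # 8 bytes per element
print(b.dtype, b.itemsize, b.nbytes)  # 1 byte per element

# An n-bit signed integer holds 2**n distinct values, split around zero
info = np.iinfo(np.int8)
print(info.min, info.max)
```

Values outside the target range are not preserved by a cast to a smaller integer type, so check `np.iinfo` before downcasting real data.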
a = np.random.random((10,10))
a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
[0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
[0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
[0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
[0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
[0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
[0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
[0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
[0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
[0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062 ]])
a.shape
(10, 10)
a[0]
array([0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585])
a[0,0]
0.9372524570452767
a[:,0]
array([0.93725246, 0.08302075, 0.81024205, 0.72307131, 0.22072178,
0.98403695, 0.08661299, 0.13483696, 0.57092602, 0.98603552])
a[0:3]
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
[0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
[0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739]])
a[:,0:3]
array([[0.93725246, 0.15668888, 0.36611166],
[0.08302075, 0.82469381, 0.57287118],
[0.81024205, 0.68601501, 0.63991827],
[0.72307131, 0.50361642, 0.32121823],
[0.22072178, 0.14158838, 0.70495497],
[0.98403695, 0.65750484, 0.95815873],
[0.08661299, 0.54225296, 0.99453582],
[0.13483696, 0.29410672, 0.39846689],
[0.57092602, 0.36669827, 0.01014502],
[0.98603552, 0.29201346, 0.25903222]])
a[0:3,0:3]
array([[0.93725246, 0.15668888, 0.36611166],
[0.08302075, 0.82469381, 0.57287118],
[0.81024205, 0.68601501, 0.63991827]])
a.mean()
0.5534435458189786
a.min(axis=1)
array([0.15668888, 0.06703322, 0.11388148, 0.32121823, 0.00207674,
0.06120255, 0.08661299, 0.06508179, 0.01014502, 0.14021301])
a.min(axis=0)
array([0.08302075, 0.14158838, 0.01014502, 0.06120255, 0.00207674,
0.25755703, 0.01684733, 0.3789339 , 0.25459632, 0.06508179])
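The `axis` argument names the dimension that is collapsed, which is easy to mix up. A minimal sketch on a small made-up array where the answer can be checked by eye:

```python
import numpy as np

m = np.array([[3, 1, 2],
              [0, 5, 4]])  # 2 rows, 3 columns (illustrative values)

col_min = m.min(axis=0)  # collapse axis 0 (rows): one value per column
row_min = m.min(axis=1)  # collapse axis 1 (columns): one value per row

print(col_min)  # [0 1 2]
print(row_min)  # [1 0]
print(m.min())  # no axis: minimum over the whole array
```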
plt.plot(a)
[<matplotlib.lines.Line2D at 0x7f6239a52c70>,
<matplotlib.lines.Line2D at 0x7f6239a52ca0>,
<matplotlib.lines.Line2D at 0x7f6239a52e20>,
<matplotlib.lines.Line2D at 0x7f6239a52ee0>,
<matplotlib.lines.Line2D at 0x7f6239a52fa0>,
<matplotlib.lines.Line2D at 0x7f6239a640a0>,
<matplotlib.lines.Line2D at 0x7f6239a64160>,
<matplotlib.lines.Line2D at 0x7f6239a64220>,
<matplotlib.lines.Line2D at 0x7f6239a642e0>,
<matplotlib.lines.Line2D at 0x7f6239a643a0>]
plt.plot(a[0])
[<matplotlib.lines.Line2D at 0x7f6230154520>]
plt.imshow(a)
<matplotlib.image.AxesImage at 0x7f62300c54f0>
a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
[0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
[0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
[0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
[0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
[0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
[0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
[0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
[0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
[0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062 ]])
a > 0.5
array([[ True, False, False, False, True, True, True, True, False,
True],
[False, True, True, True, False, False, False, True, True,
False],
[ True, True, True, False, True, False, True, True, True,
True],
[ True, True, False, False, True, True, True, True, True,
True],
[False, False, True, False, False, True, True, False, False,
False],
[ True, True, True, False, True, True, True, True, True,
True],
[False, True, True, False, True, False, False, False, True,
False],
[False, False, False, True, True, True, True, True, True,
False],
[ True, False, False, False, False, True, False, True, True,
True],
[ True, False, False, False, False, True, True, True, True,
True]])
idx = (a > 0.5)
a[idx]
array([0.93725246, 0.729094 , 0.83046116, 0.55376036, 0.95884927,
0.71853585, 0.82469381, 0.57287118, 0.70806906, 0.91474639,
0.64656203, 0.81024205, 0.68601501, 0.63991827, 0.88810797,
0.69543896, 0.97644668, 0.6011786 , 0.86319739, 0.72307131,
0.50361642, 0.54175156, 0.85948579, 0.67301454, 0.95202843,
0.98558777, 0.92096823, 0.70495497, 0.77898224, 0.94767885,
0.98403695, 0.65750484, 0.95815873, 0.70874163, 0.83894932,
0.8006265 , 0.77455562, 0.52743587, 0.75654806, 0.54225296,
0.99453582, 0.91009936, 0.52389196, 0.84681589, 0.67671481,
0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.57092602,
0.76761243, 0.56304474, 0.51408509, 0.83939443, 0.98603552,
0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062 ])
a.shape
(10, 10)
a[idx].shape
(60,)
idx.nonzero()[0].size
60
idx = (a > 0.5) & (a < 0.7)
#idx = (a < 0.5) | (a > 0.9)
idx
array([[False, False, False, False, False, False, True, False, False,
False],
[False, False, True, False, False, False, False, False, True,
False],
[False, True, True, False, False, False, True, False, True,
False],
[False, True, False, False, True, False, True, False, False,
False],
[False, False, False, False, False, False, False, False, False,
False],
[False, True, False, False, False, False, False, False, True,
False],
[False, True, False, False, False, False, False, False, True,
False],
[False, False, False, False, True, False, False, True, True,
False],
[ True, False, False, False, False, False, False, True, True,
False],
[False, False, False, False, False, False, False, True, True,
True]])
a[idx]
array([0.55376036, 0.57287118, 0.64656203, 0.68601501, 0.63991827,
0.69543896, 0.6011786 , 0.50361642, 0.54175156, 0.67301454,
0.65750484, 0.52743587, 0.54225296, 0.52389196, 0.67671481,
0.66011626, 0.51693959, 0.57092602, 0.56304474, 0.51408509,
0.60820336, 0.65643397, 0.640062 ])
a[idx].shape
(23,)
~idx
array([[ True, True, True, True, True, True, False, True, True,
True],
[ True, True, False, True, True, True, True, True, False,
True],
[ True, False, False, True, True, True, False, True, False,
True],
[ True, False, True, True, False, True, False, True, True,
True],
[ True, True, True, True, True, True, True, True, True,
True],
[ True, False, True, True, True, True, True, True, False,
True],
[ True, False, True, True, True, True, True, True, False,
True],
[ True, True, True, True, False, True, True, False, False,
True],
[False, True, True, True, True, True, True, False, False,
True],
[ True, True, True, True, True, True, True, False, False,
False]])
a[~idx]
array([0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.95884927, 0.29762467, 0.71853585, 0.08302075,
0.82469381, 0.70806906, 0.29734792, 0.45314182, 0.16445973,
0.91474639, 0.06703322, 0.81024205, 0.11388148, 0.88810797,
0.25755703, 0.97644668, 0.86319739, 0.72307131, 0.32121823,
0.39273795, 0.85948579, 0.95202843, 0.98558777, 0.92096823,
0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945,
0.98403695, 0.95815873, 0.06120255, 0.70874163, 0.83894932,
0.8006265 , 0.77455562, 0.75654806, 0.08661299, 0.99453582,
0.49692316, 0.91009936, 0.27296071, 0.46314554, 0.3789339 ,
0.24766733, 0.13483696, 0.29410672, 0.39846689, 0.84681589,
0.84317288, 0.85344483, 0.06508179, 0.36669827, 0.01014502,
0.48240653, 0.37714147, 0.76761243, 0.01684733, 0.83939443,
0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
0.99199766, 0.87244072])
a[~idx].shape
(77,)
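The boolean-indexing steps above can be condensed into a short sketch. The array values here are made up so the result is easy to verify by hand:

```python
import numpy as np

a = np.array([[0.1, 0.6, 0.9],
              [0.55, 0.3, 0.65]])  # illustrative values

# Each comparison needs its own parentheses: & and | bind tighter than < and >
idx = (a > 0.5) & (a < 0.7)

n_true = np.count_nonzero(idx)  # same count as idx.sum()
selected = a[idx]               # 1-D array of matching values, in row-major order
rest = a[~idx]                  # ~ inverts the mask

print(n_true, selected, rest)
```

Note that indexing with a boolean mask always returns a flattened 1-D array, so the 2-D layout is lost; the masked array in the next section keeps it.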
Masked array
a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094 ,
0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
[0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
[0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
[0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
[0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
[0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
[0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
[0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
[0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
[0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062 ]])
idx
array([[False, False, False, False, False, False, True, False, False,
False],
[False, False, True, False, False, False, False, False, True,
False],
[False, True, True, False, False, False, True, False, True,
False],
[False, True, False, False, True, False, True, False, False,
False],
[False, False, False, False, False, False, False, False, False,
False],
[False, True, False, False, False, False, False, False, True,
False],
[False, True, False, False, False, False, False, False, True,
False],
[False, False, False, False, True, False, False, True, True,
False],
[ True, False, False, False, False, False, False, True, True,
False],
[False, False, False, False, False, False, False, True, True,
True]])
ma = np.ma.array(a, mask=~idx)
ma
masked_array(
data=[[--, --, --, --, --, --, 0.5537603625060096, --, --, --],
[--, --, 0.5728711811469509, --, --, --, --, --,
0.6465620326513881, --],
[--, 0.6860150052162564, 0.6399182670598812, --, --, --,
0.6954389578430052, --, 0.6011785952124488, --],
[--, 0.5036164236165882, --, --, 0.5417515578959615, --,
0.67301453764201, --, --, --],
[--, --, --, --, --, --, --, --, --, --],
[--, 0.6575048391787636, --, --, --, --, --, --,
0.5274358710655832, --],
[--, 0.5422529561477778, --, --, --, --, --, --,
0.5238919606546085, --],
[--, --, --, --, 0.6767148052677167, --, --, 0.6601162558250676,
0.5169395925974186, --],
[0.5709260243416924, --, --, --, --, --, --, 0.5630447415496301,
0.5140850876133204, --],
[--, --, --, --, --, --, --, 0.60820335527965,
0.6564339696072983, 0.6400620038099053]],
mask=[[ True, True, True, True, True, True, False, True, True,
True],
[ True, True, False, True, True, True, True, True, False,
True],
[ True, False, False, True, True, True, False, True, False,
True],
[ True, False, True, True, False, True, False, True, True,
True],
[ True, True, True, True, True, True, True, True, True,
True],
[ True, False, True, True, True, True, True, True, False,
True],
[ True, False, True, True, True, True, True, True, False,
True],
[ True, True, True, True, False, True, True, False, False,
True],
[False, True, True, True, True, True, True, False, False,
True],
[ True, True, True, True, True, True, True, False, False,
False]],
fill_value=1e+20)
ma.mean()
0.5987712340751709
plt.imshow(ma)
<matplotlib.image.AxesImage at 0x7f62300ac0d0>
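A point that is easy to miss above: in `np.ma`, `mask=True` means "hide this element", which is why the demo passes `~idx`. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([[0.1, 0.6, 0.9],
              [0.55, 0.3, 0.65]])  # illustrative values
idx = (a > 0.5) & (a < 0.7)        # True where we want to KEEP the value

# Pass the inverted selection, because True in the mask hides an element
ma = np.ma.array(a, mask=~idx)

print(ma.shape)    # shape is preserved, unlike a[idx]
print(ma.count())  # number of unmasked elements
print(ma.mean())   # statistics skip masked elements

# masked_where builds the same array directly from a condition
ma2 = np.ma.masked_where(~idx, a)
```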
Pandas
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
df = pd.DataFrame(a)
df
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.937252 | 0.156689 | 0.366112 | 0.285216 | 0.729094 | 0.830461 | 0.553760 | 0.958849 | 0.297625 | 0.718536 |
| 1 | 0.083021 | 0.824694 | 0.572871 | 0.708069 | 0.297348 | 0.453142 | 0.164460 | 0.914746 | 0.646562 | 0.067033 |
| 2 | 0.810242 | 0.686015 | 0.639918 | 0.113881 | 0.888108 | 0.257557 | 0.695439 | 0.976447 | 0.601179 | 0.863197 |
| 3 | 0.723071 | 0.503616 | 0.321218 | 0.392738 | 0.541752 | 0.859486 | 0.673015 | 0.952028 | 0.985588 | 0.920968 |
| 4 | 0.220722 | 0.141588 | 0.704955 | 0.358575 | 0.002077 | 0.778982 | 0.947679 | 0.439319 | 0.254596 | 0.129269 |
| 5 | 0.984037 | 0.657505 | 0.958159 | 0.061203 | 0.708742 | 0.838949 | 0.800627 | 0.774556 | 0.527436 | 0.756548 |
| 6 | 0.086613 | 0.542253 | 0.994536 | 0.496923 | 0.910099 | 0.272961 | 0.463146 | 0.378934 | 0.523892 | 0.247667 |
| 7 | 0.134837 | 0.294107 | 0.398467 | 0.846816 | 0.676715 | 0.843173 | 0.853445 | 0.660116 | 0.516940 | 0.065082 |
| 8 | 0.570926 | 0.366698 | 0.010145 | 0.482407 | 0.377141 | 0.767612 | 0.016847 | 0.563045 | 0.514085 | 0.839394 |
| 9 | 0.986036 | 0.292013 | 0.259032 | 0.140213 | 0.166375 | 0.991998 | 0.872441 | 0.608203 | 0.656434 | 0.640062 |
df.index = ['a','b','c','d','e','f','g','h','i','j']
df
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| a | 0.937252 | 0.156689 | 0.366112 | 0.285216 | 0.729094 | 0.830461 | 0.553760 | 0.958849 | 0.297625 | 0.718536 |
| b | 0.083021 | 0.824694 | 0.572871 | 0.708069 | 0.297348 | 0.453142 | 0.164460 | 0.914746 | 0.646562 | 0.067033 |
| c | 0.810242 | 0.686015 | 0.639918 | 0.113881 | 0.888108 | 0.257557 | 0.695439 | 0.976447 | 0.601179 | 0.863197 |
| d | 0.723071 | 0.503616 | 0.321218 | 0.392738 | 0.541752 | 0.859486 | 0.673015 | 0.952028 | 0.985588 | 0.920968 |
| e | 0.220722 | 0.141588 | 0.704955 | 0.358575 | 0.002077 | 0.778982 | 0.947679 | 0.439319 | 0.254596 | 0.129269 |
| f | 0.984037 | 0.657505 | 0.958159 | 0.061203 | 0.708742 | 0.838949 | 0.800627 | 0.774556 | 0.527436 | 0.756548 |
| g | 0.086613 | 0.542253 | 0.994536 | 0.496923 | 0.910099 | 0.272961 | 0.463146 | 0.378934 | 0.523892 | 0.247667 |
| h | 0.134837 | 0.294107 | 0.398467 | 0.846816 | 0.676715 | 0.843173 | 0.853445 | 0.660116 | 0.516940 | 0.065082 |
| i | 0.570926 | 0.366698 | 0.010145 | 0.482407 | 0.377141 | 0.767612 | 0.016847 | 0.563045 | 0.514085 | 0.839394 |
| j | 0.986036 | 0.292013 | 0.259032 | 0.140213 | 0.166375 | 0.991998 | 0.872441 | 0.608203 | 0.656434 | 0.640062 |
df.mean()
0 0.553676
1 0.446518
2 0.522541
3 0.388604
4 0.529745
5 0.689432
6 0.604086
7 0.722624
8 0.552434
9 0.524776
dtype: float64
df.mean(axis=1)
a 0.583359
b 0.473195
c 0.653198
d 0.687348
e 0.397776
f 0.706776
g 0.491702
h 0.528970
i 0.450830
j 0.561281
dtype: float64
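As with NumPy, `axis` in pandas names the dimension that is collapsed, and the result is labeled by whatever remains. A small sketch with a made-up two-by-two frame (the column names `x` and `y` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1.0, 2.0], [3.0, 6.0]]),
                  index=['a', 'b'], columns=['x', 'y'])

col_means = df.mean()        # default axis=0: one value per column
row_means = df.mean(axis=1)  # axis=1: one value per row, labeled by the index

print(col_means)
print(row_means)
```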
Reading files
Most of the time, you will read tabular data from a file and let Pandas do the parsing.
csv_fn = '../01_Shell_Github/data/GLAH14_tllz_conus_lulcfilt_demfilt.csv'
!head $csv_fn
decyear,ordinal,lat,lon,glas_z,dem_z,dem_z_std,lulc
2003.13957078,731266.9433448168,44.157897,-105.356562,1398.51,1400.52,0.33,31
2003.13957081,731266.9433462636,44.150175,-105.358116,1387.11,1384.64,0.43,31
2003.13957081,731266.9433465529,44.148632,-105.358427,1392.83,1383.49,0.28,31
2003.13957081,731266.9433468423,44.147087,-105.358738,1384.24,1382.85,0.84,31
2003.13957081,731266.9433471316,44.145542,-105.359048,1369.21,1380.24,1.73,31
2003.13957081,731266.9433474210,44.143996,-105.359359,1366.60,1375.23,1.60,31
2003.13957081,731266.9433506038,44.126969,-105.362876,1355.14,1379.38,2.17,31
2003.13957084,731266.9433604418,44.074358,-105.373549,1369.53,1391.71,2.88,31
2003.13957084,731266.9433607311,44.072806,-105.373864,1380.02,1387.79,0.45,31
pd.read_csv(csv_fn)
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 1 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
| 2 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 |
| 3 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 |
| 4 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65231 | 2009.775995 | 733691.238340 | 37.896222 | -117.044399 | 1556.16 | 1556.43 | 0.00 | 31 |
| 65232 | 2009.775995 | 733691.238340 | 37.897769 | -117.044675 | 1556.02 | 1556.43 | 0.00 | 31 |
| 65233 | 2009.775995 | 733691.238340 | 37.899319 | -117.044952 | 1556.19 | 1556.44 | 0.00 | 31 |
| 65234 | 2009.775995 | 733691.238340 | 37.900869 | -117.045230 | 1556.18 | 1556.44 | 0.00 | 31 |
| 65235 | 2009.775995 | 733691.238341 | 37.902420 | -117.045508 | 1556.32 | 1556.44 | 0.00 | 31 |
65236 rows × 8 columns
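To try `read_csv` without the course data file, the first lines printed by `head` above can be pasted into a string. This is a sketch under that assumption; `io.StringIO` stands in for the file, and `usecols` is shown as one optional way to load only some columns:

```python
import io
import pandas as pd

# Header and first two rows from the head output above
csv_text = """decyear,ordinal,lat,lon,glas_z,dem_z,dem_z_std,lulc
2003.13957078,731266.9433448168,44.157897,-105.356562,1398.51,1400.52,0.33,31
2003.13957081,731266.9433462636,44.150175,-105.358116,1387.11,1384.64,0.43,31
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # column types are inferred from the text

# usecols limits parsing to the columns you need
sub = pd.read_csv(io.StringIO(csv_text), usecols=['lat', 'lon', 'glas_z'])
print(sub)
```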
glas_df = pd.read_csv(csv_fn)
# Multiply the index by 10 so that row labels (used by loc) differ from row positions (used by iloc)
glas_df.set_index(glas_df.index*10, inplace=True)
glas_df
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 10 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
| 20 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 |
| 30 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 |
| 40 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 652310 | 2009.775995 | 733691.238340 | 37.896222 | -117.044399 | 1556.16 | 1556.43 | 0.00 | 31 |
| 652320 | 2009.775995 | 733691.238340 | 37.897769 | -117.044675 | 1556.02 | 1556.43 | 0.00 | 31 |
| 652330 | 2009.775995 | 733691.238340 | 37.899319 | -117.044952 | 1556.19 | 1556.44 | 0.00 | 31 |
| 652340 | 2009.775995 | 733691.238340 | 37.900869 | -117.045230 | 1556.18 | 1556.44 | 0.00 | 31 |
| 652350 | 2009.775995 | 733691.238341 | 37.902420 | -117.045508 | 1556.32 | 1556.44 | 0.00 | 31 |
65236 rows × 8 columns
glas_df.describe()
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| count | 65236.000000 | 65236.000000 | 65236.000000 | 65236.000000 | 65236.000000 | 65236.000000 | 65236.000000 | 65236.000000 |
| mean | 2005.945322 | 732291.890372 | 40.946798 | -115.040612 | 1791.494167 | 1792.260964 | 5.504748 | 30.339444 |
| std | 1.729573 | 631.766682 | 3.590476 | 5.465065 | 1037.183482 | 1037.925371 | 7.518558 | 3.480576 |
| min | 2003.139571 | 731266.943345 | 34.999455 | -124.482406 | -115.550000 | -114.570000 | 0.000000 | 12.000000 |
| 25% | 2004.444817 | 731743.803182 | 38.101451 | -119.257599 | 1166.970000 | 1168.240000 | 0.070000 | 31.000000 |
| 50% | 2005.846896 | 732256.116938 | 39.884541 | -115.686241 | 1555.730000 | 1556.380000 | 1.350000 | 31.000000 |
| 75% | 2007.223249 | 732758.486046 | 43.453565 | -109.816475 | 2399.355000 | 2400.072500 | 9.530000 | 31.000000 |
| max | 2009.775995 | 733691.238341 | 48.999727 | -104.052336 | 4340.310000 | 4252.940000 | 49.900000 | 31.000000 |
Indexing and selecting
glas_df.loc[0]
decyear 2003.139571
ordinal 731266.943345
lat 44.157897
lon -105.356562
glas_z 1398.510000
dem_z 1400.520000
dem_z_std 0.330000
lulc 31.000000
Name: 0, dtype: float64
glas_df.iloc[0]
decyear 2003.139571
ordinal 731266.943345
lat 44.157897
lon -105.356562
glas_z 1398.510000
dem_z 1400.520000
dem_z_std 0.330000
lulc 31.000000
Name: 0, dtype: float64
glas_df.loc[0:10]
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 10 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
glas_df.iloc[0:10]
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 10 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
| 20 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 |
| 30 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 |
| 40 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 |
| 50 | 2003.139571 | 731266.943347 | 44.143996 | -105.359359 | 1366.60 | 1375.23 | 1.60 | 31 |
| 60 | 2003.139571 | 731266.943351 | 44.126969 | -105.362876 | 1355.14 | 1379.38 | 2.17 | 31 |
| 70 | 2003.139571 | 731266.943360 | 44.074358 | -105.373549 | 1369.53 | 1391.71 | 2.88 | 31 |
| 80 | 2003.139571 | 731266.943361 | 44.072806 | -105.373864 | 1380.02 | 1387.79 | 0.45 | 31 |
| 90 | 2003.139571 | 731266.943361 | 44.071256 | -105.374177 | 1391.47 | 1396.90 | 1.56 | 31 |
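The two slices above return different numbers of rows because `loc` selects by label and includes the end label, while `iloc` selects by position and excludes the end. A minimal sketch on a made-up five-row frame with the same "index times 10" trick:

```python
import pandas as pd

df = pd.DataFrame({'z': [1.0, 2.0, 3.0, 4.0, 5.0]})  # illustrative values
df.index = df.index * 10  # labels become 0, 10, 20, 30, 40

by_label = df.loc[0:20]     # label slice, INCLUDES the end label: rows 0, 10, 20
by_position = df.iloc[0:2]  # positional slice, EXCLUDES the end: first two rows

print(by_label)
print(by_position)
# df.loc[1] would raise KeyError (no label 1); df.iloc[1] returns the row labeled 10
```

With a default index (0, 1, 2, ...) labels and positions coincide, which hides this difference until the frame is filtered or re-indexed.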
glas_df.index
RangeIndex(start=0, stop=652360, step=10)
glas_df.values
array([[2.00313957e+03, 7.31266943e+05, 4.41578970e+01, ...,
1.40052000e+03, 3.30000000e-01, 3.10000000e+01],
[2.00313957e+03, 7.31266943e+05, 4.41501750e+01, ...,
1.38464000e+03, 4.30000000e-01, 3.10000000e+01],
[2.00313957e+03, 7.31266943e+05, 4.41486320e+01, ...,
1.38349000e+03, 2.80000000e-01, 3.10000000e+01],
...,
[2.00977600e+03, 7.33691238e+05, 3.78993190e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
[2.00977600e+03, 7.33691238e+05, 3.79008690e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
[2.00977600e+03, 7.33691238e+05, 3.79024200e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01]])
glas_df.columns
Index(['decyear', 'ordinal', 'lat', 'lon', 'glas_z', 'dem_z', 'dem_z_std',
'lulc'],
dtype='object')
glas_df['glas_z']
0 1398.51
10 1387.11
20 1392.83
30 1384.24
40 1369.21
...
652310 1556.16
652320 1556.02
652330 1556.19
652340 1556.18
652350 1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.glas_z
0 1398.51
10 1387.11
20 1392.83
30 1384.24
40 1369.21
...
652310 1556.16
652320 1556.02
652330 1556.19
652340 1556.18
652350 1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.iloc[:,4]
0 1398.51
10 1387.11
20 1392.83
30 1384.24
40 1369.21
...
652310 1556.16
652320 1556.02
652330 1556.19
652340 1556.18
652350 1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.loc[:,'glas_z']
0 1398.51
10 1387.11
20 1392.83
30 1384.24
40 1369.21
...
652310 1556.16
652320 1556.02
652330 1556.19
652340 1556.18
652350 1556.32
Name: glas_z, Length: 65236, dtype: float64
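The four cells above all return the same column. A compact sketch that checks this on a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'lat': [44.1, 44.2],
                   'glas_z': [1398.51, 1387.11]})  # illustrative values

s1 = df['glas_z']         # by name, works for any column label
s2 = df.glas_z            # attribute shortcut, only for names that are valid identifiers
s3 = df.loc[:, 'glas_z']  # label-based, all rows
s4 = df.iloc[:, 1]        # position-based, second column

print(s1.equals(s2) and s1.equals(s3) and s1.equals(s4))
```

The attribute form does not work for column names containing spaces or names that collide with existing DataFrame attributes such as `mean`, so the bracket form is the safer default.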
idx2 = (glas_df['lulc'] == 12)
glas_df.shape
(65236, 8)
glas_df[idx2]
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 230 | 2003.139573 | 731266.944184 | 39.669291 | -106.225142 | 3505.12 | 3508.25 | 5.74 | 12 |
| 300 | 2003.139573 | 731266.944316 | 38.961190 | -106.355153 | 4046.47 | 4047.25 | 7.14 | 12 |
| 4890 | 2003.147846 | 731269.963718 | 48.587233 | -113.484046 | 2135.76 | 2123.37 | 1.18 | 12 |
| 4920 | 2003.147846 | 731269.963811 | 48.091352 | -113.595790 | 1632.52 | 1615.77 | 11.43 | 12 |
| 7560 | 2003.157366 | 731273.438572 | 43.897412 | -114.457131 | 2886.39 | 2889.82 | 20.31 | 12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 647240 | 2009.764964 | 733687.211708 | 40.689722 | -105.918309 | 3267.33 | 3267.62 | 1.83 | 12 |
| 647250 | 2009.764964 | 733687.211709 | 40.694371 | -105.919164 | 3235.77 | 3238.94 | 3.78 | 12 |
| 649830 | 2009.771998 | 733689.779258 | 47.910365 | -123.628017 | 1671.86 | 1711.73 | 8.44 | 12 |
| 649840 | 2009.771998 | 733689.779258 | 47.908820 | -123.628357 | 1737.70 | 1776.17 | 7.70 | 12 |
| 649850 | 2009.771998 | 733689.779258 | 47.907275 | -123.628697 | 1782.52 | 1828.93 | 4.41 | 12 |
2268 rows × 8 columns
glas_df[idx2].shape
(2268, 8)
glas_df[idx2].mean()
decyear 2006.008627
ordinal 732315.035881
lat 43.065223
lon -112.936499
glas_z 2918.746261
dem_z 2920.785754
dem_z_std 9.719951
lulc 12.000000
dtype: float64
glas_df.groupby('lulc')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f62bffe0d60>
glas_df.groupby('lulc').mean()
| lulc | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std |
|---|---|---|---|---|---|---|---|
| 12 | 2006.008627 | 732315.035881 | 43.065223 | -112.936499 | 2918.746261 | 2920.785754 | 9.719951 |
| 31 | 2005.943042 | 732291.056710 | 40.870496 | -115.116398 | 1750.892469 | 1751.613426 | 5.352924 |
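The `lulc == 12` row of the grouped result matches the boolean-mask mean computed earlier, because `groupby` applies the same operation to every class at once. A minimal sketch with made-up elevations (only the class codes 12 and 31 come from the data above):

```python
import pandas as pd

df = pd.DataFrame({
    'lulc':   [12, 31, 12, 31, 31],
    'glas_z': [3500.0, 1400.0, 4000.0, 1390.0, 1380.0],  # made-up elevations
})

# Manual approach: one boolean mask per class
mean_12 = df[df['lulc'] == 12]['glas_z'].mean()

# groupby does the same split-apply-combine for every class at once
by_class = df.groupby('lulc')['glas_z'].mean()
print(by_class)

# agg computes several statistics per group in one call
stats = df.groupby('lulc')['glas_z'].agg(['count', 'mean', 'median'])
print(stats)
```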
import seaborn as sns
planets = sns.load_dataset('planets')
planets
| | method | number | orbital_period | mass | distance | year |
|---|---|---|---|---|---|---|
| 0 | Radial Velocity | 1 | 269.300000 | 7.10 | 77.40 | 2006 |
| 1 | Radial Velocity | 1 | 874.774000 | 2.21 | 56.95 | 2008 |
| 2 | Radial Velocity | 1 | 763.000000 | 2.60 | 19.84 | 2011 |
| 3 | Radial Velocity | 1 | 326.030000 | 19.40 | 110.62 | 2007 |
| 4 | Radial Velocity | 1 | 516.220000 | 10.50 | 119.47 | 2009 |
| ... | ... | ... | ... | ... | ... | ... |
| 1030 | Transit | 1 | 3.941507 | NaN | 172.00 | 2006 |
| 1031 | Transit | 1 | 2.615864 | NaN | 148.00 | 2007 |
| 1032 | Transit | 1 | 3.191524 | NaN | 174.00 | 2007 |
| 1033 | Transit | 1 | 4.125083 | NaN | 293.00 | 2008 |
| 1034 | Transit | 1 | 4.187757 | NaN | 260.00 | 2008 |
1035 rows × 6 columns
planets.groupby('method')['orbital_period'].median()
method
Astrometry 631.180000
Eclipse Timing Variations 4343.500000
Imaging 27500.000000
Microlensing 3300.000000
Orbital Brightness Modulation 0.342887
Pulsar Timing 66.541900
Pulsation Timing Variations 1170.000000
Radial Velocity 360.200000
Transit 5.714932
Transit Timing Variations 57.011000
Name: orbital_period, dtype: float64