Demo: NumPy, Pandas, Matplotlib

UW Geospatial Data Analysis
CEE498/CEWA599
David Shean

NumPy

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
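To make "fast operations on arrays" concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent vectorized NumPy expression. The array length and the sum-of-squares computation are arbitrary illustrations, not part of the original demo.

```python
import numpy as np

# Illustrative data: 100,000 evenly spaced values between 0 and 1
x = np.linspace(0.0, 1.0, 100_000)

# Explicit Python loop: one interpreted iteration per element
loop_total = 0.0
for value in x:
    loop_total += value * value

# Vectorized version: the element-wise multiply and the sum run in compiled code
vec_total = np.sum(x * x)

# Same answer up to floating-point rounding; the vectorized form is typically much faster
print(loop_total, vec_total)
```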

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms, and it can be used in Python scripts, the Python and IPython shells, Jupyter notebooks, web application servers, and several graphical user interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You can generate line plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code. For examples, see the Matplotlib gallery.

For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
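A minimal sketch of the object-oriented interface mentioned above: create the Figure and Axes explicitly, then call methods on the Axes. The sine curve, labels, and styling are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Create the Figure and Axes objects explicitly
fig, ax = plt.subplots(figsize=(6, 3))

# Call methods on the Axes instead of relying on pyplot's "current axes" state
ax.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.legend()
fig.tight_layout()
```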

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# np.ndarray?  (in IPython/Jupyter, a trailing ? displays the object's docstring)
a = np.random.randint(0,10,10)
a
array([4, 1, 8, 6, 5, 4, 6, 5, 7, 5])
type(a)
numpy.ndarray
a.shape
(10,)
a.size
10
a.dtype
dtype('int64')
a.astype('int8')  # returns a new int8 array; a itself is not modified
array([4, 1, 8, 6, 5, 4, 6, 5, 7, 5], dtype=int8)
a.dtype
dtype('int64')
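astype returns a converted copy rather than changing a in place, which is why a.dtype still reports int64. A minimal sketch of two practical consequences, using an arbitrary example value of 200: the cast only sticks if you reassign the result, and casting to a narrower integer type silently wraps values that do not fit.

```python
import numpy as np

a = np.array([4, 1, 8, 200], dtype="int64")

b = a.astype("int8")  # new int8 array; a keeps its original dtype
print(a.dtype, b.dtype)  # int64 int8

# Reassign if you want a itself to hold the converted values
a = a.astype("int8")

# int8 can only hold -128..127, so 200 wraps around to 200 - 256 = -56 without any error
print(a.tolist())  # [4, 1, 8, -56]
```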
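2**64  # number of distinct values that 64 bits can represent
18446744073709551616
2**8  # number of distinct values that 8 bits can represent
256
a.itemsize  # bytes per element: 64 bits / 8 = 8 bytes
8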
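The 2**64 and 2**8 results above count distinct bit patterns; the usable range depends on whether the type is signed. A minimal sketch using np.iinfo to query integer limits, and nbytes to confirm that memory use is itemsize × size. It assumes the default integer dtype is 64-bit, which is typical on Linux and macOS but not guaranteed on every platform.

```python
import numpy as np

a = np.random.randint(0, 10, 10)

# Total memory of the data buffer = bytes per element x number of elements
print(a.itemsize, a.size, a.nbytes)

# n bits give 2**n distinct values; signed types split them around zero
i8 = np.iinfo(np.int8)
print(i8.min, i8.max)  # -128 127
print(i8.max - i8.min + 1 == 2**8)  # True

# The largest 64-bit values are one less than a power of two
print(np.iinfo(np.uint64).max == 2**64 - 1)  # True
print(np.iinfo(np.int64).max == 2**63 - 1)  # True
```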
a = np.random.random((10,10))
a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
        0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
       [0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
        0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
       [0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
        0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
       [0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
        0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
       [0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
        0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
       [0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
        0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
       [0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
        0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
       [0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
        0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
       [0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
        0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
       [0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
        0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062  ]])
a.shape
(10, 10)
a[0]
array([0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
       0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585])
a[0,0]
0.9372524570452767
a[:,0]
array([0.93725246, 0.08302075, 0.81024205, 0.72307131, 0.22072178,
       0.98403695, 0.08661299, 0.13483696, 0.57092602, 0.98603552])
a[0:3]
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
        0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
       [0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
        0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
       [0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
        0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739]])
a[:,0:3]
array([[0.93725246, 0.15668888, 0.36611166],
       [0.08302075, 0.82469381, 0.57287118],
       [0.81024205, 0.68601501, 0.63991827],
       [0.72307131, 0.50361642, 0.32121823],
       [0.22072178, 0.14158838, 0.70495497],
       [0.98403695, 0.65750484, 0.95815873],
       [0.08661299, 0.54225296, 0.99453582],
       [0.13483696, 0.29410672, 0.39846689],
       [0.57092602, 0.36669827, 0.01014502],
       [0.98603552, 0.29201346, 0.25903222]])
a[0:3,0:3]
array([[0.93725246, 0.15668888, 0.36611166],
       [0.08302075, 0.82469381, 0.57287118],
       [0.81024205, 0.68601501, 0.63991827]])
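Slices like a[0:3,0:3] are views onto the same memory, not copies. A minimal sketch (using a small hypothetical array of zeros) showing that writing through a slice changes the original, and that .copy() avoids this:

```python
import numpy as np

a = np.zeros((4, 4))

sub = a[0:2, 0:2]  # basic slicing returns a view that shares a's memory
sub[0, 0] = 99.0   # writing through the view...
print(a[0, 0])     # ...changes the original: 99.0

safe = a[0:2, 0:2].copy()  # .copy() allocates independent memory
safe[1, 1] = -1.0
print(a[1, 1])     # original unchanged: 0.0

print(np.shares_memory(a, sub), np.shares_memory(a, safe))  # True False
```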
a.mean()
0.5534435458189786
a.min(axis=1)
array([0.15668888, 0.06703322, 0.11388148, 0.32121823, 0.00207674,
       0.06120255, 0.08661299, 0.06508179, 0.01014502, 0.14021301])
a.min(axis=0)
array([0.08302075, 0.14158838, 0.01014502, 0.06120255, 0.00207674,
       0.25755703, 0.01684733, 0.3789339 , 0.25459632, 0.06508179])
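The axis argument names the dimension that gets collapsed: a.min(axis=1) above returns one value per row, and a.min(axis=0) one value per column. A minimal sketch on a small hand-made 2 × 3 array where the results are easy to verify by eye:

```python
import numpy as np

m = np.array([[3, 1, 2],
              [0, 5, 4]])

print(m.min(axis=0))  # collapse rows -> one value per column: [0 1 2]
print(m.min(axis=1))  # collapse columns -> one value per row: [1 0]
print(m.min())        # no axis -> reduce over all elements: 0
```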
plt.plot(a)
[Figure: ten lines, one per column of a, plotted against the row index]
plt.plot(a[0])
[Figure: line plot of the first row, a[0], against the column index]
plt.imshow(a)
[Figure: the 10 × 10 array a displayed as an image with plt.imshow]
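The bare imshow call does not show how colors map to values. A minimal sketch, using the object-oriented interface, that pins the color limits to the 0 to 1 range of np.random.random and adds a colorbar; the colormap name and title are illustrative choices:

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.random.random((10, 10))

fig, ax = plt.subplots()
# imshow returns an AxesImage; vmin/vmax pin the color scale to the data range
im = ax.imshow(a, cmap="viridis", vmin=0, vmax=1)
# The colorbar gets its own Axes and maps colors back to values
fig.colorbar(im, ax=ax, label="value")
ax.set_title("Random 10 x 10 array")
```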
a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
        0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
       [0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
        0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
       [0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
        0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
       [0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
        0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
       [0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
        0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
       [0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
        0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
       [0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
        0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
       [0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
        0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
       [0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
        0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
       [0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
        0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062  ]])
a > 0.5
array([[ True, False, False, False,  True,  True,  True,  True, False,
         True],
       [False,  True,  True,  True, False, False, False,  True,  True,
        False],
       [ True,  True,  True, False,  True, False,  True,  True,  True,
         True],
       [ True,  True, False, False,  True,  True,  True,  True,  True,
         True],
       [False, False,  True, False, False,  True,  True, False, False,
        False],
       [ True,  True,  True, False,  True,  True,  True,  True,  True,
         True],
       [False,  True,  True, False,  True, False, False, False,  True,
        False],
       [False, False, False,  True,  True,  True,  True,  True,  True,
        False],
       [ True, False, False, False, False,  True, False,  True,  True,
         True],
       [ True, False, False, False, False,  True,  True,  True,  True,
         True]])
idx = (a > 0.5)
a[idx]
array([0.93725246, 0.729094  , 0.83046116, 0.55376036, 0.95884927,
       0.71853585, 0.82469381, 0.57287118, 0.70806906, 0.91474639,
       0.64656203, 0.81024205, 0.68601501, 0.63991827, 0.88810797,
       0.69543896, 0.97644668, 0.6011786 , 0.86319739, 0.72307131,
       0.50361642, 0.54175156, 0.85948579, 0.67301454, 0.95202843,
       0.98558777, 0.92096823, 0.70495497, 0.77898224, 0.94767885,
       0.98403695, 0.65750484, 0.95815873, 0.70874163, 0.83894932,
       0.8006265 , 0.77455562, 0.52743587, 0.75654806, 0.54225296,
       0.99453582, 0.91009936, 0.52389196, 0.84681589, 0.67671481,
       0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.57092602,
       0.76761243, 0.56304474, 0.51408509, 0.83939443, 0.98603552,
       0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062  ])
a.shape
(10, 10)
a[idx].shape
(60,)
idx.nonzero()[0].size
60
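idx.nonzero()[0].size works, but because True counts as 1 in arithmetic, a boolean mask can be counted more directly. A minimal sketch comparing three equivalent counts:

```python
import numpy as np

a = np.random.random((10, 10))
idx = a > 0.5

n_nonzero = idx.nonzero()[0].size  # approach used above
n_sum = idx.sum()                  # True -> 1, False -> 0
n_count = np.count_nonzero(idx)    # dedicated counting function
print(n_nonzero, n_sum, n_count)   # all three are equal

# The mean of a boolean mask is the fraction of True values
print(idx.mean())
```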
idx = (a > 0.5) & (a < 0.7)
# Alternative: idx = (a < 0.5) | (a > 0.9) selects values outside the range (| is element-wise OR)
idx
array([[False, False, False, False, False, False,  True, False, False,
        False],
       [False, False,  True, False, False, False, False, False,  True,
        False],
       [False,  True,  True, False, False, False,  True, False,  True,
        False],
       [False,  True, False, False,  True, False,  True, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False,  True, False, False, False, False, False, False,  True,
        False],
       [False,  True, False, False, False, False, False, False,  True,
        False],
       [False, False, False, False,  True, False, False,  True,  True,
        False],
       [ True, False, False, False, False, False, False,  True,  True,
        False],
       [False, False, False, False, False, False, False,  True,  True,
         True]])
a[idx]
array([0.55376036, 0.57287118, 0.64656203, 0.68601501, 0.63991827,
       0.69543896, 0.6011786 , 0.50361642, 0.54175156, 0.67301454,
       0.65750484, 0.52743587, 0.54225296, 0.52389196, 0.67671481,
       0.66011626, 0.51693959, 0.57092602, 0.56304474, 0.51408509,
       0.60820336, 0.65643397, 0.640062  ])
a[idx].shape
(23,)
~idx
array([[ True,  True,  True,  True,  True,  True, False,  True,  True,
         True],
       [ True,  True, False,  True,  True,  True,  True,  True, False,
         True],
       [ True, False, False,  True,  True,  True, False,  True, False,
         True],
       [ True, False,  True,  True, False,  True, False,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True, False,  True,  True,  True,  True,  True,  True, False,
         True],
       [ True, False,  True,  True,  True,  True,  True,  True, False,
         True],
       [ True,  True,  True,  True, False,  True,  True, False, False,
         True],
       [False,  True,  True,  True,  True,  True,  True, False, False,
         True],
       [ True,  True,  True,  True,  True,  True,  True, False, False,
        False]])
a[~idx]
array([0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
       0.83046116, 0.95884927, 0.29762467, 0.71853585, 0.08302075,
       0.82469381, 0.70806906, 0.29734792, 0.45314182, 0.16445973,
       0.91474639, 0.06703322, 0.81024205, 0.11388148, 0.88810797,
       0.25755703, 0.97644668, 0.86319739, 0.72307131, 0.32121823,
       0.39273795, 0.85948579, 0.95202843, 0.98558777, 0.92096823,
       0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
       0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945,
       0.98403695, 0.95815873, 0.06120255, 0.70874163, 0.83894932,
       0.8006265 , 0.77455562, 0.75654806, 0.08661299, 0.99453582,
       0.49692316, 0.91009936, 0.27296071, 0.46314554, 0.3789339 ,
       0.24766733, 0.13483696, 0.29410672, 0.39846689, 0.84681589,
       0.84317288, 0.85344483, 0.06508179, 0.36669827, 0.01014502,
       0.48240653, 0.37714147, 0.76761243, 0.01684733, 0.83939443,
       0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
       0.99199766, 0.87244072])
a[~idx].shape
(77,)
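The parentheses in (a > 0.5) & (a < 0.7) are required: Python's & and | bind more tightly than comparison operators, so without them NumPy is asked to compute 0.5 & a, which fails for floats. A minimal sketch demonstrating the error and the np.logical_and equivalent:

```python
import numpy as np

a = np.random.random((10, 10))

# Without parentheses this parses as a > (0.5 & a) < 0.7, and 0.5 & a is not defined for floats
raised = False
try:
    a > 0.5 & a < 0.7
except TypeError as err:
    raised = True
    print("TypeError:", err)

idx = (a > 0.5) & (a < 0.7)                # element-wise AND with operators
idx_fn = np.logical_and(a > 0.5, a < 0.7)  # same result as a function call
print(np.array_equal(idx, idx_fn))         # True

# A mask and its inverse together cover every element exactly once
print(idx.sum() + (~idx).sum() == a.size)  # True
```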

Masked array

a
array([[0.93725246, 0.15668888, 0.36611166, 0.28521612, 0.729094  ,
        0.83046116, 0.55376036, 0.95884927, 0.29762467, 0.71853585],
       [0.08302075, 0.82469381, 0.57287118, 0.70806906, 0.29734792,
        0.45314182, 0.16445973, 0.91474639, 0.64656203, 0.06703322],
       [0.81024205, 0.68601501, 0.63991827, 0.11388148, 0.88810797,
        0.25755703, 0.69543896, 0.97644668, 0.6011786 , 0.86319739],
       [0.72307131, 0.50361642, 0.32121823, 0.39273795, 0.54175156,
        0.85948579, 0.67301454, 0.95202843, 0.98558777, 0.92096823],
       [0.22072178, 0.14158838, 0.70495497, 0.35857463, 0.00207674,
        0.77898224, 0.94767885, 0.43931853, 0.25459632, 0.12926945],
       [0.98403695, 0.65750484, 0.95815873, 0.06120255, 0.70874163,
        0.83894932, 0.8006265 , 0.77455562, 0.52743587, 0.75654806],
       [0.08661299, 0.54225296, 0.99453582, 0.49692316, 0.91009936,
        0.27296071, 0.46314554, 0.3789339 , 0.52389196, 0.24766733],
       [0.13483696, 0.29410672, 0.39846689, 0.84681589, 0.67671481,
        0.84317288, 0.85344483, 0.66011626, 0.51693959, 0.06508179],
       [0.57092602, 0.36669827, 0.01014502, 0.48240653, 0.37714147,
        0.76761243, 0.01684733, 0.56304474, 0.51408509, 0.83939443],
       [0.98603552, 0.29201346, 0.25903222, 0.14021301, 0.16637503,
        0.99199766, 0.87244072, 0.60820336, 0.65643397, 0.640062  ]])
idx
array([[False, False, False, False, False, False,  True, False, False,
        False],
       [False, False,  True, False, False, False, False, False,  True,
        False],
       [False,  True,  True, False, False, False,  True, False,  True,
        False],
       [False,  True, False, False,  True, False,  True, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False,  True, False, False, False, False, False, False,  True,
        False],
       [False,  True, False, False, False, False, False, False,  True,
        False],
       [False, False, False, False,  True, False, False,  True,  True,
        False],
       [ True, False, False, False, False, False, False,  True,  True,
        False],
       [False, False, False, False, False, False, False,  True,  True,
         True]])
ma = np.ma.array(a, mask=~idx)  # mask=True hides a value, so invert idx to keep only 0.5 < a < 0.7
ma
masked_array(
  data=[[--, --, --, --, --, --, 0.5537603625060096, --, --, --],
        [--, --, 0.5728711811469509, --, --, --, --, --,
         0.6465620326513881, --],
        [--, 0.6860150052162564, 0.6399182670598812, --, --, --,
         0.6954389578430052, --, 0.6011785952124488, --],
        [--, 0.5036164236165882, --, --, 0.5417515578959615, --,
         0.67301453764201, --, --, --],
        [--, --, --, --, --, --, --, --, --, --],
        [--, 0.6575048391787636, --, --, --, --, --, --,
         0.5274358710655832, --],
        [--, 0.5422529561477778, --, --, --, --, --, --,
         0.5238919606546085, --],
        [--, --, --, --, 0.6767148052677167, --, --, 0.6601162558250676,
         0.5169395925974186, --],
        [0.5709260243416924, --, --, --, --, --, --, 0.5630447415496301,
         0.5140850876133204, --],
        [--, --, --, --, --, --, --, 0.60820335527965,
         0.6564339696072983, 0.6400620038099053]],
  mask=[[ True,  True,  True,  True,  True,  True, False,  True,  True,
          True],
        [ True,  True, False,  True,  True,  True,  True,  True, False,
          True],
        [ True, False, False,  True,  True,  True, False,  True, False,
          True],
        [ True, False,  True,  True, False,  True, False,  True,  True,
          True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True],
        [ True, False,  True,  True,  True,  True,  True,  True, False,
          True],
        [ True, False,  True,  True,  True,  True,  True,  True, False,
          True],
        [ True,  True,  True,  True, False,  True,  True, False, False,
          True],
        [False,  True,  True,  True,  True,  True,  True, False, False,
          True],
        [ True,  True,  True,  True,  True,  True,  True, False, False,
         False]],
  fill_value=1e+20)
ma.mean()  # masked values are ignored
0.5987712340751709
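np.ma also provides helpers that build the mask for you. A minimal sketch with a small hand-made array, using np.ma.masked_outside to keep values within an interval and filled() to convert back to a plain array with NaN in the masked cells:

```python
import numpy as np

x = np.array([0.1, 0.55, 0.65, 0.9])

# Mask every value outside the closed interval [0.5, 0.7]
m = np.ma.masked_outside(x, 0.5, 0.7)
print(m.count())  # number of unmasked values: 2
print(m.mean())   # mean of the unmasked values only, about 0.6

# Equivalent statistic with plain boolean indexing
print(x[(x > 0.5) & (x < 0.7)].mean())

# Back to a regular ndarray, with NaN wherever the mask is True
filled = m.filled(np.nan)
print(filled)
```

masked_outside treats the interval endpoints as valid, whereas (x > 0.5) & (x < 0.7) excludes them; for continuous random values the difference rarely matters.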
plt.imshow(ma)
[Figure: the masked array displayed with plt.imshow; masked cells are left uncolored]

Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

df = pd.DataFrame(a)
df
          0         1         2         3         4         5         6         7         8         9
0  0.937252  0.156689  0.366112  0.285216  0.729094  0.830461  0.553760  0.958849  0.297625  0.718536
1  0.083021  0.824694  0.572871  0.708069  0.297348  0.453142  0.164460  0.914746  0.646562  0.067033
2  0.810242  0.686015  0.639918  0.113881  0.888108  0.257557  0.695439  0.976447  0.601179  0.863197
3  0.723071  0.503616  0.321218  0.392738  0.541752  0.859486  0.673015  0.952028  0.985588  0.920968
4  0.220722  0.141588  0.704955  0.358575  0.002077  0.778982  0.947679  0.439319  0.254596  0.129269
5  0.984037  0.657505  0.958159  0.061203  0.708742  0.838949  0.800627  0.774556  0.527436  0.756548
6  0.086613  0.542253  0.994536  0.496923  0.910099  0.272961  0.463146  0.378934  0.523892  0.247667
7  0.134837  0.294107  0.398467  0.846816  0.676715  0.843173  0.853445  0.660116  0.516940  0.065082
8  0.570926  0.366698  0.010145  0.482407  0.377141  0.767612  0.016847  0.563045  0.514085  0.839394
9  0.986036  0.292013  0.259032  0.140213  0.166375  0.991998  0.872441  0.608203  0.656434  0.640062
df.index = ['a','b','c','d','e','f','g','h','i','j']
df
          0         1         2         3         4         5         6         7         8         9
a  0.937252  0.156689  0.366112  0.285216  0.729094  0.830461  0.553760  0.958849  0.297625  0.718536
b  0.083021  0.824694  0.572871  0.708069  0.297348  0.453142  0.164460  0.914746  0.646562  0.067033
c  0.810242  0.686015  0.639918  0.113881  0.888108  0.257557  0.695439  0.976447  0.601179  0.863197
d  0.723071  0.503616  0.321218  0.392738  0.541752  0.859486  0.673015  0.952028  0.985588  0.920968
e  0.220722  0.141588  0.704955  0.358575  0.002077  0.778982  0.947679  0.439319  0.254596  0.129269
f  0.984037  0.657505  0.958159  0.061203  0.708742  0.838949  0.800627  0.774556  0.527436  0.756548
g  0.086613  0.542253  0.994536  0.496923  0.910099  0.272961  0.463146  0.378934  0.523892  0.247667
h  0.134837  0.294107  0.398467  0.846816  0.676715  0.843173  0.853445  0.660116  0.516940  0.065082
i  0.570926  0.366698  0.010145  0.482407  0.377141  0.767612  0.016847  0.563045  0.514085  0.839394
j  0.986036  0.292013  0.259032  0.140213  0.166375  0.991998  0.872441  0.608203  0.656434  0.640062
df.mean()
0    0.553676
1    0.446518
2    0.522541
3    0.388604
4    0.529745
5    0.689432
6    0.604086
7    0.722624
8    0.552434
9    0.524776
dtype: float64
df.mean(axis=1)
a    0.583359
b    0.473195
c    0.653198
d    0.687348
e    0.397776
f    0.706776
g    0.491702
h    0.528970
i    0.450830
j    0.561281
dtype: float64
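Passing index and columns to the constructor sets both sets of labels at once, instead of assigning df.index afterwards. A minimal sketch with a small hand-made array, where the axis=0 (per column) and axis=1 (per row) means are easy to check:

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Set row labels (index) and column labels at construction time
df = pd.DataFrame(arr, index=["a", "b"], columns=["x", "y", "z"])

print(df.mean())        # default axis=0: one mean per column (x 2.5, y 3.5, z 4.5)
print(df.mean(axis=1))  # axis=1: one mean per row (a 2.0, b 5.0)
```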

Reading files

Most of the time, you will read tabular data from a file and let pandas do the work of parsing it.

csv_fn = '../01_Shell_Github/data/GLAH14_tllz_conus_lulcfilt_demfilt.csv'
!head $csv_fn  # IPython shell escape: print the first 10 lines of the file
decyear,ordinal,lat,lon,glas_z,dem_z,dem_z_std,lulc
2003.13957078,731266.9433448168,44.157897,-105.356562,1398.51,1400.52,0.33,31
2003.13957081,731266.9433462636,44.150175,-105.358116,1387.11,1384.64,0.43,31
2003.13957081,731266.9433465529,44.148632,-105.358427,1392.83,1383.49,0.28,31
2003.13957081,731266.9433468423,44.147087,-105.358738,1384.24,1382.85,0.84,31
2003.13957081,731266.9433471316,44.145542,-105.359048,1369.21,1380.24,1.73,31
2003.13957081,731266.9433474210,44.143996,-105.359359,1366.60,1375.23,1.60,31
2003.13957081,731266.9433506038,44.126969,-105.362876,1355.14,1379.38,2.17,31
2003.13957084,731266.9433604418,44.074358,-105.373549,1369.53,1391.71,2.88,31
2003.13957084,731266.9433607311,44.072806,-105.373864,1380.02,1387.79,0.45,31
pd.read_csv(csv_fn)
           decyear        ordinal        lat          lon   glas_z    dem_z  dem_z_std  lulc
0      2003.139571  731266.943345  44.157897  -105.356562  1398.51  1400.52       0.33    31
1      2003.139571  731266.943346  44.150175  -105.358116  1387.11  1384.64       0.43    31
2      2003.139571  731266.943347  44.148632  -105.358427  1392.83  1383.49       0.28    31
3      2003.139571  731266.943347  44.147087  -105.358738  1384.24  1382.85       0.84    31
4      2003.139571  731266.943347  44.145542  -105.359048  1369.21  1380.24       1.73    31
...            ...            ...        ...          ...      ...      ...        ...   ...
65231  2009.775995  733691.238340  37.896222  -117.044399  1556.16  1556.43       0.00    31
65232  2009.775995  733691.238340  37.897769  -117.044675  1556.02  1556.43       0.00    31
65233  2009.775995  733691.238340  37.899319  -117.044952  1556.19  1556.44       0.00    31
65234  2009.775995  733691.238340  37.900869  -117.045230  1556.18  1556.44       0.00    31
65235  2009.775995  733691.238341  37.902420  -117.045508  1556.32  1556.44       0.00    31

65236 rows × 8 columns
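The GLAS CSV file is not bundled here, so this minimal sketch inlines the header and first two data rows from the head output above and reads them through io.StringIO; read_csv accepts file-like objects as well as paths. The usecols example is an illustrative addition:

```python
import io

import pandas as pd

# Header and first two data rows from the `head` output above, inlined so the sketch runs standalone
csv_text = """decyear,ordinal,lat,lon,glas_z,dem_z,dem_z_std,lulc
2003.13957078,731266.9433448168,44.157897,-105.356562,1398.51,1400.52,0.33,31
2003.13957081,731266.9433462636,44.150175,-105.358116,1387.11,1384.64,0.43,31
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))

# usecols loads only the listed columns, which can save memory on large files
subset = pd.read_csv(io.StringIO(csv_text), usecols=["lat", "lon", "glas_z"])

print(df.dtypes)
print(subset)
```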

glas_df = pd.read_csv(csv_fn)
# Multiply the index by 10 so that row labels (used by loc) differ from row positions (used by iloc)
glas_df.set_index(glas_df.index*10, inplace=True)
glas_df
            decyear        ordinal        lat          lon   glas_z    dem_z  dem_z_std  lulc
0       2003.139571  731266.943345  44.157897  -105.356562  1398.51  1400.52       0.33    31
10      2003.139571  731266.943346  44.150175  -105.358116  1387.11  1384.64       0.43    31
20      2003.139571  731266.943347  44.148632  -105.358427  1392.83  1383.49       0.28    31
30      2003.139571  731266.943347  44.147087  -105.358738  1384.24  1382.85       0.84    31
40      2003.139571  731266.943347  44.145542  -105.359048  1369.21  1380.24       1.73    31
...             ...            ...        ...          ...      ...      ...        ...   ...
652310  2009.775995  733691.238340  37.896222  -117.044399  1556.16  1556.43       0.00    31
652320  2009.775995  733691.238340  37.897769  -117.044675  1556.02  1556.43       0.00    31
652330  2009.775995  733691.238340  37.899319  -117.044952  1556.19  1556.44       0.00    31
652340  2009.775995  733691.238340  37.900869  -117.045230  1556.18  1556.44       0.00    31
652350  2009.775995  733691.238341  37.902420  -117.045508  1556.32  1556.44       0.00    31

65236 rows × 8 columns

glas_df.describe()
            decyear        ordinal           lat           lon        glas_z         dem_z     dem_z_std          lulc
count  65236.000000   65236.000000  65236.000000  65236.000000  65236.000000  65236.000000  65236.000000  65236.000000
mean    2005.945322  732291.890372     40.946798   -115.040612   1791.494167   1792.260964      5.504748     30.339444
std        1.729573     631.766682      3.590476      5.465065   1037.183482   1037.925371      7.518558      3.480576
min     2003.139571  731266.943345     34.999455   -124.482406   -115.550000   -114.570000      0.000000     12.000000
25%     2004.444817  731743.803182     38.101451   -119.257599   1166.970000   1168.240000      0.070000     31.000000
50%     2005.846896  732256.116938     39.884541   -115.686241   1555.730000   1556.380000      1.350000     31.000000
75%     2007.223249  732758.486046     43.453565   -109.816475   2399.355000   2400.072500      9.530000     31.000000
max     2009.775995  733691.238341     48.999727   -104.052336   4340.310000   4252.940000     49.900000     31.000000
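New columns can be computed from existing ones with vectorized arithmetic. As an illustrative sketch (the dz column name is hypothetical, not part of the dataset), using the first two rows of the table above:

```python
import pandas as pd

# Two rows copied from the GLAS table above
df = pd.DataFrame({"glas_z": [1398.51, 1387.11], "dem_z": [1400.52, 1384.64]})

# Hypothetical derived column: GLAS elevation minus DEM elevation
df["dz"] = df["glas_z"] - df["dem_z"]
print(df["dz"].round(2).tolist())  # [-2.01, 2.47]
```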

Indexing and selecting

glas_df.loc[0]
decyear        2003.139571
ordinal      731266.943345
lat              44.157897
lon            -105.356562
glas_z         1398.510000
dem_z          1400.520000
dem_z_std         0.330000
lulc             31.000000
Name: 0, dtype: float64
glas_df.iloc[0]
decyear        2003.139571
ordinal      731266.943345
lat              44.157897
lon            -105.356562
glas_z         1398.510000
dem_z          1400.520000
dem_z_std         0.330000
lulc             31.000000
Name: 0, dtype: float64
glas_df.loc[0:10]  # label-based: includes both endpoints, i.e. the rows labeled 0 and 10
        decyear        ordinal        lat          lon   glas_z    dem_z  dem_z_std  lulc
0   2003.139571  731266.943345  44.157897  -105.356562  1398.51  1400.52       0.33    31
10  2003.139571  731266.943346  44.150175  -105.358116  1387.11  1384.64       0.43    31
glas_df.iloc[0:10]  # position-based: the first 10 rows (positions 0-9); the end is excluded
        decyear        ordinal        lat          lon   glas_z    dem_z  dem_z_std  lulc
0   2003.139571  731266.943345  44.157897  -105.356562  1398.51  1400.52       0.33    31
10  2003.139571  731266.943346  44.150175  -105.358116  1387.11  1384.64       0.43    31
20  2003.139571  731266.943347  44.148632  -105.358427  1392.83  1383.49       0.28    31
30  2003.139571  731266.943347  44.147087  -105.358738  1384.24  1382.85       0.84    31
40  2003.139571  731266.943347  44.145542  -105.359048  1369.21  1380.24       1.73    31
50  2003.139571  731266.943347  44.143996  -105.359359  1366.60  1375.23       1.60    31
60  2003.139571  731266.943351  44.126969  -105.362876  1355.14  1379.38       2.17    31
70  2003.139571  731266.943360  44.074358  -105.373549  1369.53  1391.71       2.88    31
80  2003.139571  731266.943361  44.072806  -105.373864  1380.02  1387.79       0.45    31
90  2003.139571  731266.943361  44.071256  -105.374177  1391.47  1396.90       1.56    31
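The same loc/iloc contrast on a small hypothetical DataFrame whose index is multiplied by 10, as in the demo, so the row counts can be checked directly:

```python
import pandas as pd

df = pd.DataFrame({"v": range(20)})
df.index = df.index * 10  # labels 0, 10, ..., 190, as in the demo above

by_label = df.loc[0:10]      # label slice: both endpoints included
by_position = df.iloc[0:10]  # position slice: end excluded, like Python lists

print(list(by_label.index))  # [0, 10]
print(len(by_position))      # 10
```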
glas_df.index
RangeIndex(start=0, stop=652360, step=10)
glas_df.values
array([[2.00313957e+03, 7.31266943e+05, 4.41578970e+01, ...,
        1.40052000e+03, 3.30000000e-01, 3.10000000e+01],
       [2.00313957e+03, 7.31266943e+05, 4.41501750e+01, ...,
        1.38464000e+03, 4.30000000e-01, 3.10000000e+01],
       [2.00313957e+03, 7.31266943e+05, 4.41486320e+01, ...,
        1.38349000e+03, 2.80000000e-01, 3.10000000e+01],
       ...,
       [2.00977600e+03, 7.33691238e+05, 3.78993190e+01, ...,
        1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
       [2.00977600e+03, 7.33691238e+05, 3.79008690e+01, ...,
        1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
       [2.00977600e+03, 7.33691238e+05, 3.79024200e+01, ...,
        1.55644000e+03, 0.00000000e+00, 3.10000000e+01]])
glas_df.columns
Index(['decyear', 'ordinal', 'lat', 'lon', 'glas_z', 'dem_z', 'dem_z_std',
       'lulc'],
      dtype='object')
glas_df['glas_z']
0         1398.51
10        1387.11
20        1392.83
30        1384.24
40        1369.21
           ...   
652310    1556.16
652320    1556.02
652330    1556.19
652340    1556.18
652350    1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.glas_z
0         1398.51
10        1387.11
20        1392.83
30        1384.24
40        1369.21
           ...   
652310    1556.16
652320    1556.02
652330    1556.19
652340    1556.18
652350    1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.iloc[:,4]
0         1398.51
10        1387.11
20        1392.83
30        1384.24
40        1369.21
           ...   
652310    1556.16
652320    1556.02
652330    1556.19
652340    1556.18
652350    1556.32
Name: glas_z, Length: 65236, dtype: float64
glas_df.loc[:,'glas_z']
0         1398.51
10        1387.11
20        1392.83
30        1384.24
40        1369.21
           ...   
652310    1556.16
652320    1556.02
652330    1556.19
652340    1556.18
652350    1556.32
Name: glas_z, Length: 65236, dtype: float64
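Single brackets with one label return a Series; a list of labels returns a DataFrame. Attribute access like glas_df.glas_z is convenient but only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A minimal sketch with a few values copied from the table above:

```python
import pandas as pd

# A few values copied from the GLAS table above
df = pd.DataFrame({
    "lat": [44.157897, 44.150175],
    "lon": [-105.356562, -105.358116],
    "glas_z": [1398.51, 1387.11],
})

s = df["glas_z"]          # one label -> Series
sub = df[["lat", "lon"]]  # list of labels -> DataFrame with those columns
one = df[["glas_z"]]      # one-element list -> still a DataFrame

print(type(s).__name__, type(sub).__name__, type(one).__name__)  # Series DataFrame DataFrame
print(sub.shape, one.shape)  # (2, 2) (2, 1)
```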
idx2 = (glas_df['lulc'] == 12)
glas_df.shape
(65236, 8)
glas_df[idx2]
            decyear        ordinal        lat          lon   glas_z    dem_z  dem_z_std  lulc
230     2003.139573  731266.944184  39.669291  -106.225142  3505.12  3508.25       5.74    12
300     2003.139573  731266.944316  38.961190  -106.355153  4046.47  4047.25       7.14    12
4890    2003.147846  731269.963718  48.587233  -113.484046  2135.76  2123.37       1.18    12
4920    2003.147846  731269.963811  48.091352  -113.595790  1632.52  1615.77      11.43    12
7560    2003.157366  731273.438572  43.897412  -114.457131  2886.39  2889.82      20.31    12
...             ...            ...        ...          ...      ...      ...        ...   ...
647240  2009.764964  733687.211708  40.689722  -105.918309  3267.33  3267.62       1.83    12
647250  2009.764964  733687.211709  40.694371  -105.919164  3235.77  3238.94       3.78    12
649830  2009.771998  733689.779258  47.910365  -123.628017  1671.86  1711.73       8.44    12
649840  2009.771998  733689.779258  47.908820  -123.628357  1737.70  1776.17       7.70    12
649850  2009.771998  733689.779258  47.907275  -123.628697  1782.52  1828.93       4.41    12

2268 rows × 8 columns

glas_df[idx2].shape
(2268, 8)
glas_df[idx2].mean()
decyear        2006.008627
ordinal      732315.035881
lat              43.065223
lon            -112.936499
glas_z         2918.746261
dem_z          2920.785754
dem_z_std         9.719951
lulc             12.000000
dtype: float64
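A boolean mask can also be combined with a column selection in a single .loc call, and DataFrame.query expresses the same filter as a string. A minimal sketch using four glas_z values and lulc codes copied from the tables above:

```python
import pandas as pd

# glas_z and lulc pairs copied from the tables above
df = pd.DataFrame({
    "glas_z": [3505.12, 1398.51, 4046.47, 1387.11],
    "lulc": [12, 31, 12, 31],
})

mask = df["lulc"] == 12

# Row mask and column label in one .loc call
mean_loc = df.loc[mask, "glas_z"].mean()

# The same filter written as a query string
mean_query = df.query("lulc == 12")["glas_z"].mean()

print(mean_loc, mean_query)
```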
glas_df.groupby('lulc')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f62bffe0d60>
glas_df.groupby('lulc').mean()
          decyear        ordinal        lat          lon       glas_z        dem_z  dem_z_std
lulc
12    2006.008627  732315.035881  43.065223  -112.936499  2918.746261  2920.785754   9.719951
31    2005.943042  732291.056710  40.870496  -115.116398  1750.892469  1751.613426   5.352924
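groupby can compute several statistics at once with .agg. A minimal sketch on a handful of dem_z_std values copied from the tables above (far too few rows to be representative of the full dataset):

```python
import pandas as pd

# lulc and dem_z_std pairs copied from the tables above
df = pd.DataFrame({
    "lulc": [12, 31, 12, 31, 31],
    "dem_z_std": [5.74, 0.33, 7.14, 0.43, 0.28],
})

# One row per lulc value, one column per statistic
stats = df.groupby("lulc")["dem_z_std"].agg(["count", "mean", "median"])
print(stats)
```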
import seaborn as sns
planets = sns.load_dataset('planets')  # downloads the example dataset from the seaborn-data repository (requires internet)
planets
               method  number  orbital_period   mass  distance  year
0     Radial Velocity       1      269.300000   7.10     77.40  2006
1     Radial Velocity       1      874.774000   2.21     56.95  2008
2     Radial Velocity       1      763.000000   2.60     19.84  2011
3     Radial Velocity       1      326.030000  19.40    110.62  2007
4     Radial Velocity       1      516.220000  10.50    119.47  2009
...               ...     ...             ...    ...       ...   ...
1030          Transit       1        3.941507    NaN    172.00  2006
1031          Transit       1        2.615864    NaN    148.00  2007
1032          Transit       1        3.191524    NaN    174.00  2007
1033          Transit       1        4.125083    NaN    293.00  2008
1034          Transit       1        4.187757    NaN    260.00  2008

1035 rows × 6 columns

planets.groupby('method')['orbital_period'].median()
method
Astrometry                         631.180000
Eclipse Timing Variations         4343.500000
Imaging                          27500.000000
Microlensing                      3300.000000
Orbital Brightness Modulation        0.342887
Pulsar Timing                       66.541900
Pulsation Timing Variations       1170.000000
Radial Velocity                    360.200000
Transit                              5.714932
Transit Timing Variations           57.011000
Name: orbital_period, dtype: float64
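The planets table contains missing values (NaN in the mass column). With named aggregation you can report, per group, both the number of rows and the number of non-missing values. A minimal sketch on four rows copied from the planets table above, so it runs without downloading the dataset:

```python
import numpy as np
import pandas as pd

# Four rows copied from the planets table above
planets = pd.DataFrame({
    "method": ["Radial Velocity", "Radial Velocity", "Transit", "Transit"],
    "mass": [7.10, 2.21, np.nan, np.nan],
})

summary = planets.groupby("method").agg(
    n_rows=("mass", "size"),         # counts every row, including NaN
    n_mass=("mass", "count"),        # counts only non-missing values
    median_mass=("mass", "median"),  # NaN values are skipped
)
print(summary)
```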