Demo: Fundamentals, GDAL, rasterio Discussion¶
Introduction¶
What is a raster?¶
Raster data sources¶
Satellite imagery
Gridded model output
Interpolated vector data
Raster fundamentals (interactive discussion)¶
Dimensions (width [columns] and height [rows] in pixels)¶
CRS (coordinate system)¶
Extent (bounds)¶
Resolution (pixel size)¶
Data type (bit depth)¶
Number of bands¶
NoData values¶
NumPy masked arrays vs. float/np.nan
Overviews¶
GDAL (Geospatial Data Abstraction Library) and Rasterio¶
GDAL is a powerful and mature library for reading, writing and warping raster datasets, written in C++ with bindings to other languages. There are a variety of geospatial libraries available on the python package index, and almost all of them depend on GDAL. One such python library developed and supported by Mapbox, rasterio, builds on top of GDAL’s many features, but provides a more pythonic interface and supports many of the features and formats that GDAL supports. Both GDAL and rasterio are constantly being updated and improved.
Raster formats¶
GeoTiff is most common
GDAL is the foundation
Projections¶
Most often UTM
PROJ is the foundation
Raster transformations¶
Need a way to go from pixel coordinates (image on your screen) to real-world coordinates (projected)
Pixel coordinates: image width, height in units of pixels, starting at (0,0)
Real-world coordiantes: projected coordinate system (e.g., UTM 10N), units of meters
Origin is usually upper left corner of upper left pixel
Careful about this - you will definitely run into this problem at some point
Often your grid may be shifted by a half a pixel in x and y
Negative y cell size - what’s up with that?
GDAL/ESRI affine¶
rasterio affine¶
Multiply affine by raster indices to get projected coordinates
Rasterio dataset
xyandindexmethods
Basic raster structure¶
Dataset
Bands
Read band to get underlying 2D array data
3D array to create composites from multispectral bands¶
Can dstack 2D arrays
Misc¶
Be careful with large rasters
Try to avoid creating many copies of arrays
GDAL command line utilities¶
Learn these: https://gdal.org/programs/index.html
gdalinfo
gdal_translate
gdalwarp
gdaladdo
Items to discuss
Use standard creation options (co)
TILED=YES
COMPRESS=LZW
BIGTIFF=IF_SAFER
Resampling algorithms
Default is nearest
Often bilinear or bicubic is a better choice for reprojecting, upsampling, downsampling
Demo¶
https://automating-gis-processes.github.io/site/notebooks/Raster/reading-raster.html
import os
import rasterio as rio
import numpy as np
import matplotlib.pyplot as plt
#Create local directory to store images
imgdir = 'LS8_sample'
#Winter 2018
img_id1 = 'LC08_L1TP_046027_20181224_20190129_01_T1'
#Summer 2018
img_id2 = 'LC08_L1TP_046027_20180818_20180829_01_T1'
img = img_id2
sample_fn = os.path.join(imgdir, img+'_B1.TIF')
print(sample_fn)
LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B1.TIF
!gdalinfo $sample_fn
Driver: GTiff/GeoTIFF
Files: LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B1.TIF
Size is 7781, 7881
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 10N",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 10N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-123,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["unknown"],
AREA["World - N hemisphere - 126°W to 120°W - by country"],
BBOX[0,-126,84,-120]],
ID["EPSG",32610]]
Data axis to CRS axis mapping: 1,2
Origin = (473385.000000000000000,5373315.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Point
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 473385.000, 5373315.000) (123d21'37.35"W, 48d30'44.59"N)
Lower Left ( 473385.000, 5136885.000) (123d20'46.06"W, 46d23' 6.06"N)
Upper Right ( 706815.000, 5373315.000) (120d12' 4.97"W, 48d28'44.09"N)
Lower Right ( 706815.000, 5136885.000) (120d18'42.66"W, 46d21'14.16"N)
Center ( 590100.000, 5255100.000) (121d48'17.82"W, 47d26'35.21"N)
Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray
#Specify filenames for different bands we will need for the lab
#Check table from background section to see wavelengths of each band number
tir_fn = os.path.join(imgdir, img+'_B10.TIF')
print(tir_fn)
LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B10.TIF
with rio.open(tir_fn) as src:
print(src.profile)
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 7781, 'height': 7881, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(30.0, 0.0, 473385.0,
0.0, -30.0, 5373315.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'band'}
src = rio.open(tir_fn)
type(src)
rasterio.io.DatasetReader
src.profile
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 7781, 'height': 7881, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(30.0, 0.0, 473385.0,
0.0, -30.0, 5373315.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'band'}
src.read?
Docstring:
Read a dataset's raw pixels as an N-d array
This data is read from the dataset's band cache, which means
that repeated reads of the same windows may avoid I/O.
Parameters
----------
indexes : list of ints or a single int, optional
If `indexes` is a list, the result is a 3D array, but is
a 2D array if it is a band index number.
out : numpy ndarray, optional
As with Numpy ufuncs, this is an optional reference to an
output array into which data will be placed. If the height
and width of `out` differ from that of the specified
window (see below), the raster image will be decimated or
replicated using the specified resampling method (also see
below).
*Note*: the method's return value may be a view on this
array. In other words, `out` is likely to be an
incomplete representation of the method's results.
This parameter cannot be combined with `out_shape`.
out_dtype : str or numpy dtype
The desired output data type. For example: 'uint8' or
rasterio.uint16.
out_shape : tuple, optional
A tuple describing the shape of a new output array. See
`out` (above) for notes on image decimation and
replication.
Cannot combined with `out`.
window : a pair (tuple) of pairs of ints or Window, optional
The optional `window` argument is a 2 item tuple. The first
item is a tuple containing the indexes of the rows at which
the window starts and stops and the second is a tuple
containing the indexes of the columns at which the window
starts and stops. For example, ((0, 2), (0, 2)) defines
a 2x2 window at the upper left of the raster dataset.
masked : bool, optional
If `masked` is `True` the return value will be a masked
array. Otherwise (the default) the return value will be a
regular array. Masks will be exactly the inverse of the
GDAL RFC 15 conforming arrays returned by read_masks().
boundless : bool, optional (default `False`)
If `True`, windows that extend beyond the dataset's extent
are permitted and partially or completely filled arrays will
be returned as appropriate.
resampling : Resampling
By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other
resampling algorithms may be specified. Resampled pixels
are not cached.
fill_value : scalar
Fill value applied in the `boundless=True` case only. Like
the fill_value of numpy.ma.MaskedArray, should be value
valid for the dataset's data type.
Returns
-------
Numpy ndarray or a view on a Numpy ndarray
Note: as with Numpy ufuncs, an object is returned even if you
use the optional `out` argument and the return value shall be
preferentially used by callers.
Type: builtin_function_or_method
%time
a = src.read(1)
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.48 µs
a
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
a.shape
(7881, 7781)
a.size
61322061
a.max()
65535
a
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
#%matplotlib widget
src.bounds
BoundingBox(left=473385.0, bottom=5136885.0, right=706815.0, top=5373315.0)
full_extent = [src.bounds.left, src.bounds.right, src.bounds.bottom, src.bounds.top]
full_extent
[473385.0, 706815.0, 5136885.0, 5373315.0]
f, ax = plt.subplots(figsize=(10,10))
ax.imshow(a, cmap='gray', extent=full_extent)
<matplotlib.image.AxesImage at 0x7f91289e1be0>