Demo: Fundamentals, GDAL, rasterio Discussion

Introduction

What is a raster?

Raster data sources

  • Satellite imagery

  • Gridded model output

  • Interpolated vector data

Raster fundamentals (interactive discussion)

Dimensions (width [columns] and height [rows] in pixels)

CRS (coordinate system)

Extent (bounds)

Resolution (pixel size)

Data type (bit depth)

Number of bands

NoData values

  • NumPy masked arrays vs. float/np.nan

Overviews

GDAL (Geospatial Data Abstraction Library) and Rasterio

GDAL is a powerful and mature library for reading, writing and warping raster datasets, written in C++ with bindings to other languages. There are a variety of geospatial libraries available on the python package index, and almost all of them depend on GDAL. One such python library developed and supported by Mapbox, rasterio, builds on top of GDAL’s many features, but provides a more pythonic interface and supports many of the features and formats that GDAL supports. Both GDAL and rasterio are constantly being updated and improved.

Raster formats

  • GeoTiff is most common

  • GDAL is the foundation

Projections

  • Most often UTM

  • PROJ is the foundation

Raster transformations

  • Need a way to go from pixel coordinates (image on your screen) to real-world coordinates (projected)

    • Pixel coordinates: image width, height in units of pixels, starting at (0,0)

    • Real-world coordiantes: projected coordinate system (e.g., UTM 10N), units of meters

  • Origin is usually upper left corner of upper left pixel

    • Careful about this - you will definitely run into this problem at some point

    • Often your grid may be shifted by a half a pixel in x and y

  • Negative y cell size - what’s up with that?

GDAL/ESRI affine

rasterio affine

  • Multiply affine by raster indices to get projected coordinates

  • Rasterio dataset xy and index methods

Basic raster structure

  • Dataset

  • Bands

  • Read band to get underlying 2D array data

3D array to create composites from multispectral bands

  • Can dstack 2D arrays

Misc

  • Be careful with large rasters

  • Try to avoid creating many copies of arrays

GDAL command line utilities

  • Learn these: https://gdal.org/programs/index.html

    • gdalinfo

    • gdal_translate

    • gdalwarp

    • gdaladdo

  • Items to discuss

    • Use standard creation options (co)

      • TILED=YES

      • COMPRESS=LZW

      • BIGTIFF=IF_SAFER

    • Resampling algorithms

      • Default is nearest

      • Often bilinear or bicubic is a better choice for reprojecting, upsampling, downsampling

https://medium.com/planet-stories/a-gentle-introduction-to-gdal-part-4-working-with-satellite-data-d3835b5e2971

Demo

https://automating-gis-processes.github.io/site/notebooks/Raster/reading-raster.html

import os
import rasterio as rio
import numpy as np
import matplotlib.pyplot as plt
#Create local directory to store images
imgdir = 'LS8_sample'
#Winter 2018
img_id1 = 'LC08_L1TP_046027_20181224_20190129_01_T1'
#Summer 2018
img_id2 = 'LC08_L1TP_046027_20180818_20180829_01_T1'
img = img_id2
sample_fn = os.path.join(imgdir, img+'_B1.TIF')
print(sample_fn)
LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B1.TIF
!gdalinfo $sample_fn
Driver: GTiff/GeoTIFF
Files: LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B1.TIF
Size is 7781, 7881
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 10N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 10N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-123,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["World - N hemisphere - 126°W to 120°W - by country"],
        BBOX[0,-126,84,-120]],
    ID["EPSG",32610]]
Data axis to CRS axis mapping: 1,2
Origin = (473385.000000000000000,5373315.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  AREA_OR_POINT=Point
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  473385.000, 5373315.000) (123d21'37.35"W, 48d30'44.59"N)
Lower Left  (  473385.000, 5136885.000) (123d20'46.06"W, 46d23' 6.06"N)
Upper Right (  706815.000, 5373315.000) (120d12' 4.97"W, 48d28'44.09"N)
Lower Right (  706815.000, 5136885.000) (120d18'42.66"W, 46d21'14.16"N)
Center      (  590100.000, 5255100.000) (121d48'17.82"W, 47d26'35.21"N)
Band 1 Block=256x256 Type=UInt16, ColorInterp=Gray
#Specify filenames for different bands we will need for the lab
#Check table from background section to see wavelengths of each band number
tir_fn = os.path.join(imgdir, img+'_B10.TIF')
print(tir_fn)
LS8_sample/LC08_L1TP_046027_20180818_20180829_01_T1_B10.TIF
with rio.open(tir_fn) as src:
    print(src.profile)
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 7781, 'height': 7881, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(30.0, 0.0, 473385.0,
       0.0, -30.0, 5373315.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'band'}
src = rio.open(tir_fn)
type(src)
rasterio.io.DatasetReader
src.profile
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 7781, 'height': 7881, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(30.0, 0.0, 473385.0,
       0.0, -30.0, 5373315.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'band'}
src.read?
Docstring:
Read a dataset's raw pixels as an N-d array

This data is read from the dataset's band cache, which means
that repeated reads of the same windows may avoid I/O.

Parameters
----------
indexes : list of ints or a single int, optional
    If `indexes` is a list, the result is a 3D array, but is
    a 2D array if it is a band index number.

out : numpy ndarray, optional
    As with Numpy ufuncs, this is an optional reference to an
    output array into which data will be placed. If the height
    and width of `out` differ from that of the specified
    window (see below), the raster image will be decimated or
    replicated using the specified resampling method (also see
    below).

    *Note*: the method's return value may be a view on this
    array. In other words, `out` is likely to be an
    incomplete representation of the method's results.

    This parameter cannot be combined with `out_shape`.

out_dtype : str or numpy dtype
    The desired output data type. For example: 'uint8' or
    rasterio.uint16.

out_shape : tuple, optional
    A tuple describing the shape of a new output array. See
    `out` (above) for notes on image decimation and
    replication.

    Cannot combined with `out`.

window : a pair (tuple) of pairs of ints or Window, optional
    The optional `window` argument is a 2 item tuple. The first
    item is a tuple containing the indexes of the rows at which
    the window starts and stops and the second is a tuple
    containing the indexes of the columns at which the window
    starts and stops. For example, ((0, 2), (0, 2)) defines
    a 2x2 window at the upper left of the raster dataset.

masked : bool, optional
    If `masked` is `True` the return value will be a masked
    array. Otherwise (the default) the return value will be a
    regular array. Masks will be exactly the inverse of the
    GDAL RFC 15 conforming arrays returned by read_masks().

boundless : bool, optional (default `False`)
    If `True`, windows that extend beyond the dataset's extent
    are permitted and partially or completely filled arrays will
    be returned as appropriate.

resampling : Resampling
    By default, pixel values are read raw or interpolated using
    a nearest neighbor algorithm from the band cache. Other
    resampling algorithms may be specified. Resampled pixels
    are not cached.

fill_value : scalar
    Fill value applied in the `boundless=True` case only. Like
    the fill_value of numpy.ma.MaskedArray, should be value
    valid for the dataset's data type.

Returns
-------
Numpy ndarray or a view on a Numpy ndarray

Note: as with Numpy ufuncs, an object is returned even if you
use the optional `out` argument and the return value shall be
preferentially used by callers.
Type:      builtin_function_or_method
%time
a = src.read(1)
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.48 µs
a
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
a.shape
(7881, 7781)
a.size
61322061
a.max()
65535
a
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
#%matplotlib widget
src.bounds
BoundingBox(left=473385.0, bottom=5136885.0, right=706815.0, top=5373315.0)
full_extent = [src.bounds.left, src.bounds.right, src.bounds.bottom, src.bounds.top]
full_extent
[473385.0, 706815.0, 5136885.0, 5373315.0]
f, ax = plt.subplots(figsize=(10,10))
ax.imshow(a, cmap='gray', extent=full_extent)
<matplotlib.image.AxesImage at 0x7f91289e1be0>
../../_images/05_Raster1_GDAL_rasterio_Demo_33_1.png

Image bit depth

Number of possible intensity values

2**12
4096
2**8
256
2**16
65536