03: Core Python Modules - NumPy, Pandas and Matplotlib

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

Please quickly read through this entire document once, then go back and start tackling the various tasks.

Overview

This week is our final “flip the classroom” situation with longer interactive reading assignments. Before class, you will be responsible for reviewing material from external resources. Consider this your homework (in addition to completing the lab exercises from last week). During lecture/lab, we will briefly review some of this material, do some interactive demos, discuss questions and clarify concepts as a class, and then collaboratively work on some problems/exercises to help solidify the concepts (which will inevitably lead to more questions and discussion). I think this is the best use of our limited time together.

Reading and Tutorials

This week we are reviewing core Python modules: NumPy, Pandas, and Matplotlib. These are essential for the rest of the course and future data science endeavors (geospatial and otherwise). Most of the geospatial modules we use are built on these packages, so you must be comfortable with the underlying functionality. This review will ensure that we have a common baseline and set of references moving forward. If you are relatively new to Python and these modules, it is critical that you spend extra time with self-study this week.
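As a rough illustration of how the three modules fit together, here is a minimal sketch. The station names, elevations, and lapse rate are made-up values chosen only for demonstration, not course data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data: elevations (m) for five made-up stations
elev_m = np.array([120.0, 450.0, 980.0, 1510.0, 2200.0])

# NumPy: vectorized arithmetic on the whole array at once
# (15 C at sea level and 6.5 C/km are assumed values for illustration)
temp_c = 15.0 - 0.0065 * elev_m

# Pandas: labeled, tabular view of the same arrays
df = pd.DataFrame(
    {"elev_m": elev_m, "temp_c": temp_c},
    index=pd.Index(["A", "B", "C", "D", "E"], name="station"),
)

# A DataFrame column can be converted back to a plain NumPy array
values = df["temp_c"].to_numpy()

# Boolean indexing: keep only the stations above 1000 m
high = df[df["elev_m"] > 1000]

# Matplotlib: plot one column against the other
fig, ax = plt.subplots()
ax.plot(df["elev_m"], df["temp_c"], marker="o")
ax.set_xlabel("Elevation (m)")
ax.set_ylabel("Temperature (C)")
```

Geospatial packages tend to follow the same pattern: arrays for the numbers, labeled tables for the records, and Matplotlib axes for the figures.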

Again, this is intended to be an individual review. Tailor it to your needs, and adjust emphasis so you are best using your time outside of class. Even if you’ve been using these tools for many years, it can still be valuable to review, as you will inevitably learn (or re-learn) some new tricks and develop a better grasp of more complex concepts.

As with the previous homework assignments, don’t wait until Friday morning to start. This material will be much more useful if broken up over several sessions throughout the week - try an hour or two a day. Again, if this is new, please dedicate the time to explore interactively; don’t just skim rendered versions on the web.

Python Data Science Handbook: NumPy, Pandas and Matplotlib (~2-6 hours)

  • Review the following sections of Jake VanderPlas’ “Python Data Science Handbook”: https://jakevdp.github.io/PythonDataScienceHandbook/. You should have previously cloned it on the JupyterHub in a Week 02 directory (see Week 02 instructions). Get through what you can, but if you’re feeling overwhelmed or pressed for time, try to at least work through the first half of each section, which should cover most basic functionality.

    • 2. Introduction to NumPy

      • Can skip section on Structured Arrays (we’ll use Pandas)

    • 3. Data Manipulation with Pandas

      • If new to Pandas, can skip the sections on Hierarchical Indexing, Pivot Tables and High-performance Pandas

    • 4. Visualization with Matplotlib

      • Skip the section on “Geographic Data with Basemap” as this is largely outdated

      • Can skip Customizing Matplotlib: Configurations and Stylesheets

  • Work through some of the interactive examples, and explore new concepts (don’t just shift-Enter as quickly as possible)

  • The section on Machine Learning with Scikit-Learn is also great, but not required for this course

(Optional) Official Quickstart/Doc

Skip any installation instructions

Pandas

Matplotlib

NumPy

Assignment (due Friday)

  • Complete above reading/tutorials

  • Fill out the following feedback form: https://forms.gle/H9DZHgs3cgpT3ZW9A

  • Finish and turn in the lab exercises from last week (due midnight on Friday)