Jupyter Notes
Contents
Jupyter Notes#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
These are a set of loosely organized notes, tips, tricks and gotchas for Jupyter resources (Jupyterhub, Jupyter lab and Jupyter notebooks) used during the course. Additional notes on initial setup and weekly workflow can be found in the GDA instructor resources.
Additional UW eScience Hackweek resources on initial Jupyterhub setup and navigation: https://snowex-hackweek.github.io/website/preliminary/jupyterhub.html
Jupyterhub#
Logging out#
Please try to remember to shut down your server and log out of the Jupyterhub when you’re done for the day. This will help save money (we’d like to avoid paying for cloud computing resources while you’re sleeping), and provides a clean start the next time you log in, which can help avoid various issues listed below.
Troubleshooting#
If you are experiencing strange behavior with the course Jupyterhub, please try the following:
1. Reload the tab in your browser#
Can resolve issues with notebook cells that appear strange or “cut off”
If you have unsaved changes, save the notebook, then reload
2. Log out and reauthenticate#
File -> Log out
Click “Sign in” button and reauthenticate with UW netid Canvas credentials
3. Restart Jupyterhub Server#
All students have the ability to stop and restart their own server on the Jupyterhub (don’t need to ask instructor or IT).
File –> Hub Control Panel –> Stop My Server (wait a ~30 seconds, then start again)
After image is updated (e.g., new packages added to default conda environment), all users will need to restart their server to see changes.
4. Review the list of common errors and solutions below#
5. Post a message to #it_help
channel on the class Slack workspace#
Important to post to public channel, as other students can chime in to help, and if this is a larger issue, confirm similar experiences
Provide detailed report - copy/paste error messages, include screenshots, etc. so we can help diagnose
Course instructor and IT help will provide assistance
Common Jupyterhub Errors#
Kernel Restarting
#
Likely ran out of memory.
Can “Restart Kernel and Run up to selected cell” to restore state
Remember, closing a notebook tab in JupyterLab interface doesn’t actually shut down the kernel! If you have opened/run many notebooks during a session, you may start to experience performance issues.
To remedy, click the “Running Terminals and Kernels” button (square inside a circle) on the left panel. It will show you all of the kernels that are running. Shut down any that you no longer need.
Dask Server Error
#
If you see this, it is likely that your server was shut down due to inactivity. Reload the page in your browser, and log back onto the Hub.
File Save Error for *.ipynb
or Failed to write *.ipynb
#
Temporary network interruption, dismiss and try manually saving
Check to make sure you haven’t filled the your allocated home directory storage (see below)
Spawn failed: Server at .... didn't respond in 30 seconds
#
This occurs when attempting to log into the hub, and the server fails to start (don’t panic)
Likely an issue with a full disk.
Reach out on
#it_help
Slack channel with a screenshot, and we will follow up with UW-IT (should be resolved quickly)See section on managing storage below to avoid this problem.
Managing data storage and disk space#
The default storage allocation for each student is 5 or 10 GB, which should be sufficient for the assignments. For project work, we can increase the storage allocation.
Please make sure you are not storing multiple copies of large data directories.
Always use relative paths to load data from existing data directories elsewhere on the filesystem (like the
LS8_sample
directory), rather than copying them to your assignment repo.In the real world, you (or your employer) have to pay for every GB per month, so good to learn best practices now.
You can also safely delete large data directories from previous demo/labs, as you can rerun the data download notebooks to download in the future.
Checking current storage and available disk space#
Open a Terminal on the hub and run:
du -s -h ~/*
- shows you total data volume in each directory, cancd
and rerun to identifiy large subdirectories/filesdf -h ~
- shows total available storage and amount/percent used
Freeing up storage#
If your
df
percentage is >80%, please do the following:Delete large data files/directories that are no longer necessary or can be easily downloaded again in the future.
Reach out on the
#it_help
Slack channel or send a direct message, and we can request more storage. Please don’t hesitate to do this if you will need more temporary storage for projects.
Initial Instructor Setup#
Work with UW-IT template#
See sample for 2020 here: https://github.com/UW-GDA/uwgda-image
Most of the user-level customization is found in the binder
subdirectory.
We used a dev
branch to submit Pull Requests with modifications (e.g., adding a new package) and then merged into master
after tests completed.
Add any core utilities to apt.txt
#
This includes common command line tools like wget
or text editors like vim
. Sample:
https://github.com/UW-GDA/uwgda-image/blob/master/binder/apt.txt
Add any custom Jupyter lab extensions#
Add useful interactive visualization tools (pyviz, leaflet), Dask integration, etc.
Sample: https://github.com/UW-GDA/uwgda-image/blob/master/binder/postBuild
See available extensions: https://github.com/topics/jupyterlab-extension
Jupyter notebooks#
Use keyboard shortcuts#
Best practices for memory management#
Avoid duplicating large arrays ( as in “dynamically process original and plot in one command” instead of “process original, store as new array, then plot new array”)
Avoid unnecessarily increasing bit depth (e.g., loading a
UInt16
array asfloat64
quadruples memory footprint)Release memory from arrays that are no longer needed with
myarray = None
(see https://stackoverflow.com/a/35316944)Use Dask!
Disable Jupyter Notebook Warnings#
import warnings
warnings.filterwarnings('ignore')