Github and Github Classroom Notes#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

These are a set of loosely organized notes, tips, tricks and gotchas for git and Github resources used during the course. Additional notes on initial setup and weekly workflow for students and instructors can be found in the Resources.

There are many good resources on git and Github on the web. See the 01_Shell_Github reading assignment and demo.

Additional UW eScience Hackweek resources on initial Github setup and navigation:

First time login#

Replace the name and email below with your own name and the email you used to create your Github account:
git config --global user.name "Matt Damon"
git config --global user.email "email@example.com"
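
To confirm the settings were saved, you can read them back. This is a minimal sketch: the throwaway HOME directory is an assumption added so the demo does not touch your real ~/.gitconfig; omit that line to configure your actual account on the hub.

```shell
# Throwaway HOME for demonstration only (assumption, not part of the course setup)
export HOME="$(mktemp -d)"

# Same commands as above
git config --global user.name "Matt Damon"
git config --global user.email "email@example.com"

# Read the values back from the global config file
name="$(git config --global --get user.name)"
email="$(git config --global --get user.email)"
echo "Commits will be recorded as: $name <$email>"
```

Running git config --global --list shows every global setting at once.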

Authentication#

As of August 2021, Github disabled using passwords for remote command line access. We will use the more secure Personal Access Token (PAT) authentication option.

Create Personal Access Token (PAT)#

Store credentials#

Store your credentials so you don’t have to enter your Github username and PAT each time you pull from or push to a remote repo.

Permanently#

  1. Run this once:
    git config --global credential.helper store

  2. You may receive a warning about loose permissions the first time you git pull. To prevent this:
    chmod 0700 /home/jovyan/.cache/git/credential

  3. Enter credentials one additional time:
    git pull   # should prompt for username

  4. [Enter username and PAT]

  5. The next time you run a git command that contacts the remote origin, no username or PAT is required:
    git pull   # should say “Already up to date.”

Store credentials for 15 minutes (900 seconds) without reauthenticating#

git config --global credential.helper 'cache --timeout=900'
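
If you are unsure which option is active, the helper setting can be inspected, switched, or removed. The sketch below again assumes a throwaway HOME directory so it leaves your real configuration alone. Keep in mind that the store helper writes your PAT in plain text to ~/.git-credentials, while the cache helper keeps it in memory only until the timeout expires.

```shell
# Throwaway HOME for demonstration only (assumption)
export HOME="$(mktemp -d)"

# Start with the permanent option, then switch to the 15-minute cache;
# setting the key a second time replaces the earlier value
git config --global credential.helper store
git config --global credential.helper 'cache --timeout=900'

# Show which helper is active now
helper="$(git config --global --get credential.helper)"
echo "Active credential helper: $helper"

# Remove the helper entirely; git goes back to prompting every time
git config --global --unset credential.helper

# With the cache helper, running `git credential-cache exit`
# forgets cached credentials before the timeout is reached
```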

Two-factor authentication (2FA)#

In past years, enabling 2FA led to issues with authentication using the terminal on the course Jupyterhub. This may be resolved with updated PAT authentication requirements.

  • 2FA should be disabled by default for new accounts

  • If you’re using an existing Github account and previously enabled two-factor authentication, you may need to disable it

Git workflows#

Clone remote repository#

  1. Open repository webpage

  2. Click big green “Code” button

  3. Select HTTPS and copy link

  4. On course Jupyterhub, open a terminal

  5. Navigate to the directory containing assignments (cd labs)

git clone [paste https link]
cd [repo name]

Basic with remote#

git pull
git add myfile.py
git commit -m 'Added myfile.py'
git push
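
To practice this cycle without touching a real assignment, here is a self-contained sketch in which a local bare repository stands in for the Github remote. The scratch directory, the README.md seed file, and the contents of myfile.py are assumptions made up for illustration.

```shell
# Scratch area: a local bare repository plays the role of the Github remote
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/labs"   # warns that the repo is empty
cd "$work/labs"
git config user.name "Matt Damon"                   # per-repo identity for the demo
git config user.email "email@example.com"

# Seed the empty remote with a first commit so later pulls have something to fetch
echo "# labs" > README.md
git add README.md
git commit --quiet -m 'Initial commit'
git push --quiet -u origin HEAD                     # -u records the upstream branch

# The everyday cycle: pull, stage, commit, push
git pull --quiet
echo 'print("hello")' > myfile.py
git add myfile.py
git commit --quiet -m 'Added myfile.py'
git push --quiet

# Confirm the local branch and its upstream point at the same commit
local_head="$(git rev-parse HEAD)"
remote_head="$(git rev-parse '@{u}')"
echo "local:  $local_head"
echo "remote: $remote_head"
```

With a real Github Classroom repository the clone already has an upstream branch, so only the four commands in the middle are needed.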

Basic branching (local)#

https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging

git checkout -b newbranch
git add updated_file.py
git commit -m 'Fixed typo in updated_file.py'
git checkout main
git merge newbranch
git branch -d newbranch
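
The same sequence can be tried end to end in a scratch repository. This is a sketch under assumptions: the directory and file contents are invented, and the starting branch name is read from git rather than assumed to be main, since older configurations still default to master.

```shell
# Scratch repository with one commit on the default branch
work="$(mktemp -d)"
git init --quiet "$work/demo"
cd "$work/demo"
git config user.name "Matt Damon"
git config user.email "email@example.com"
echo 'x = 1' > updated_file.py
git add updated_file.py
git commit --quiet -m 'Initial commit'
base="$(git rev-parse --abbrev-ref HEAD)"   # "main" or "master", depending on config

# Create a branch, switch to it, and commit a change there
git checkout --quiet -b newbranch
echo 'x = 2' > updated_file.py
git add updated_file.py
git commit --quiet -m 'Fixed typo in updated_file.py'

# Switch back and merge (a fast-forward here, since the base branch has not moved)
git checkout --quiet "$base"
git merge --quiet newbranch

# Delete the merged branch; -d refuses if the branch has unmerged commits
git branch -d newbranch
git branch --list
```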

Jupyterbook#

There is a rendered version of the Jupyterbook on the course website: https://uwgda-jupyterbook.readthedocs.io. These html pages are automatically generated from the content (mostly markdown files and Jupyter notebooks) in a Github repository: https://github.com/UW-GDA/jupyterbook. The notebooks can be explored interactively, but you will need a local copy on the hub.

On the course Jupyterhub, open a new Terminal and run the following from your home directory: cd ~; git clone https://github.com/UW-GDA/jupyterbook.git

This will create a new directory /home/jovyan/jupyterbook and download all of the latest course material. From the File Browser, you can navigate to jupyterbook/book/modules/ and then open the notebooks for the lecture/demo to follow along interactively running the same commands.

Note that there is also a copy of the exercises notebooks in each of these module subdirectories. Please don’t use them! Work on the exercises notebooks from the Github Classroom assignment repository that you cloned.

Pulling latest version of the Jupyterbook#

The content of the book will be updated throughout the quarter, including the interactive demo notebooks. To pull the latest versions to your cloned repo on the Jupyterhub:

  1. cd /home/jovyan/jupyterbook

  2. git status

    • If you made local changes to the demos in previous weeks (e.g., by executing cells), you will see some red text with modified: followed by a filename. Unfortunately, your notebook now differs from the main version in the Github repo, so you can’t just do a simple git pull.

  3. There are many ways to handle this:

    • The simplest (recommended) option is to discard all local changes and do a “hard reset” to the latest version of the tracked files on Github: git reset --hard; git pull

      • You will lose any modifications you’ve made to the notebooks, but this will preserve other “untracked” files in the local directory (e.g., data files)

      • Your local version of all notebooks will be identical to the latest versions on Github

    • If you would like to preserve local changes (maybe notes you took during a previous demo, or some additional experimentation in the demo notebook): git pull --autostash

    • Other solutions involving branching to preserve local changes: https://stackoverflow.com/questions/1125968/how-do-i-force-git-pull-to-overwrite-local-files
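
To see what a hard reset discards and what it keeps before running it on your own copy, the behavior can be reproduced in a scratch repository. The file names demo.ipynb and data.csv are hypothetical stand-ins for a tracked notebook and an untracked data file.

```shell
# Scratch repository with one tracked "notebook"
work="$(mktemp -d)"
git init --quiet "$work/book"
cd "$work/book"
git config user.name "Matt Damon"
git config user.email "email@example.com"
echo 'original' > demo.ipynb
git add demo.ipynb
git commit --quiet -m 'Add demo notebook'

# Simulate local work: edit the tracked file and create an untracked one
echo 'my local edits' >> demo.ipynb
echo '1,2,3' > data.csv

git status --short      # lists demo.ipynb as modified and data.csv as untracked

# Discard changes to tracked files only
git reset --quiet --hard

git status --short      # only the untracked data.csv remains listed
```

After the reset, demo.ipynb matches the last commit again while data.csv is untouched, which is why a following git pull can proceed cleanly.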

FAQ, Notes#

Improved git log formatting#

git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit
Note: Can add as alias to ~/.gitconfig file
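
One way to add the alias is with git config rather than editing the file by hand. In this sketch the alias name lg is an arbitrary choice, and the throwaway HOME directory is an assumption so the demo does not modify your real ~/.gitconfig.

```shell
# Throwaway HOME for demonstration only (assumption)
export HOME="$(mktemp -d)"

# Register the log format above under the (hypothetical) alias name "lg"
git config --global alias.lg "log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit"

# Show the stored alias definition
alias_value="$(git config --global --get alias.lg)"
echo "$alias_value"
```

Once the alias is in your real configuration, running git lg inside any repository prints the formatted log.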

Should I git clone via https or ssh?#

  • Default is https, which requires authentication with your Github username and Personal Access Token (PAT), since passwords are no longer accepted

  • Can also set up ssh keys on Jupyterhub, if that doesn’t sound intimidating

Why are a bunch of random files added to my repo?#

Should I store data in the git repo?#

  • A few small files or test data are great, even better if they are text data or some other non-binary format

  • Large data files do not belong in the repo; store them externally and fetch them dynamically

    • Zenodo, UW Library, Amazon S3 or Google Cloud Storage bucket, some other public data archive

  • Do you need to track changes to the data files?

  • https://docs.github.com/en/repositories/working-with-files/managing-large-files
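
One common way to keep large files out of a repo while still working with them locally is a .gitignore file. The sketch below is an illustration under assumptions: the data/ directory, the *.tif pattern, and the file name big_raster.tif are hypothetical and not part of the course repositories.

```shell
# Scratch repository
work="$(mktemp -d)"
git init --quiet "$work/project"
cd "$work/project"

# Ignore everything under data/ and any GeoTIFF anywhere in the repo
printf 'data/\n*.tif\n' > .gitignore

# A stand-in for a large file fetched from an external archive
mkdir data
echo 'placeholder' > data/big_raster.tif

# Ask git which rule causes the file to be ignored
git check-ignore -v data/big_raster.tif

# Only .gitignore itself shows up as a candidate for git add
git status --short
```

Committing the .gitignore file shares the same rules with everyone who clones the repo.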

Issues with large notebooks (>5-10 MB) rendering on Github#

This may no longer be relevant. Github can fail to render large notebooks. Sometimes reloading the page works.

If notebook is in a public Github repo, go to https://nbviewer.jupyter.org/ and paste the url to the notebook.

Github Organization Notes#

To make your repo visible to students/instructors in the organization#

Settings -> Manage Access -> Invite Teams (Green Button) -> gda_w2020_students, and grant read access to desired teams

To make your project repo public for the world#

Settings -> Options (left menu) -> Danger Zone -> Make public

To check public visibility, you can always sign out of Github and navigate the GDA org https://github.com/UW-GDA/ to see the public repos.

Github Classroom Notes#

See the instructor resources on the weekly Github Classroom workflow for assignment distribution.

Issues accepting assignments#

In both 2019 and 2020, during one week, the students went to accept their assignment and the progress bar froze. It worked for a few students, then the majority of the class couldn’t get the assignment. Panic ensues.

  • Can make the starter code visible to the student team, then show them how to fork it

  • In dire circumstances, just post the notebook on the Slack channel

  • A good teaching moment

  • The issue resolves itself within ~24 hours. Students can use their fork, or clone the assignment repo and then copy their working notebook into it