Exercises

UW Geospatial Data Analysis
CEE498/CEWA599
David Shean

Objectives

  1. Gain more experience with Python and IPython/Jupyter Notebook functionality

  2. Explore basic Python operations with a known dataset

  3. Explore file input/output, string manipulation, loops and basic constructs in Python

Exercises: Another play on words

🙄

Instructions

  • Last week we did some basic manipulation of the words text file using the bash shell.

  • Let’s repeat some of this analysis using Python.

  • For each question or task below, write some code in the empty cell and execute it to preserve your output

  • Work together, consult resources we’ve discussed (e.g., Whirlwind Tour of Python, Python documentation, Stack Overflow), post to #lab02_python_jupyter Slack channel

  • Save the completed notebook, and use the basic git add; git commit -m 'message'; git push workflow to upload by next Friday

  • Submit the URL of your GitHub assignment repo to the Canvas assignment

Here we go!

Define a variable to store the path to the words file from last week’s repo

  • Can be an absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

  • Note: Can use %pwd (print working directory, similar to pwd shell command) to get current directory path.

  • When defining paths in IPython, use /home/jovyan instead of the ~ shortcut for your home directory

  • The path should be a string, enclosed in single quotes '/path/to/some/file.txt'
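A short sketch of both path styles. The directory name some_repo below is a placeholder, not the actual location of the words file in your repo, and os.path.expanduser is shown only as one way to expand ~ inside Python.

```python
import os

# Hypothetical locations; substitute the real path to the words file in your repo
abs_path = '/home/jovyan/some_repo/words'   # absolute: starts at the filesystem root
rel_path = '../some_repo/words'             # relative: resolved against the current directory

# os.path.expanduser replaces a leading ~ with your home directory
expanded = os.path.expanduser('~/some_repo/words')

# os.path.abspath shows what a relative path resolves to from the current directory
print(os.path.abspath(rel_path))
print(os.path.isabs(abs_path), os.path.isabs(rel_path))
```

A relative path only works when the notebook is running from the directory you expect, which is why checking %pwd first is useful.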

Use Python to read this file and populate a list of strings containing all words

  • Use the built-in Python open function here, even if you know how to do this with other modules

  • Note: you will need to handle newline strings '\n' at the end of each word
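The read-and-strip pattern can be sketched on a tiny stand-in file. The file name demo_words.txt and its three words are made up so the example runs anywhere; with the real words file you would skip the writing step.

```python
import os
import tempfile

# Write a tiny stand-in file so the example is self-contained (hypothetical contents)
tmp_dir = tempfile.mkdtemp()
demo_fn = os.path.join(tmp_dir, 'demo_words.txt')
with open(demo_fn, 'w') as f:
    f.write('apple\nBanana\ncherry\n')

# Iterating over an open file yields one line at a time, each ending in '\n'
with open(demo_fn) as f:
    demo_words = [line.rstrip('\n') for line in f]

print(demo_words)
```

The with statement closes the file automatically. Calling f.read().splitlines() is another option that also drops the newlines.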

How many words are there in the list?

How many characters are in the first word of the list?

What is the total number of characters for all words in the list?

  • Can use list comprehension here to loop through all words

How many characters are in the longest word?
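The counting questions above mostly come down to len, sum and max. Here is a sketch on a made-up four-word list (demo_words is a stand-in, not the real word list):

```python
demo_words = ['apple', 'Banana', 'cherry', 'fig']  # hypothetical stand-in list

n_words = len(demo_words)                # number of items in the list
first_len = len(demo_words[0])           # characters in the first word
lengths = [len(w) for w in demo_words]   # list comprehension: one length per word
total_chars = sum(lengths)               # total characters across all words
longest = max(lengths)                   # length of the longest word

print(n_words, first_len, total_chars, longest)
```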

Define a function that will concatenate an input list of strings

  • Your function should return a single string (with no spaces)

  • This function should accept an input list with arbitrary length as an argument

    • So return inlist[0]+inlist[1]+inlist[2] won’t work

Example input: ['Geospatial', 'Data', 'Analysis']
Example output: 'GeospatialDataAnalysis'
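Two possible ways to write such a function, shown as a sketch rather than the expected solution. The function names concat_strings and concat_strings_loop are made up for this example.

```python
def concat_strings(inlist):
    """Return all strings in inlist joined together with no separator."""
    # str.join works for a list of any length, including an empty one
    return ''.join(inlist)


def concat_strings_loop(inlist):
    """Same result, built up explicitly with a loop."""
    out = ''
    for s in inlist:
        out += s
    return out


print(concat_strings(['Geospatial', 'Data', 'Analysis']))
```

Both versions accept a list of any length. The join version is generally the more idiomatic choice.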

Run your function, passing in a list containing the first 3 words

  • Use indexing here; don’t copy/paste strings from the list

Run your function, passing in a list containing the first 5 words

Run your function, passing in a list containing the last 3 words
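Slicing gives the sub-lists needed for these three calls. A sketch on a made-up six-word list, using ''.join as a stand-in for your own function:

```python
demo_words = ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot']  # hypothetical

first3 = demo_words[:3]    # items at index 0, 1, 2
first5 = demo_words[:5]    # items at index 0 through 4
last3 = demo_words[-3:]    # negative indices count back from the end

# ''.join stands in here for the concatenation function you defined
print(''.join(first3))
print(''.join(first5))
print(''.join(last3))
```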

Does your list contain the nickname for the UW mascot?

  • This should be a simple boolean expression

  • Careful about case!

If so, what is the numerical index for that word?

  • Do a sanity check, and print the word at that index.
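The membership test and index lookup can be sketched as follows. The list and the target word 'cherry' are placeholders; substitute the word you are actually looking for.

```python
demo_words = ['Apple', 'apple', 'banana', 'Cherry']  # hypothetical stand-in list
target = 'cherry'                                    # hypothetical search word

# The in operator is case-sensitive, so this is False for the list above
print(target in demo_words)

# Comparing against a lowercased copy makes the test case-insensitive
lower_words = [w.lower() for w in demo_words]
found = target in lower_words

if found:
    idx = lower_words.index(target)   # index of the first match
    print(idx, demo_words[idx])       # sanity check: print the word at that index
```

Note that list.index raises a ValueError when the item is missing, which is why the lookup sits inside the if block.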

How many words begin with each letter of the alphabet (case-insensitive)?

One possible approach, use nested loops:

  • Loop through each letter

    • Initialize some count variable or empty list

    • Loop through each word in the list of words

      • Check to see if the word starts with the letter (careful about case!)

      • If it does, increment your counter or append the word to your list

    • Print out the letter and the total count of words that met your criterion
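The nested-loop outline above could look something like this on a made-up list. The names demo_words and nested_counts are placeholders for this sketch.

```python
import string

demo_words = ['apple', 'Avocado', 'banana', 'cherry', 'Cranberry', 'cress']  # hypothetical

nested_counts = []
for letter in string.ascii_lowercase:        # outer loop: 'a' through 'z'
    count = 0                                # reset the counter for each letter
    for word in demo_words:                  # inner loop: every word
        if word.lower().startswith(letter):  # lowercase first so 'Avocado' counts for 'a'
            count += 1
    nested_counts.append((letter, count))
    print(letter, count)
```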

Or, use a dictionary!

  • Create a new dictionary with a key for each lowercase letter.

  • Initialize a counter for each value in the dictionary.

  • Loop through words and increment the appropriate counter.

If you want, try to implement both - which one is faster?
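One way to compare the two approaches is the standard-library timeit module (in a notebook, the %timeit magic does a similar job). This is a sketch: the function names and the repeated toy list are made up, and timings will vary from machine to machine.

```python
import string
import timeit

# Hypothetical stand-in list, repeated so the timing is measurable
demo_words = ['apple', 'Avocado', 'banana', 'cherry', 'Cranberry', 'cress'] * 500


def count_nested(words):
    """Nested loops: one full pass over the words for each letter."""
    counts = {}
    for letter in string.ascii_lowercase:
        n = 0
        for w in words:
            if w.lower().startswith(letter):
                n += 1
        counts[letter] = n
    return counts


def count_dict(words):
    """Dictionary: a single pass over the words, incrementing one counter each."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for w in words:
        first = w[:1].lower()   # w[:1] is safe even for an empty string
        if first in counts:     # skip words starting with non-letters
            counts[first] += 1
    return counts


t_nested = timeit.timeit(lambda: count_nested(demo_words), number=5)
t_dict = timeit.timeit(lambda: count_dict(demo_words), number=5)
print(f'nested loops: {t_nested:.4f} s, dictionary: {t_dict:.4f} s')
```

The dictionary version reads the word list once instead of 26 times, so it would usually be expected to run faster, but measure it on the real file rather than taking that on faith.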

What is the most common first letter?

  • While it is possible to just look at the output counts above, try to do this with code.

  • If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code - no need for additional loops.
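With the counts in a dictionary, the built-in max function with a key argument finds the winner without an extra loop. The counts below are invented for illustration.

```python
demo_counts = {'a': 3, 'b': 7, 'c': 5}  # hypothetical counts, not the real results

# max iterates over the keys; key=demo_counts.get ranks each key by its value
top_letter = max(demo_counts, key=demo_counts.get)
top_count = demo_counts[top_letter]

print(top_letter, top_count)
```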

Use string formatting to print your answer

  • Output should be something like: “The most common first letter in words is ‘a’ with 17096 occurrences”

  • Note that ‘a’ is not the correct answer - only 25 other possibilities to consider!
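Two common ways to format the sentence, sketched with placeholder values (the letter 'q' and the count 42 are made up):

```python
letter = 'q'   # hypothetical values, not the real answer
count = 42

# f-string: expressions inside {} are evaluated and inserted into the string
msg = f"The most common first letter in words is '{letter}' with {count} occurrences"
print(msg)

# str.format produces the same result
msg2 = "The most common first letter in words is '{}' with {} occurrences".format(letter, count)
print(msg2)
```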

Extra Credit: Create a plot of letter counts

  • We haven’t talked about matplotlib or other plotting libraries yet, but if you already feel comfortable plotting, create a visualization of your output counts. A bar plot is a good choice (it resembles a histogram, since the bar heights are counts).

Extra Credit: Create a standalone Python script

  • Create a Python script to complete the task above, answering the question “How many words begin with each letter of the alphabet (case-insensitive)?”

  • Your script should be executable from the command line (remember to set permissions properly with chmod +x)

  • Your script should accept the path to the words file as an input argument

    • See Python sys.argv

  • Print the results to stdout (terminal)

    • Output should include a letter and total count on each line (“a 17096”)

  • Save the results to a text file called words_lettercount_Python.txt

    • This can be done by redirecting the output of the command used to run the script to a file

    • Alternatively, you can create and write to this file in Python

      • Can explore os.path to programmatically append the '_lettercount_Python.txt' string to the original file path
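A possible skeleton for such a script, offered as a sketch under a few assumptions: the function names are made up, the file is assumed to hold one word per line, and the output file is written next to the input file. It uses sys.argv for the input path and os.path.splitext to build the output name.

```python
#!/usr/bin/env python
"""Count words by first letter (case-insensitive). Illustrative sketch only."""
import os
import string
import sys


def count_first_letters(words_fn):
    """Return a dict mapping each lowercase letter to a count of words starting with it."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    with open(words_fn) as f:
        for line in f:
            first = line.strip()[:1].lower()  # first character, or '' for a blank line
            if first in counts:
                counts[first] += 1
    return counts


def output_filename(words_fn):
    """Drop any extension from the input path and append the suffix."""
    root, _ext = os.path.splitext(words_fn)
    return root + '_lettercount_Python.txt'


def main(words_fn):
    counts = count_first_letters(words_fn)
    lines = [f'{letter} {n}' for letter, n in counts.items()]
    print('\n'.join(lines))                         # results to stdout
    with open(output_filename(words_fn), 'w') as f:
        f.write('\n'.join(lines) + '\n')            # same results to a text file


if __name__ == '__main__':
    # sys.argv[0] is the script name; sys.argv[1] is the first argument
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f'Usage: {sys.argv[0]} /path/to/words', file=sys.stderr)
```

If this were saved as, say, lettercount.py (a hypothetical name) and made executable with chmod +x, it could be run as ./lettercount.py /path/to/words.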