Lab02 Exercises (20 pts)#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

modified by Eric Gagliano

Introduction#

Objectives#

  1. Gain more experience with Python and IPython/Notebook functionality

  2. Explore basic Python operations with a known dataset

  3. Explore file input/output, string manipulation, loops and basic constructs in Python

Instructions#

  • Last week we did some basic manipulation of the words text file using the bash shell

  • Let’s repeat some of this analysis using Python

  • For each question or task below, write some code in the empty cell and execute to preserve your output

  • Work together, consult resources we’ve discussed (e.g., Whirlwind Tour of Python, Python documentation, Stack Overflow), post to #lab02_python_jupyter Slack channel

Part 1: Another play on words (10 pts)#

Define a variable to store the path to the words file from last week’s repo#

  • Can be absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

  • Note: Can use %pwd (print working directory, similar to pwd shell command) to get current directory path.

  • When defining paths in IPython, use /home/jovyan instead of the ~ shortcut for your home directory

  • The path should be a string, enclosed in quotes, e.g., '/path/to/some/file.txt'
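
As a minimal sketch of the difference between the two path styles, the snippet below uses hypothetical paths (the lab01 directory and words.txt file name are placeholders, not the actual repo layout) and checks each one with os.path.isabs. It assumes a Linux-style filesystem such as the JupyterHub.

```python
import os

# Hypothetical paths for illustration only; substitute the real location of your words file
abs_path = '/home/jovyan/lab01/words.txt'  # absolute: starts at the filesystem root
rel_path = '../lab01/words.txt'            # relative: resolved from the current working directory

print(os.path.isabs(abs_path))  # True
print(os.path.isabs(rel_path))  # False

# os.path.abspath shows what a relative path resolves to from the current directory
print(os.path.abspath(rel_path))
```

The last line prints a different result depending on where the notebook is running, which is why a relative path only works from the expected working directory.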

# Student exercise: your code goes here

Use Python to read this file and populate a list of strings containing all words#

  • Use the built-in Python open function here, even if you know how to do this with other modules

  • Note: you will need to handle newline strings '\n' at the end of each word
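
To illustrate the newline issue, here is a small sketch that writes a three-word stand-in file (the file name demo_words.txt and its contents are made up for this example) and reads it back with open. It shows the pattern on toy data, not the words file itself.

```python
import os
import tempfile

# Write a tiny stand-in file (hypothetical content) so the example is self-contained
tmp_dir = tempfile.mkdtemp()
demo_fn = os.path.join(tmp_dir, 'demo_words.txt')
with open(demo_fn, 'w') as f:
    f.write('A\na\naa\n')

# Read the file line by line; each line still ends with '\n'
with open(demo_fn) as f:
    raw_lines = f.readlines()
print(raw_lines)  # ['A\n', 'a\n', 'aa\n']

# Strip the trailing newline from every line while building the list
with open(demo_fn) as f:
    demo_words = [line.rstrip('\n') for line in f]
print(demo_words)  # ['A', 'a', 'aa']
```

Using a with block closes the file automatically, even if an error occurs while reading.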

# Student exercise: your code goes here

How many words are there in the list?#

# Student exercise: your code goes here
235886

How many characters are in the first word of the list?#

# Student exercise: your code goes here

What is the total number of characters for all words in the list?#

  • Can use a list comprehension here to loop through all words

  • Note: the total character count here may be different than the total character count from wc -m in Lab01!

    • Here, you stripped the newline character \n from the end of each line, while those were included in the Lab01 count.
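
The effect of the stripped newlines can be seen on a toy list (the three strings below are stand-ins, not the real data). This sketch sums word lengths with a list comprehension, once with and once without the trailing '\n'.

```python
demo_lines = ['A\n', 'a\n', 'aa\n']  # lines as they are stored in a file
demo_words = [line.rstrip('\n') for line in demo_lines]

# Sum the length of every item using a list comprehension
total_stripped = sum([len(w) for w in demo_words])
total_with_newlines = sum([len(line) for line in demo_lines])

print(total_stripped)       # 4
print(total_with_newlines)  # 7

# The difference is one newline character per word
print(total_with_newlines - total_stripped == len(demo_words))  # True
```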

# Student exercise: your code goes here

How many characters are in the longest word?#

# Student exercise: your code goes here

What is the longest word?#

# Student exercise: your code goes here
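
One way to approach the two questions above, sketched on a made-up list, is the built-in max function. Passing key=len makes max compare items by length instead of alphabetically.

```python
demo_words = ['geospatial', 'data', 'analysis', 'python']  # toy list for illustration

# Length of the longest item
longest_len = max(len(w) for w in demo_words)
print(longest_len)  # 10

# The longest item itself; key=len compares words by their length
longest_word = max(demo_words, key=len)
print(longest_word)  # geospatial
```

If several words share the maximum length, max returns the first one it encounters.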

Define a function that will concatenate an input list of strings#

  • Your function should return a single string (with no spaces)

  • This function should accept an input list with arbitrary length as an argument

    • So return inlist[0]+inlist[1]+inlist[2] won’t work

  • Example input: ['Geospatial', 'Data', 'Analysis']

  • Example output: 'GeospatialDataAnalysis'
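
A minimal sketch of one possible function is shown here, using the string method join with an empty separator. The function name concat_strings is a placeholder, and an explicit loop that builds up the string with += would work as well.

```python
def concat_strings(inlist):
    """Return all strings in inlist joined into one string, with no separator."""
    # ''.join works for a list of any length, including an empty list
    return ''.join(inlist)


print(concat_strings(['Geospatial', 'Data', 'Analysis']))  # GeospatialDataAnalysis
```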

# Student exercise: your code goes here

Run your function, passing in a list containing the first 3 words#

  • Use indexing here, don’t copy/paste strings from the list

# Student exercise: your code goes here
'Aaaa'

Run your function again:#

  • Passing in a list containing the first 5 words

  • Passing in a list containing the last 3 words
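
Selecting the first or last few items is a job for slicing. The sketch below uses a short stand-in list rather than the full list of words.

```python
demo_words = ['A', 'a', 'aa', 'aal', 'aalii', 'aam', 'Aani']  # short stand-in list

first_three = demo_words[:3]  # items at index 0, 1, 2
first_five = demo_words[0:5]  # an explicit start index of 0 is equivalent
last_three = demo_words[-3:]  # negative indices count from the end

print(first_three)  # ['A', 'a', 'aa']
print(first_five)   # ['A', 'a', 'aa', 'aal', 'aalii']
print(last_three)   # ['aalii', 'aam', 'Aani']
```

Each slice is itself a list, so it can be passed directly to a function that expects a list of strings.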

# Student exercise: your code goes here

Does your list contain the nickname for the UW mascot?#

  • This should be a simple boolean statement

  • Careful about case!

# Student exercise: your code goes here

If so, what is the numerical index for that word?#

  • Double check by also printing the word at that index.
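
The membership test and index lookup can be sketched with a toy list (the fruit names are placeholders). Note how the in operator is case-sensitive.

```python
demo_words = ['Apple', 'apple', 'banana', 'Cherry']  # toy list for illustration

# The in operator returns a boolean and is case-sensitive
print('apple' in demo_words)   # True
print('cherry' in demo_words)  # False, only 'Cherry' is present

# Lowercasing every word first makes the test case-insensitive
print('cherry' in [w.lower() for w in demo_words])  # True

# list.index returns the position of the first match
idx = demo_words.index('banana')
print(idx, demo_words[idx])  # 2 banana
```

list.index raises a ValueError if the item is not in the list, so it is safest to run the membership test first.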

# Student exercise: your code goes here

Part 2: Letter counter (10 pts)#

How many words begin with each letter of the alphabet (case-insensitive)?#

One possible approach is to use nested loops:

  • Loop through each letter

    • Initialize some count variable or empty list

    • Loop through each word in the list of words

      • Check to see if the word starts with the letter (careful about case!)

      • If it does, increment your counter or append the word to your list

    • Print out the letter and the total count of words that met your criterion
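
The nested-loop steps above can be sketched as follows, using a five-word toy list in place of the full list of words. The variable names are placeholders.

```python
import string

demo_words = ['Apple', 'avocado', 'banana', 'Blueberry', 'cherry']  # toy list

loop_counts = []
for letter in string.ascii_lowercase:  # outer loop: each lowercase letter
    count = 0                          # reset the counter for this letter
    for word in demo_words:            # inner loop: every word
        # Lowercase the word so 'Apple' and 'avocado' both count for 'a'
        if word.lower().startswith(letter):
            count += 1
    loop_counts.append((letter, count))
    print(letter, count)
```

string.ascii_lowercase is the string 'abcdefghijklmnopqrstuvwxyz', so there is no need to type out the alphabet by hand.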

Or, use a dictionary!

  • Create a new dictionary with a key for each lowercase letter.

  • Initialize a counter for each value in the dictionary.

  • Loop through words and increment the appropriate counter.
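
A minimal sketch of the dictionary approach, again on a toy list with placeholder names, could look like this:

```python
import string

demo_words = ['Apple', 'avocado', 'banana', 'Blueberry', 'cherry']  # toy list

# One key per lowercase letter, each counter initialized to zero
letter_counts = {letter: 0 for letter in string.ascii_lowercase}

for word in demo_words:
    first = word[:1].lower()    # the slice [:1] is safe even for an empty string
    if first in letter_counts:  # skip anything that does not start with a-z
        letter_counts[first] += 1

print(letter_counts['a'], letter_counts['b'], letter_counts['c'])  # 2 2 1
```

This version passes through the list of words only once, rather than once per letter.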

If you want, try to implement both - which one is faster?
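
To compare speed, one option is the standard-library timeit module. The sketch below times two hypothetical functions (count_nested and count_dict are placeholder names) on a repeated toy list. Actual timings will vary by machine and list size, so no expected numbers are shown.

```python
import string
import timeit

demo_words = ['Apple', 'avocado', 'banana', 'Blueberry', 'cherry'] * 200  # toy data


def count_nested(words):
    """Nested-loop approach: one full pass over words for each letter."""
    counts = {}
    for letter in string.ascii_lowercase:
        counts[letter] = sum(1 for w in words if w.lower().startswith(letter))
    return counts


def count_dict(words):
    """Dictionary approach: a single pass over words."""
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    for w in words:
        first = w[:1].lower()
        if first in counts:
            counts[first] += 1
    return counts


# Run each function 20 times and report the total elapsed time in seconds
t_nested = timeit.timeit(lambda: count_nested(demo_words), number=20)
t_dict = timeit.timeit(lambda: count_dict(demo_words), number=20)
print(f'nested loops: {t_nested:.4f} s, dictionary: {t_dict:.4f} s')
```

In a notebook, the %timeit magic offers a shorter way to make the same kind of measurement.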

# Student exercise: your code goes here
# Student exercise: your code goes here

What is the most common first letter?#

  • While it is possible to just look at the output counts above, try to do this with code.

  • If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code - no need for additional loops.
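
As a sketch of how this can be done without an extra loop, the example below applies max to a small made-up dictionary and to an equivalent pair of lists. The counts are invented for illustration.

```python
# Made-up counts for illustration
demo_counts = {'a': 2, 'b': 5, 'c': 1}

# max iterates over the keys; key=demo_counts.get ranks each key by its value
top_letter = max(demo_counts, key=demo_counts.get)
top_count = demo_counts[top_letter]
print(top_letter, top_count)  # b 5

# Equivalent idea when letters and counts are stored in two parallel lists
letters = ['a', 'b', 'c']
counts = [2, 5, 1]
top_from_lists = letters[counts.index(max(counts))]
print(top_from_lists)  # b
```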

# Student exercise: your code goes here

Use string formatting to print your answer#

  • Output should be something like: “The most common first letter in words is ‘a’ with 17096 occurrences”

  • Note that ‘a’ is not the correct answer - only 25 other possibilities to consider!
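
One way to build such a sentence is an f-string, sketched here with placeholder values (the letter and count below are invented, not results from the words file).

```python
# Placeholder values for illustration only
top_letter = 'b'
top_count = 5

# Expressions inside {} are evaluated and inserted into the string
msg = f"The most common first letter in words is '{top_letter}' with {top_count} occurrences"
print(msg)  # The most common first letter in words is 'b' with 5 occurrences
```

The older str.format method and % formatting produce the same result; f-strings are generally the most readable option.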

# Student exercise: your code goes here

Extra Credit: Create a plot of letter counts (+1 pt)#

  • We haven’t talked about matplotlib or other plotting libraries yet, but if you already feel pretty comfortable plotting, create a visualization of your output counts. A bar plot (AKA histogram when counts are involved) might be a good choice.
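
For anyone new to plotting, a minimal bar plot sketch with matplotlib might look like the following. It assumes matplotlib is installed and that the counts are stored in a dictionary; the three counts shown are invented.

```python
import matplotlib.pyplot as plt

# Made-up counts for illustration
demo_counts = {'a': 2, 'b': 5, 'c': 1}

fig, ax = plt.subplots()
# One bar per letter: x positions are the keys, bar heights are the values
ax.bar(list(demo_counts.keys()), list(demo_counts.values()))
ax.set_xlabel('First letter')
ax.set_ylabel('Number of words')
ax.set_title('Word count by first letter')
```

In a notebook the figure is normally displayed automatically when the cell finishes running.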

# Student exercise: your code goes here

Extra Credit: Create a standalone Python script (+1 pt)#

  • Create a Python script to complete the task above, answering the question “How many words begin with each letter of the alphabet (case-insensitive)?”

  • Your script should be executable from the command line (remember to properly set permissions with chmod +x)

  • Your script should accept the path to the words file as an input argument

    • See Python sys.argv

  • Print the results to stdout (terminal)

    • Output should include a letter and total count on each line (“a 17096”)

  • Save the results to a text file called words_lettercount_Python.txt

    • This can be done by redirecting the output of the command used to run the script to a file

    • Alternatively, you can create and write to this file in Python

      • Can explore os.path to programmatically append the '_lettercount_Python.txt' string to the original file path
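
The requirements above could be laid out roughly as in the sketch below. The function names (count_first_letters, output_path, main) are placeholders, and the use of os.path.splitext to build the output file name is one assumption among several reasonable choices.

```python
#!/usr/bin/env python
"""Count how many words begin with each letter (case-insensitive)."""
import os
import string
import sys


def count_first_letters(words_fn):
    """Return a dict mapping each lowercase letter to its word count."""
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    with open(words_fn) as f:
        for line in f:
            first = line.strip()[:1].lower()
            if first in counts:
                counts[first] += 1
    return counts


def output_path(words_fn):
    """Drop any file extension and append '_lettercount_Python.txt'."""
    root, _ext = os.path.splitext(words_fn)
    return root + '_lettercount_Python.txt'


def main(words_fn):
    counts = count_first_letters(words_fn)
    lines = [f'{letter} {count}' for letter, count in counts.items()]
    print('\n'.join(lines))  # results to stdout
    with open(output_path(words_fn), 'w') as f:  # and to a text file
        f.write('\n'.join(lines) + '\n')


# sys.argv[0] is the script name; sys.argv[1] is the first command-line argument
if __name__ == '__main__' and len(sys.argv) == 2:
    main(sys.argv[1])
```

With this layout, the output file is written next to the input file. A fuller script would also print a usage message when the argument is missing.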

Submission#

  • Save the completed notebook (make sure to fully run the notebook and check all cell output is visible)

  • Use the git add; git commit -m 'message'; git push workflow to push your work to the remote repository

    • Ideally, you’ve been using add / commit / push as you make progress on this notebook

  • Check the remote repository to confirm that all of the files you want to submit have been pushed

  • When you have completed your last push, submit the URL pointing to your GitHub repository to the corresponding Canvas assignment