Lab02 Exercises#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

Objectives#

  1. Gain more experience with Python and IPython/Jupyter Notebook functionality

  2. Explore basic Python operations with a known dataset

  3. Explore file input/output, string manipulation, loops and basic constructs in Python

Instructions#

  • Last week we did some basic manipulation of the words text file using the bash shell.

  • Let’s repeat some of this analysis using Python.

  • For each question or task below, write some code in the empty cell and execute it to preserve your output

  • Work together, consult resources we’ve discussed (e.g., Whirlwind Tour of Python, Python documentation, Stack Overflow), and post to the #lab02_python_jupyter Slack channel

  • Save the completed notebook, and use the basic git add; git commit -m 'message'; git push workflow to upload by next Friday

  • Submit the URL of your GitHub assignment repo to the Canvas assignment

Here we go!

Part 1: Another play on words#

🙄

Define a variable to store the path to the words file from last week’s repo#

  • Can be absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

  • Note: Can use %pwd (print working directory, similar to pwd shell command) to get current directory path.

  • When defining paths in IPython, use /home/jovyan instead of the ~ shortcut for your home directory

  • The path should be a string, enclosed in single quotes: '/path/to/some/file.txt'
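
A minimal sketch of the two path styles. The directory name `lab01` and the variable names `words_fn_abs` and `words_fn_rel` are hypothetical placeholders, so substitute the actual location of your words file.

```python
import os

# Hypothetical locations: replace 'lab01' with the directory that holds your words file
words_fn_abs = '/home/jovyan/lab01/words'  # absolute path, starts at the filesystem root
words_fn_rel = '../lab01/words'            # relative path, resolved from the current directory

# os.path.isabs only inspects the string; it does not check that the file exists
print(os.path.isabs(words_fn_abs))
print(os.path.isabs(words_fn_rel))

# os.path.expanduser replaces a leading ~ with the path to your home directory
print(os.path.expanduser('~/lab01/words'))
```

A relative path only works when the notebook's working directory is where you expect it to be, which is why checking `%pwd` first is useful.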

#Student Exercise

Use Python to read this file and populate a list of strings containing all words#

  • Use the basic Python open function here, even if you know how to do this with other modules

  • Note: you will need to handle newline strings '\n' at the end of each word
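
One possible pattern, sketched on a tiny stand-in file so the example runs on its own. The file name `sample_words.txt` and its four-word contents are made up for illustration and are not the real words file.

```python
import os
import tempfile

# Write a tiny stand-in file so the example is self-contained (not the real words file)
tmp_dir = tempfile.mkdtemp()
sample_fn = os.path.join(tmp_dir, 'sample_words.txt')
with open(sample_fn, 'w') as f:
    f.write('A\na\naa\naal\n')

# Read the file line by line, removing the trailing newline from each word
with open(sample_fn) as f:
    words = [line.rstrip('\n') for line in f]

print(words)
print(len(words))
```

The `with` statement closes the file automatically once the indented block finishes.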

#Student Exercise

How many words are there in the list?#

#Student Exercise
235886

How many characters are in the first word of the list?#

#Student Exercise

What is the total number of characters for all words in the list?#

  • Can use list comprehension here to loop through all words

  • Note: the total character count here may be different than the total character count from wc -m in Lab01!

    • Here, you stripped the newline character \n from the end of each line, while those were included in the Lab01 count.
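
To see where the difference comes from, here is a small sketch on a toy list of lines (hypothetical values, not the real file contents). It counts characters with and without the trailing newlines.

```python
# Toy stand-in for lines read from a file, with newline characters still attached
lines = ['A\n', 'a\n', 'aa\n', 'aal\n']
words = [line.rstrip('\n') for line in lines]

# The list comprehension builds a list of lengths, and sum adds them up
nchar_stripped = sum([len(word) for word in words])
nchar_with_newlines = sum([len(line) for line in lines])

print(nchar_stripped)
print(nchar_with_newlines)

# The two totals differ by exactly one newline character per line
print(nchar_with_newlines - nchar_stripped == len(lines))
```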

#Student Exercise

How many characters are in the longest word?#

#Student Exercise

Extra credit: What is the longest word?#

#Student Exercise

Define a function that will concatenate an input list of strings#

  • Your function should return a single string (with no spaces)

  • This function should accept an input list with arbitrary length as an argument

    • So return inlist[0]+inlist[1]+inlist[2] won’t work

  • Example input: ['Geospatial', 'Data', 'Analysis']

  • Example output: 'GeospatialDataAnalysis'
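
One way to meet these requirements is the string `join` method, sketched below. The function name `concat_strings` is just a placeholder, and a `for` loop that builds up the output string would work equally well.

```python
def concat_strings(inlist):
    """Return all strings in inlist joined into one string, with no separator."""
    # Joining on an empty string works for a list of any length, including an empty list
    return ''.join(inlist)


print(concat_strings(['Geospatial', 'Data', 'Analysis']))
```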

#Student Exercise

Run your function, passing in a list containing the first 3 words#

  • Use indexing here, don’t copy/paste strings from the list

#Student Exercise
'Aaaa'

Run your function again:#

  • Passing in a list containing the first 5 words

  • Passing in a list containing the last 3 words
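
A quick refresher on list slicing, using a toy list (hypothetical values) in place of the real word list. The built-in `''.join` stands in for your own function here.

```python
# Toy stand-in list; your real list comes from the words file
words = ['A', 'a', 'aa', 'b', 'c', 'x', 'y', 'z']

first3 = words[:3]   # items at index 0, 1, 2
first5 = words[:5]   # items at index 0 through 4
last3 = words[-3:]   # negative indices count back from the end of the list

print(''.join(first3))
print(''.join(first5))
print(''.join(last3))
```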

#Student Exercise

Does your list contain the nickname for the UW mascot?#

  • This should be a simple boolean statement

  • Careful about case!

#Student Exercise

If so, what is the numerical index for that word?#

  • Do a sanity check, and print the word at that index.
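
The membership test and index lookup can be sketched on a toy list as follows. The list and the search word `target` are hypothetical, so swap in your own word list and the word you are looking for.

```python
# Toy stand-in list and a hypothetical search word
words = ['Apple', 'apple', 'banana', 'cherry']
target = 'banana'

# The "in" operator returns a boolean, and the comparison is case-sensitive
print(target in words)
print('Banana' in words)

# list.index returns the position of the first match (it raises ValueError if there is none)
if target in words:
    idx = words.index(target)
    # Sanity check: the word stored at that index should be the one we searched for
    print(idx, words[idx])
```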

#Student Exercise

Part 2: Letter counter#

How many words begin with each letter of the alphabet (case-insensitive)?#

One possible approach, use nested loops:

  • Loop through each letter

    • Initialize some count variable or empty list

    • Loop through each word in the list of words

      • Check to see if the word starts with the letter (careful about case!)

      • If it does, increment your counter or append the word to your list

    • Print out the letter and the total count of words that met your criterion
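
A rough sketch of the nested-loop approach, run on a toy list (hypothetical values) rather than the full word list. The `results` list is an extra added here so the counts can be reused later.

```python
import string

# Toy stand-in list; replace with your full list of words
words = ['Apple', 'apple', 'avocado', 'banana', 'Cherry']

results = []
for letter in string.ascii_lowercase:
    count = 0
    for word in words:
        # Lowercase the word so 'Apple' and 'apple' both count toward 'a'
        if word.lower().startswith(letter):
            count += 1
    print(letter, count)
    results.append((letter, count))
```

Note that the inner loop runs over every word once for each of the 26 letters.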

Or, use a dictionary!

  • Create a new dictionary with a key for each lowercase letter.

  • Initialize a counter for each value in the dictionary.

  • Loop through words and increment the appropriate counter.
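
The dictionary approach might look something like this, again on a toy list (hypothetical values). The name `letter_counts` is a placeholder.

```python
import string

# Toy stand-in list; replace with your full list of words
words = ['Apple', 'apple', 'avocado', 'banana', 'Cherry']

# One key per lowercase letter, with each counter starting at zero
letter_counts = {letter: 0 for letter in string.ascii_lowercase}

for word in words:
    # word[:1] is the first character, or an empty string if the word is empty
    first = word[:1].lower()
    if first in letter_counts:
        letter_counts[first] += 1

print(letter_counts)
```

Here each word is visited only once, and the dictionary lookup finds the matching counter directly.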

If you want, try to implement both - which one is faster?
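
If you want to compare timing outside of the `%timeit` magic, the standard-library `timeit` module is one option. This is only a sketch: the function names `count_nested` and `count_dict` and the repeated toy list are assumptions, and the measured times will vary by machine.

```python
import string
import timeit

# Toy list repeated to give the timer something to measure; use your real list instead
words = ['Apple', 'apple', 'avocado', 'banana', 'Cherry'] * 1000


def count_nested(words):
    """Count first letters with nested loops (one pass over the words per letter)."""
    counts = {}
    for letter in string.ascii_lowercase:
        count = 0
        for word in words:
            if word.lower().startswith(letter):
                count += 1
        counts[letter] = count
    return counts


def count_dict(words):
    """Count first letters with a dictionary (a single pass over the words)."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for word in words:
        first = word[:1].lower()
        if first in counts:
            counts[first] += 1
    return counts


# Run each function 5 times and report the total elapsed time in seconds
t_nested = timeit.timeit(lambda: count_nested(words), number=5)
t_dict = timeit.timeit(lambda: count_dict(words), number=5)
print(f'nested loops: {t_nested:.4f} s')
print(f'dictionary:   {t_dict:.4f} s')
```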

#Student Exercise
#Student Exercise

What is the most common first letter?#

  • While it is possible to just look at the output counts above, try to do this with code.

  • If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code - no need for additional loops.
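
For a dictionary of counts, the built-in `max` function with a `key` argument does this in one line. The counts below are toy values for illustration, not results from the words file.

```python
# Toy counts for illustration only
letter_counts = {'a': 3, 'b': 7, 'c': 5}

# max iterates over the keys; key=letter_counts.get ranks each key by its stored count
top_letter = max(letter_counts, key=letter_counts.get)
top_count = letter_counts[top_letter]

print(top_letter, top_count)
```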

#Student Exercise

Use string formatting to print your answer#

  • Output should be something like: “The most common first letter in words is ‘a’ with 17096 occurrences”

  • Note that ‘a’ is not the correct answer - only 25 other possibilities to consider!
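
Two common ways to format the string are sketched below, using toy placeholder values for the letter and count (not the real answer).

```python
# Toy placeholder values, not the real answer
top_letter = 'b'
top_count = 7

# f-string: expressions inside curly braces are evaluated and inserted into the string
msg = f"The most common first letter in words is '{top_letter}' with {top_count} occurrences"
print(msg)

# Equivalent result using the str.format method
msg2 = "The most common first letter in words is '{}' with {} occurrences".format(top_letter, top_count)
print(msg2)
```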

#Student Exercise

Extra Credit: Create a plot of letter counts#

  • We haven’t talked about matplotlib or other plotting libraries yet, but if you already feel pretty comfortable plotting, create a visualization of your output counts. A bar plot (AKA histogram when counts are involved) might be a good choice.

#Student Exercise

Extra Credit: Create a standalone Python script#

  • Create a Python script to complete the task above, answering the question “How many words begin with each letter of the alphabet (case-insensitive)?”

  • Your script should be executable from the command line (remember to properly set permissions with chmod +x)

  • Your script should accept the path to the words file as an input argument

    • See Python sys.argv

  • Print the results to stdout (terminal)

    • Output should include a letter and total count on each line (“a 17096”)

  • Save the results to a text file called words_lettercount_Python.txt

    • This can be done by redirecting the output of the command used to run the script to a file

    • Alternatively, you can create and write to this file in Python

      • Can explore os.path to programmatically append the '_lettercount_Python.txt' string to the original file path
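
A skeleton for such a script might look like the following. This is one possible layout rather than a required structure: the script name `words_lettercount.py` and the function names are hypothetical, and this version both prints the results and writes the output file from within Python.

```python
#! /usr/bin/env python
"""Count how many words begin with each letter of the alphabet (case-insensitive)."""
import os
import string
import sys


def count_first_letters(words_fn):
    """Return a dictionary mapping each lowercase letter to its first-letter count."""
    letter_counts = {letter: 0 for letter in string.ascii_lowercase}
    with open(words_fn) as f:
        for line in f:
            # strip removes the newline; [:1] is safe even for an empty line
            first = line.strip()[:1].lower()
            if first in letter_counts:
                letter_counts[first] += 1
    return letter_counts


def main(argv):
    # argv[0] is the script name, argv[1] should be the path to the words file
    if len(argv) != 2:
        print('Usage: words_lettercount.py /path/to/words')
        return 1
    words_fn = argv[1]
    letter_counts = count_first_letters(words_fn)
    # Drop any file extension, then append the suffix to build the output path
    out_fn = os.path.splitext(words_fn)[0] + '_lettercount_Python.txt'
    with open(out_fn, 'w') as out:
        for letter, count in letter_counts.items():
            line = f'{letter} {count}'
            print(line)
            out.write(line + '\n')
    return 0


if __name__ == '__main__':
    main(sys.argv)
```

After `chmod +x words_lettercount.py`, the script could be run as `./words_lettercount.py /path/to/words`, which prints one letter and count per line and writes the same lines to a file next to the input.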