Lab02 Exercises (20 pts)
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
modified by Eric Gagliano
Introduction#
Objectives#
Gain more experience with Python and iPython/Notebook functionality
Explore basic Python operations with a known dataset
Explore file input/output, string manipulation, loops and basic constructs in Python
Instructions#
Last week we did some basic manipulation of the `words` text file using the `bash` shell
Let’s repeat some of this analysis using Python
For each question or task below, write some code in the empty cell and execute to preserve your output
Work together, consult resources we’ve discussed (e.g., Whirlwind Tour of Python, Python documentation, Stack Overflow), post to the `#lab02_python_jupyter` Slack channel
Part 1: Another play on words (10 pts)#
Define a variable to store the path to the `words` file from last week’s repo#
Can be absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/
Note: Can use `%pwd` (print working directory, similar to the `pwd` shell command) to get the current directory path.
When defining paths in iPython, use `/home/jovyan` instead of the `~` shortcut for your home directory
The path should be a string, enclosed in single quotes: `'/path/to/some/file.txt'`
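As a minimal sketch of the absolute vs. relative distinction, the snippet below uses `os.path` on a made-up location. The `data/words` path is hypothetical and is not the actual repo layout.

```python
import os

# Hypothetical relative path, interpreted relative to the current working directory
rel_path = 'data/words'

# os.path.abspath joins the relative path onto the current working directory
abs_path = os.path.abspath(rel_path)

# os.path.expanduser replaces a leading '~' with the home directory path
home_path = os.path.expanduser('~')

print(os.path.isabs(rel_path))  # False
print(os.path.isabs(abs_path))  # True
print(home_path)
```

Neither call checks that the file exists, so `os.path.exists` is a useful follow-up once the real path is defined.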
# Student exercise: your code goes here
Use Python to read this file and populate a list of strings containing all words#
Use the basic Python `open` function here, even if you know how to do this with other modules
Note: you will need to handle newline strings `'\n'` at the end of each word
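To see where the newline strings come from, here is a small sketch that writes a three-word stand-in file and reads it back. The file name `demo_words.txt` and its contents are made up for illustration, so the sketch runs without the real `words` file.

```python
import os
import tempfile

# Create a small stand-in file so this example runs anywhere (not the real words file)
tmp_dir = tempfile.mkdtemp()
demo_fn = os.path.join(tmp_dir, 'demo_words.txt')
with open(demo_fn, 'w') as f:
    f.write('apple\nBanana\ncherry\n')

# Iterating over a file object yields one line at a time, newline included
with open(demo_fn) as f:
    raw_lines = list(f)

# rstrip('\n') removes only the trailing newline from each line
demo_words = [line.rstrip('\n') for line in raw_lines]

print(raw_lines)   # ['apple\n', 'Banana\n', 'cherry\n']
print(demo_words)  # ['apple', 'Banana', 'cherry']
```

The `with` block closes the file automatically, even if an error occurs while reading.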
# Student exercise: your code goes here
How many words are there in the list?#
# Student exercise: your code goes here
235886
How many characters are in the first word of the list?#
# Student exercise: your code goes here
What is the total number of characters for all words in the list?#
Can use list comprehension here to loop through all words
Note: the total character count here may be different than the total character count from `wc -m` in Lab01!
Here, you stripped the newline character `\n` from the end of each line, while those were included in the Lab01 count.
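The difference between the two counts can be checked on a toy list. This sketch uses three made-up words and assumes one newline per line in the original file.

```python
# Toy list standing in for the stripped words (made-up values)
demo_words = ['apple', 'Banana', 'cherry']

# List comprehension builds a list of word lengths, which sum() then adds up
char_total = sum([len(w) for w in demo_words])

# A file with one word per line also contains one newline per word
with_newlines = char_total + len(demo_words)

print(char_total)     # 17
print(with_newlines)  # 20
```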
# Student exercise: your code goes here
How many characters are in the longest word?#
# Student exercise: your code goes here
What is the longest word?#
# Student exercise: your code goes here
Print the first 3 words, print the last 3 words#
Use relative list indices for slicing: https://stackoverflow.com/questions/509211/understanding-slice-notation
Note that the output is still a list object
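A short sketch of slice notation on a toy list of single letters (made-up data, slicing two items rather than three):

```python
demo_list = ['a', 'b', 'c', 'd', 'e']

# Omitting the start index means "from the beginning"
first_two = demo_list[:2]

# A negative start index counts back from the end of the list
last_two = demo_list[-2:]

# A single index returns the element itself, not a list
last_item = demo_list[-1]

print(first_two)  # ['a', 'b']
print(last_two)   # ['d', 'e']
print(last_item)  # e
```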
# Student exercise: your code goes here
Define a function that will concatenate an input list of strings#
Your function should return a single string (with no spaces)
This function should accept an input list with arbitrary length as an argument
So `return inlist[0]+inlist[1]+inlist[2]` won’t work
Example input: `['Geospatial', 'Data', 'Analysis']`
Example output: `'GeospatialDataAnalysis'`
# Student exercise: your code goes here
Run your function, passing in a list containing the first 3 words#
Use indexing here, don’t copy/paste strings from the list
# Student exercise: your code goes here
'Aaaa'
Run your function again:#
Passing in a list containing the first 5 words
Passing in a list containing the last 3 words
# Student exercise: your code goes here
Does your list contain the nickname for the UW mascot?#
This should be a simple boolean expression
Careful about case!
# Student exercise: your code goes here
If so, what is the numerical index for that word?#
Double check by also printing the word at that index.
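Membership tests and index lookups can be sketched on a toy list (made-up words, not the mascot nickname). Note that both operations are case-sensitive.

```python
demo_words = ['apple', 'Banana', 'cherry']

# The `in` operator returns a boolean and compares strings exactly
print('banana' in demo_words)  # False (case differs)
print('Banana' in demo_words)  # True

# list.index returns the position of the first match (raises ValueError if absent)
idx = demo_words.index('Banana')
print(idx, demo_words[idx])  # 1 Banana
```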
# Student exercise: your code goes here
Part 2: Letter counter (10 pts)#
How many words begin with each letter of the alphabet (case-insensitive)?#
Hint: Python has a built-in string of lowercase letters stored as `string.ascii_lowercase` (in the `string` module, so need to import first!). Also, all string objects have methods that can change the case: https://docs.python.org/2.5/lib/string-methods.html
One possible approach, use nested loops:
Loop through each letter
Initialize some count variable or empty list
Loop through each word in the list of words
Check to see if the word starts with the letter (careful about case!)
If it does, increment your counter or append the word to your list
Print out the letter and the total count of words that met your criterion
Or, use a dictionary!
Create a new dictionary with a key for each lowercase letter.
Initialize a counter for each value in the dictionary.
Loop through words and increment the appropriate counter.
If you want, try to implement both - which one is faster?
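The two approaches above can be sketched on a toy list. The six words below are made up, and the variable names are only illustrative; this is one possible shape for each approach, not the only one.

```python
import string

# Toy list with mixed case (made-up values)
demo_words = ['apple', 'Avocado', 'banana', 'cherry', 'Cranberry', 'cumin']

# Approach 1: nested loops, one pass over all words for each letter
loop_counts = []
for letter in string.ascii_lowercase:
    count = 0
    for word in demo_words:
        if word.lower().startswith(letter):
            count += 1
    loop_counts.append(count)

# Approach 2: dictionary of counters, a single pass over the words
dict_counts = {letter: 0 for letter in string.ascii_lowercase}
for word in demo_words:
    first = word[:1].lower()  # word[:1] is '' for an empty string, so no IndexError
    if first in dict_counts:
        dict_counts[first] += 1

print(loop_counts[:3])                                       # [2, 1, 3]
print(dict_counts['a'], dict_counts['b'], dict_counts['c'])  # 2 1 3
```

The nested loops visit every word 26 times, while the dictionary version visits each word once. The `%timeit` magic is one way to compare them on the full list.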
# Student exercise: your code goes here
# Student exercise: your code goes here
What is the most common first letter?#
While it is possible to just look at the output counts above, try to do this with code.
If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code - no need for additional loops.
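For counts stored in a dictionary, one way to avoid an extra loop is the built-in `max` with a `key` function. The counts below are made-up toy values.

```python
# Toy counts (made-up values, not the real answer)
demo_counts = {'a': 2, 'b': 1, 'c': 3}

# max iterates over the keys; key=demo_counts.get ranks each key by its value
top_letter = max(demo_counts, key=demo_counts.get)
top_count = demo_counts[top_letter]

print(top_letter, top_count)  # c 3
```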
# Student exercise: your code goes here
Use string formatting to print your answer#
Output should be something like: “The most common first letter in words is ‘a’ with 17096 occurrences”
Note that ‘a’ is not the correct answer - only 25 other possibilities to consider!
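Two common ways to format a string are sketched here with placeholder values (the letter and count are made up): an f-string and the `str.format` method.

```python
# Placeholder values for illustration only
top_letter = 'c'
top_count = 3

# f-string: expressions inside braces are evaluated in place
msg = f"The most common first letter in the demo list is '{top_letter}' with {top_count} occurrences"

# str.format: braces are filled in order from the arguments
msg_alt = "The most common first letter in the demo list is '{}' with {} occurrences".format(top_letter, top_count)

print(msg)
print(msg == msg_alt)  # True
```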
# Student exercise: your code goes here
Extra Credit: Create a plot of letter counts (+1 pt)#
We haven’t talked about `matplotlib` or other plotting libraries yet, but if you already feel pretty comfortable plotting, create a visualization of your output counts. A bar plot (AKA histogram when counts are involved) might be a good choice.
# Student exercise: your code goes here
Extra Credit: Create a standalone Python script (+1 pt)#
Create a Python script to complete the task above, answering the question “How many words begin with each letter of the alphabet (case-insensitive)?”
Your script should be executable from the command line (remember to properly set permissions with `chmod +x`)
Your script should accept the path to the `words` file as an input argument
See Python `sys.argv`
Print the results to stdout (terminal)
Output should include a letter and total count on each line (“a 17096”)
Save the results to a text file called `words_lettercount_Python.txt`
This can be done by redirecting the output of the command used to run the script to a file
Alternatively, you can create and write to this file in Python
Can explore `os.path` to programmatically append the `'_lettercount_Python.txt'` string to the original file path
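A rough sketch of the argument handling and output naming, leaving out the letter counting itself. The function names `build_output_path` and `main`, the script name `lettercount.py`, and the `/hypothetical/dir/words` path are all made up for illustration.

```python
import os
import sys


def build_output_path(words_path):
    """Append '_lettercount_Python.txt' to the input path, dropping any extension."""
    root, _ext = os.path.splitext(words_path)
    return root + '_lettercount_Python.txt'


def main(argv=None):
    # sys.argv is a list of strings: the script name, then command-line arguments
    if argv is None:
        argv = sys.argv
    if len(argv) != 2:
        print(f'Usage: {argv[0]} /path/to/words')
        return 1
    words_path = argv[1]
    print(build_output_path(words_path))
    return 0


# Called here with a hypothetical argument list so the sketch runs in a notebook;
# a standalone script would instead end with: sys.exit(main())
exit_code = main(['lettercount.py', '/hypothetical/dir/words'])
# prints /hypothetical/dir/words_lettercount_Python.txt
```

A standalone script would also need a shebang line such as `#! /usr/bin/env python` at the top to run directly after `chmod +x`.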
Submission#
Save the completed notebook (make sure to fully run the notebook and check all cell output is visible)
Use the `git add; git commit -m 'message'; git push` workflow to push your work to the remote repository
Ideally you’ve been using add / commit / push as you make progress on this notebook
Check the remote repository to confirm that all of the files you want to submit have been pushed
When you have completed your last push, submit the URL pointing to your GitHub repository to the corresponding Canvas assignment