{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises\n",
    "UW Geospatial Data Analysis  \n",
    "CEE498/CEWA599  \n",
    "David Shean\n",
    "\n",
    "## Objectives\n",
    "1. Gain more experience with Python and iPython/Notebook functionality\n",
    "2. Explore basic Python operations with a known dataset\n",
    "3. Explore file input/output, string manipulation, loops and basic constructs in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises: Another play on `words`\n",
    "🙄 (:face_with_rolling_eyes:)\n",
    "\n",
    "## Instructions\n",
    "- Last week we did some basic manipulation of the `words` text file using `bash` shell.  \n",
    "- Let's repeat some of this analysis using Python.  \n",
    "- For each question or task below, write some code in the empty cell and execute to preserve your output \n",
    "- Work together, consult resources we've discussed (e.g., Whirlwind Tour of Python, Python documentation, Stack Overflow), post to `#lab02_python_jupyter` Slack channel\n",
    "- Save the completed notebook, and use the basic `git add; git commit -m 'message'; git push` workflow to upload by next Friday\n",
    "- Submit the url pointing to your Github assignment repo on Canvas assignment\n",
    "\n",
    "Here we go!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a variable to store the path to the `words` file from last week's repo\n",
    "* Can be absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/\n",
    "* Note: Can use `%pwd` (print working directory, similar to `pwd` shell command) to get current directory path.\n",
    "* When defining paths in iPython, use `/home/jovyan` instead of `~` shortcut for your home directory\n",
    "* The path should be a string, enclosed in single quotes `'/path/to/some/file.txt'`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Python to read this file and populate a list of strings containing all words\n",
    "* Use basic Python `open` function here, even if you know how to do this with other modules\n",
    "* Note: you will need to handle newline strings `'\\n'` at the end of each word"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many words are there in the list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many characters are in the first word of the list? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is total number of characters for all words in the list?\n",
    "* Can use list comprehension here to loop through all words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many characters are in the longest word?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Print the first 3 words, print the last 3 words\n",
    "* Use relative list indices for slicing: https://stackoverflow.com/questions/509211/understanding-slice-notation\n",
    "* Note that the output is still a list object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a function that will concatenate an input list of strings\n",
    "* Your function should return a single string (with no spaces)\n",
    "* This function should accept an input list with arbitrary length as an argument\n",
    "    * So `return inlist[0]+inlist[1]+inlist[2]` won't work\n",
    "\n",
    "*Example input:* `['Geospatial', 'Data', 'Analysis']`  \n",
    "*Example output:* `'GeospatialDataAnalysis'`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run your function, passing in a list containing the first 3 words\n",
    "*Use indexing here, don't copy/paste strings from the list*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run your function, passing in a list containing the first 5 words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run your function, passing in a list containing the last 3 words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Does your list contain the nickname for the UW mascot?\n",
    "* This should be simple boolean statement\n",
    "* Careful about case!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## If so, what is the numerical index for that word?\n",
    "* Do a sanity check, and print the word at that index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many words begin with each letter of the alphabet (case-insensitive)?\n",
    "* Hint: Python has built-in list of lowercase letters stored as `string.ascii_lowercase` (in the `string` module, so need to import first!).  Also, all string objects have methods that can change the case: https://docs.python.org/2.5/lib/string-methods.html\n",
    "\n",
    "One possible approach, use nested loops:\n",
    "* Loop through each letter\n",
    "    * Initialize some count variable or empty list\n",
    "    * Loop through each word in the list of words\n",
    "        * Check to see if the word starts with the letter (careful about case!)\n",
    "        * If it does, increment your counter or append the word to your list\n",
    "    * Print out the letter and the total count of words that met your criterion       \n",
    "\n",
    "Or, use a dictionary!\n",
    "* Creat a new dictionary with a key for each lowercase letter.\n",
    "* Initialize a counter for each value in the dictionary.\n",
    "* Loop through words and increment the appropriate counter.\n",
    "\n",
    "If you want, try to implement both - which one is faster?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is the most common first letter?\n",
    "* While it is possible to just look at the output counts above, try to do this with code.\n",
    "* If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code - no need for additional loops."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use string formatting to print your answer\n",
    "* Output should be something like: \"The most common first letter in words is 'a' with 17096 occurences\"\n",
    "* Note that 'a' is not the correct answer - only 25 other possibilities to consider!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extra Credit: Create a plot of letter counts\n",
    "* We haven't talked about `matplotlib` or other plotting libraries yet, but if you already feel pretty comfortable plotting, create a visualization your output counts. A bar plot (AKA histogram when counts are involved) might be a good choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extra Credit: Create a standalone Python script\n",
    "* Create a Python script to complete the task above, answering the question \"How many words begin with each letter of the alphabet (case-insensitive)?\"\n",
    "* Your script should be executable from the command line (remember to properly permissions with `chmod +x`)\n",
    "    * See: https://docs.python.org/3/tutorial/appendix.html#tut-scripts\n",
    "* Your script should accept the path to the `words` file as an input argument\n",
    "    * See Python `sys.argv`\n",
    "* Print the results to stdout (terminal)\n",
    "    * Output should include a letter and total count on each line (“a 17096”)\n",
    "* Save the results to a text file called `words_lettercount_Python.txt`\n",
    "    * This can be done by redirecting the output of the command used to run the script to a file\n",
    "    * Alternatively, you can create and write to this file in Python\n",
    "        * Can explore `os.path` to programatically append the `'_lettercount_Python.txt'` string to the original file path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}