{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to Python Data Structures\n", "Strings, Lists, Tuples, Sets, Dicts  \n", "Created in Python 3.7  \n", "© Joe James, 2019.\n", "\n", "## Sequences: String, List, Tuple\n", "****\n", "[Documentation](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)  \n", "**indexing** - access any item in the sequence using its index.  \n", "Indexing starts with 0 for the first element." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "g\n", "cow\n", "Kevin\n" ] } ], "source": [ "# string\n", "x = 'frog'\n", "print(x[3])\n", "\n", "# list\n", "x = ['pig', 'cow', 'horse']\n", "print(x[1])\n", "\n", "# tuple\n", "x = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(x[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**slicing** - slice out substrings, sublists, subtuples using indexes.  \n", "[start : end+1 : step]  \n", "The slice stops just before the second index, so x[1:4] returns the items at indexes 1, 2 and 3." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "omp\n", "opt\n", "puter\n", "compu\n", "r\n", "ter\n", "comput\n" ] } ], "source": [ "x = 'computer'\n", "print(x[1:4])\n", "print(x[1:6:2])\n", "print(x[3:])\n", "print(x[:5])\n", "print(x[-1])\n", "print(x[-3:])\n", "print(x[:-2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**adding / concatenating** - combine 2 sequences of the same type by using +" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "horseshoe\n", "['pig', 'cow', 'horse']\n", "('Kevin', 'Niklas', 'Jenny', 'Craig')\n" ] } ], "source": [ "# string\n", "x = 'horse' + 'shoe'\n", "print(x)\n", "\n", "# list\n", "y = ['pig', 'cow'] + ['horse']\n", "print(y)\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny') + ('Craig',)\n", "print(z)" ] }, { "cell_type": "markdown",
"metadata": {}, "source": [ "**multiplying** - multiply a sequence using *" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bugbugbug\n", "[8, 5, 8, 5, 8, 5]\n", "(2, 4, 2, 4, 2, 4)\n" ] } ], "source": [ "# string\n", "x = 'bug' * 3\n", "print(x)\n", "\n", "# list\n", "y = [8, 5] * 3\n", "print(y)\n", "\n", "# tuple\n", "z = (2, 4) * 3\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**checking membership** - test whether an item is or is not in a sequence." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "False\n", "True\n" ] } ], "source": [ "# string\n", "x = 'bug'\n", "print('u' in x)\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse']\n", "print('cow' not in y)\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print('Niklas' in z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**iterating** - iterating through the items in a sequence" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7\n", "8\n", "3\n", "0 7\n", "1 8\n", "2 3\n" ] } ], "source": [ "# item\n", "x = [7, 8, 3]\n", "for item in x:\n", " print(item)\n", " \n", "# index & item\n", "y = [7, 8, 3]\n", "for index, item in enumerate(y):\n", " print(index, item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**number of items** - count the number of items in a sequence" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "3\n", "4\n" ] } ], "source": [ "# string\n", "x = 'bug'\n", "print(len(x))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse']\n", "print(len(y))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(len(z))" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "**minimum** - find the minimum item in a sequence lexicographically.  \n", "Items can be strings or numbers, but the two cannot be mixed in one sequence." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b\n", "cow\n", "Craig\n" ] } ], "source": [ "# string\n", "x = 'bug'\n", "print(min(x))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse']\n", "print(min(y))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(min(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**maximum** - find the maximum item in a sequence lexicographically.  \n", "Items can be strings or numbers, but the two cannot be mixed in one sequence." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "u\n", "pig\n", "Niklas\n" ] } ], "source": [ "# string\n", "x = 'bug'\n", "print(max(x))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse']\n", "print(max(y))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(max(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**sum** - find the sum of items in a sequence.  \n", "Entire sequence must be numeric." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "27\n", "20\n", "80\n" ] } ], "source": [ "# mixed types -> error\n", "# x = [5, 7, 'bug']\n", "# print(sum(x)) # raises a TypeError\n", "\n", "# list\n", "y = [2, 5, 8, 12]\n", "print(sum(y))\n", "print(sum(y[-2:]))\n", "\n", "# tuple\n", "z = (50, 4, 7, 19)\n", "print(sum(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**sorting** - sorted() returns a new list of items in sorted order.  \n", "Does not change the original sequence."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['b', 'g', 'u']\n", "['cow', 'horse', 'pig']\n", "['Craig', 'Jenny', 'Kevin', 'Niklas']\n" ] } ], "source": [ "# string\n", "x = 'bug'\n", "print(sorted(x))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse']\n", "print(sorted(y))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(sorted(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**sorting** - sort by second letter  \n", "Add a key parameter and a lambda function to return the second character.  \n", "(the word *key* here is a defined parameter name, *k* is an arbitrary variable name)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Kevin', 'Jenny', 'Niklas', 'Craig']\n" ] } ], "source": [ "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(sorted(z, key=lambda k: k[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**count(item)** - returns the number of times an item occurs" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "2\n", "1\n" ] } ], "source": [ "# string\n", "x = 'hippo'\n", "print(x.count('p'))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse', 'cow']\n", "print(y.count('cow'))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(z.count('Kevin'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**index(item)** - returns the index of the first occurrence of an item."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "1\n", "2\n" ] } ], "source": [ "# string\n", "x = 'hippo'\n", "print(x.index('p'))\n", "\n", "# list\n", "y = ['pig', 'cow', 'horse', 'cow']\n", "print(y.index('cow'))\n", "\n", "# tuple\n", "z = ('Kevin', 'Niklas', 'Jenny', 'Craig')\n", "print(z.index('Jenny'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**unpacking** - unpack the n items of a sequence into n variables" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pig cow horse\n" ] } ], "source": [ "x = ['pig', 'cow', 'horse']\n", "a, b, c = x\n", "print(a, b, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists \n", "****\n", "- General purpose\n", "- Most widely used data structure \n", "- Grow and shrink size as needed\n", "- Sequence type\n", "- Sortable \n", "\n", "**constructors** - creating a new list" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7]\n", "[25, 36, 49, 64, 81]\n" ] } ], "source": [ "x = list()\n", "y = ['a', 25, 'dog', 8.43]\n", "tuple1 = (10, 20)\n", "z = list(tuple1)\n", "\n", "# list comprehension\n", "a = [m for m in range(8)]\n", "print(a)\n", "b = [i**2 for i in range(10) if i>4]\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**delete** - delete a list or an item in a list" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 8, 6]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "del(x[1])\n", "print(x)\n", "del(x) # list x no longer exists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**append** - append an item to a list" ] }, { "cell_type": "code", "execution_count": 18, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 3, 8, 6, 7]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.append(7)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**extend** - append a sequence to a list" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 3, 8, 6, 12, 13]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "y = [12, 13]\n", "x.extend(y)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**insert** - insert an item at a given index" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 7, 3, 8, 6]\n", "[5, ['a', 'm'], 7, 3, 8, 6]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.insert(1, 7)\n", "print(x)\n", "x.insert(1, ['a', 'm'])\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**pop** - pops last item off list and returns item" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 3, 8]\n", "8\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.pop() # pop off the 6\n", "print(x)\n", "print(x.pop())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**remove** - remove first instance of an item" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 8, 6, 3]\n" ] } ], "source": [ "x = [5, 3, 8, 6, 3]\n", "x.remove(3)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**reverse** - reverse the order of the list. It works in place, meaning it changes the original list rather than returning a new one."
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[6, 8, 3, 5]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.reverse()\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**sort** - sort the list in place.  \n", "Note:  \n", "sorted(x) returns a new sorted list without changing the original list x.  \n", "x.sort() puts the items of x in sorted order (sorts in place)." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3, 5, 6, 8]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.sort()\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**reverse sort** - sort items descending.  \n", "Use *reverse=True* parameter to the sort function." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[8, 6, 5, 3]\n" ] } ], "source": [ "x = [5, 3, 8, 6]\n", "x.sort(reverse=True)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuples\n", "****\n", "- Immutable (can’t add/change)\n", "- Useful for fixed data\n", "- Faster than Lists\n", "- Sequence type \n", " \n", "**constructors** - creating new tuples." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2,) <class 'tuple'>\n", "(2, 4, 6) <class 'tuple'>\n" ] } ], "source": [ "x = ()\n", "x = (1, 2, 3)\n", "x = 1, 2, 3\n", "x = 2, # the comma tells Python it's a tuple\n", "print(x, type(x))\n", "\n", "list1 = [2, 4, 6]\n", "x = tuple(list1)\n", "print(x, type(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**tuples are immutable**, but member objects may be mutable."
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 2, 3)\n", "([1], 3)\n", "([1], 3, 4)\n" ] } ], "source": [ "x = (1, 2, 3)\n", "# del(x[1]) # fails\n", "# x[1] = 8 # fails\n", "print(x)\n", "\n", "y = ([1, 2], 3) # a tuple where the first item is a list\n", "del(y[0][1]) # delete the 2\n", "print(y) # the list within the tuple is mutable\n", "\n", "y += (4,) # concatenating two tuples works\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sets\n", "****\n", "- Store non-duplicate items \n", "- Very fast membership tests vs Lists \n", "- Math Set ops (union, intersect) \n", "- Sets are Unordered \n", " \n", "**constructors** - creating new sets" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{3, 5}\n", "set()\n", "{2, 3, 4}\n" ] } ], "source": [ "x = {3, 5, 3, 5}\n", "print(x)\n", "\n", "y = set()\n", "print(y)\n", "\n", "list1 = [2, 3, 4]\n", "z = set(list1)\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**set operations**" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{8, 3, 5}\n", "{8, 3, 5, 7}\n", "{8, 5, 7}\n", "3\n", "True\n", "8 {5, 7}\n", "set()\n" ] } ], "source": [ "x = {3, 8, 5}\n", "print(x)\n", "x.add(7)\n", "print(x)\n", "\n", "x.remove(3)\n", "print(x)\n", "\n", "# get length of set x\n", "print(len(x))\n", "\n", "# check membership in x\n", "print(5 in x)\n", "\n", "# pop an arbitrary item from set x (which item is not guaranteed)\n", "print(x.pop(), x)\n", "\n", "# delete all items from set x\n", "x.clear()\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Mathematical set operations**  \n", "intersection (AND): set1 & set2  \n", "union (OR): set1 | set2  \n", "symmetric difference (XOR): set1 ^ set2  \n", "difference (in set1 but not set2): set1 - 
set2  \n", "subset (set2 contains set1): set1 <= set2  \n", "superset (set1 contains set2): set1 >= set2" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{3}\n", "{1, 2, 3, 4, 5}\n", "{1, 2, 4, 5}\n", "{1, 2}\n", "False\n", "False\n" ] } ], "source": [ "s1 = {1, 2, 3}\n", "s2 = {3, 4, 5}\n", "print(s1 & s2)\n", "print(s1 | s2)\n", "print(s1 ^ s2)\n", "print(s1 - s2)\n", "print(s1 <= s2)\n", "print(s1 >= s2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dictionaries (dict)\n", "****\n", "- Key/Value pairs\n", "- Associative array, like Java HashMap\n", "- Dicts keep insertion order (guaranteed since Python 3.7) \n", "\n", "**constructors** - creating new dictionaries" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}\n", "{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}\n", "{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}\n" ] } ], "source": [ "x = {'pork':25.3, 'beef':33.8, 'chicken':22.7}\n", "print(x)\n", "x = dict([('pork', 25.3),('beef', 33.8),('chicken', 22.7)])\n", "print(x)\n", "x = dict(pork=25.3, beef=33.8, chicken=22.7)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**dict operations**" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7, 'shrimp': 38.2}\n", "{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}\n", "3\n", "{}\n" ] } ], "source": [ "x['shrimp'] = 38.2 # add or update\n", "print(x)\n", "\n", "# delete an item\n", "del(x['shrimp'])\n", "print(x)\n", "\n", "# get length of dict x\n", "print(len(x))\n", "\n", "# delete all items from dict x\n", "x.clear()\n", "print(x)\n", "\n", "# delete dict x\n", "del(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**accessing keys and values in a 
dict**  \n", "In Python 3 these methods return view objects; in Python 2 they return lists." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['pork', 'beef', 'chicken'])\n", "dict_values([25.3, 33.8, 22.7])\n", "dict_items([('pork', 25.3), ('beef', 33.8), ('chicken', 22.7)])\n", "True\n", "False\n" ] } ], "source": [ "y = {'pork':25.3, 'beef':33.8, 'chicken':22.7}\n", "print(y.keys())\n", "print(y.values())\n", "print(y.items()) # key-value pairs\n", "\n", "# check membership in the keys of y (only looks in keys, not values)\n", "print('beef' in y)\n", "\n", "# check membership in the values of y\n", "print('clams' in y.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**iterating a dict** - note, items are returned in insertion order (Python 3.7+)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pork 25.3\n", "beef 33.8\n", "chicken 22.7\n", "pork 25.3\n", "beef 33.8\n", "chicken 22.7\n" ] } ], "source": [ "for key in y:\n", "    print(key, y[key])\n", "\n", "for k, v in y.items():\n", "    print(k, v)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }