# Python Numpy Intro
An introduction to the [Python Numpy](http://www.numpy.org/) numerical python library. 
The core data structure behind Numpy is the n-dimensional [Numpy Array](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). It is often 3x to 10x faster and more memory efficient than Python's lists because, similar to Java arrays, it uses contiguous blocks of memory, and all elements share the same data type so there is no per-element type checking at runtime. The Numpy library also includes many built-in, code-saving mathematical functions that can be applied to an entire array or any slice of an array with a single line of code (i.e. no for loops).
Numpy n-dimensional arrays are also sometimes referred to as nd-arrays.
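
As a rough illustration of the memory claim, the sketch below compares the storage of a Python list of integers with a Numpy array holding the same values. The variable names and the size n are arbitrary choices for illustration, and exact list sizes vary by Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # explicit dtype so the size is the same on every platform

# The array stores raw 8-byte integers back to back.
print(arr.nbytes)  # 8000

# The list stores pointers to separate int objects; count both.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes > arr.nbytes)  # True

# One vectorized expression replaces an explicit loop.
doubled_loop = [x * 2 for x in py_list]
doubled_vec = arr * 2
print(doubled_vec.tolist() == doubled_loop)  # True
```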

**Install Numpy** using pip: pip install numpy
The convention for importing numpy is *import numpy as np*.

import numpy as np

### Creating a Numpy Array
There are MANY ways to instantiate a numpy array. I cover the most common ones below. [Docs here cover more constructors](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html).
- Pass in a list to the array() constructor
- Use the arange function, similar to the range function but used for Numpy arrays. Uses arguments (start, stop, step). As with range, the stop value itself is excluded.
- Use linspace to create an array of n equally spaced values. Uses arguments (start, stop, number of items).
- Create an array that is empty (uninitialized), full of ones or zeros, or full of any fill value. Uses argument (shape) in the form of a tuple.

You can pass in dtype as an optional argument for any of these. This is especially useful if you want to limit memory usage for a very large array of small integers, because int8 and int16 use much less space than the default integer type (int32 or int64, depending on platform).

In [96]:
a = np.array([1,3,5,7,9,11])
print(a)

a = np.arange(1, 12, 2) # (start, stop, step)
print(a)

a = np.linspace(5, 8, 13) # (start, stop, number of items)
print(a)

a = np.zeros((4, 2))
print(a)

a = np.ones((2, 3), dtype=np.int16)
print(a)

a = np.full((6,), 88)
print(a)

a = np.fromstring('25 30 35 40', dtype=int, sep=' ') # np.int was removed from Numpy; use the built-in int
print(a)

a = np.array([[1,3,5],[7,9,11]])
print(a)

b = np.zeros_like(a) # _like gives you a new array in the same shape as the argument.
print(b)

[ 1 3 5 7 9 11]
[ 1 3 5 7 9 11]
[5. 5.25 5.5 5.75 6. 6.25 6.5 6.75 7. 7.25 7.5 7.75 8. ]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[1 1 1]
 [1 1 1]]
[88 88 88 88 88 88]
[25 30 35 40]
[[ 1 3 5]
 [ 7 9 11]]
[[0 0 0]
 [0 0 0]]
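
A few details of these constructors are easy to miss, so here is a small sketch: arange excludes its stop value, linspace includes it, and empty (mentioned in the list above but not shown in the cell) allocates memory without initializing it. The specific values and variable names are just for illustration.

```python
import numpy as np

# arange stops before the stop value, like range().
print(np.arange(1, 12, 2))      # [ 1  3  5  7  9 11]
print(np.arange(1, 11, 2))      # [1 3 5 7 9]

# linspace includes both endpoints by default.
pts = np.linspace(5, 8, 13)
print(pts[0], pts[-1], len(pts))  # 5.0 8.0 13

# empty() only allocates; its contents are arbitrary until you assign them.
buf = np.empty((2, 3), dtype=np.float64)
buf[:] = 1.5
print(buf.shape, buf.dtype)     # (2, 3) float64

# full_like copies shape and dtype from an existing array.
template = np.array([[1, 3, 5], [7, 9, 11]])
print(np.full_like(template, 7))
```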


### Numpy Array Attributes
Get size (number of items), shape (dimensions), ndim (number of dimensions), itemsize (bytes of memory for each item), and dtype (numpy data type).
The total memory the array uses, in bytes, is the product of size and itemsize, which is also available directly as nbytes. See [complete list of attributes and methods](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html).

In [97]:
print(a.size)
print(a.shape)
print(a.ndim)
print(a.itemsize)
print(a.dtype)
print(a.nbytes) # same as a.size * a.itemsize

6
(2, 3)
2
4
int32
24
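
The relationship between these attributes can be checked directly. This sketch uses an explicit dtype so the numbers do not depend on the platform's default integer size; the array name and contents are arbitrary.

```python
import numpy as np

m = np.array([[1, 3, 5], [7, 9, 11]], dtype=np.int16)

print(m.shape)     # (2, 3)
print(m.ndim)      # 2
print(m.size)      # 6
print(m.itemsize)  # 2
print(m.nbytes)    # 12

# size is the product of the shape; nbytes is size * itemsize.
print(m.size == m.shape[0] * m.shape[1])  # True
print(m.nbytes == m.size * m.itemsize)    # True
```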


### Indexing and Slicing
Use square brackets to get any item of an array by index. Multi-dimensional arrays can use multiple square brackets (a[0][2]) or a single pair with comma-separated indices (a[0, 2]).

There are three arguments for slicing arrays, all optional: [start:stop:step].
If start is left blank it defaults to 0. If stop is left blank it defaults to the end of the array. Step defaults to 1.

In [98]:
print(a)
print(a[1])
print(a[0][2])
print(b[2:4])

print(a[:1])
print(a[1:3:2])
print(a[:, 1:2]) # all elements on dimension 0, only element 1 on dimension 1

[[ 1 3 5]
 [ 7 9 11]]
[ 7 9 11]
5
[]
[[1 3 5]]
[[ 7 9 11]]
[[3]
 [9]]
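
A few more indexing forms, sketched on the same 2x3 array used above (rebuilt here so the snippet runs on its own):

```python
import numpy as np

a = np.array([[1, 3, 5], [7, 9, 11]])

# A comma-separated index does the same job as chained brackets.
print(a[0, 2], a[0][2])   # 5 5

# Negative indices count from the end.
print(a[-1, -1])          # 11

# Slices can be combined across dimensions: every row, every second column.
print(a[:, ::2])
# [[ 1  5]
#  [ 7 11]]

# A slice that runs past the end is simply cut short rather than raising an error.
print(a[2:4].shape)       # (0, 3)
```

The last line is why print(b[2:4]) in the cell above shows an empty array: b has only two rows, so rows 2 and 3 do not exist.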


### Reshape, Swap Axes, Flatten
See full list of [array manipulation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html).

In [99]:
c = np.arange(-9, -3).reshape(2, 3)
print(c)

c = c.swapaxes(0,1)
print(c)

c = c.flatten()
print(c)

[[-9 -8 -7]
 [-6 -5 -4]]
[[-9 -6]
 [-8 -5]
 [-7 -4]]
[-9 -6 -8 -5 -7 -4]
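
Some related shortcuts, as a minimal sketch: reshape can infer one dimension, .T is shorthand for swapping the axes of a 2-D array, and ravel is a cheaper alternative to flatten when a copy is not needed. The variable names flat_copy and flat_view are made up for this example.

```python
import numpy as np

c = np.arange(-9, -3).reshape(2, 3)

# -1 asks reshape to infer that dimension from the array's size.
print(c.reshape(3, -1).shape)   # (3, 2)

# For a 2-D array, .T is the same as swapaxes(0, 1).
print(np.array_equal(c.T, c.swapaxes(0, 1)))  # True

# flatten() always copies; ravel() returns a view when it can.
flat_copy = c.flatten()
flat_view = c.ravel()
print(np.shares_memory(c, flat_copy))  # False
print(np.shares_memory(c, flat_view))  # True
```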


### Use dtype to Save Space
The default data types (int32 or int64 depending on platform, and float64) are memory hogs. If you don't need the larger range or higher precision, you can save a lot of memory and often speed up operations by using smaller data types. For large data sets this makes a big difference.

In [100]:
d = np.arange(0,100)
print(d.dtype, type(d[1]))
print(d.nbytes)

d = np.arange(0,100, dtype='int8')
print(d.dtype, type(d[1]))
print(d.nbytes)

int32 <class 'numpy.int32'>
400
int8 <class 'numpy.int8'>
100
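
The saving comes with a trade-off worth knowing about: small integer types have a narrow range, and array arithmetic wraps around silently when a result does not fit. A short sketch with made-up values:

```python
import numpy as np

print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127

small = np.array([100, 27], dtype=np.int8)
print(small.nbytes)    # 2

# Array arithmetic stays in int8 and wraps around on overflow.
total = small + small
print(total.dtype)     # int8
print(total)           # [-56  54]

# Converting to a wider type first avoids the wraparound.
print(small.astype(np.int32) + small)  # [200  54]
```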


### UpCasting, Rounding, Print Formatting
When an array is created from mixed types, every item is upcast to the most general type needed to hold all of them (here, the ints become float64).

In [101]:
e = np.array([(1.566666,2,3), (4,5,6)])
print(e.dtype)

e = e.round(4)
print(e)

np.set_printoptions(precision=2, suppress=True) # show 2 decimal places, suppress scientific notation
print(e)

float64
[[1.57 2. 3. ]
 [4. 5. 6. ]]
[[1.57 2. 3. ]
 [4. 5. 6. ]]
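
The upcasting rule can be inspected without building an array, using np.result_type. The sketch below also shows that converting back down requires an explicit astype, and that round returns a new array. The example values are arbitrary.

```python
import numpy as np

# Mixing ints and floats yields float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# result_type reports the type NumPy would pick when combining two dtypes.
print(np.result_type(np.int8, np.float32))   # float32
print(np.result_type(np.int32, np.float32))  # float64

# Going the other way needs an explicit astype(); the fraction is truncated toward zero.
print(np.array([1.9, -1.9]).astype(np.int64))  # [ 1 -1]

# round() returns a new array and leaves the original untouched.
e = np.array([1.566666, 2.0])
print(e.round(2).tolist())  # [1.57, 2.0]
print(e.tolist())           # [1.566666, 2.0]
```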


### Numpy Data Types Available
uint is an unsigned int, which can hold only non-negative numbers (zero and up).

In [102]:
import pprint as pp
pp.pprint(np.sctypes) # Numpy 1.x only: np.sctypes was removed in Numpy 2.0

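{'complex': [<class 'numpy.complex64'>, <class 'numpy.complex128'>],
 'float': [<class 'numpy.float16'>,
 <class 'numpy.float32'>,
 <class 'numpy.float64'>],
 'int': [<class 'numpy.int8'>,
 <class 'numpy.int16'>,
 <class 'numpy.int32'>,
 <class 'numpy.int32'>,
 <class 'numpy.int64'>],
 'others': [<class 'bool'>,
 <class 'object'>,
 <class 'bytes'>,
 <class 'str'>,
 <class 'numpy.void'>],
 'uint': [<class 'numpy.uint8'>,
 <class 'numpy.uint16'>,
 <class 'numpy.uint32'>,
 <class 'numpy.uint32'>,
 <class 'numpy.uint64'>]}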
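
To see what each type can actually hold, np.iinfo and np.finfo report the limits of integer and float types. A small sketch covering a handful of types (the selection is arbitrary):

```python
import numpy as np

# Integer ranges: unsigned types start at 0 and reach twice as high.
for t in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)
# int8 -128 127
# uint8 0 255
# int16 -32768 32767
# uint16 0 65535

# Float types trade precision for size.
print(np.finfo(np.float32).bits, np.finfo(np.float64).bits)  # 32 64
print(np.finfo(np.float32).eps > np.finfo(np.float64).eps)   # True
```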


### Reading and Writing to Files
Use [loadtxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt) or [genfromtxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) to load an entire file into an array at once. Genfromtxt is more fault tolerant: it can cope with missing values.
Use [savetxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html#numpy.savetxt) to write an array to file.

In [103]:
f = np.loadtxt('data.txt', skiprows=1, delimiter=',', dtype=np.int32)
print(f)
print(f.dtype)

np.savetxt('data2.txt', f, delimiter=';', fmt='%d', header='a;b;c;d;e;f;g;h;i;j', comments='')

[[9 3 8 7 6 1 0 4 2 5]
 [1 7 4 9 2 6 8 3 5 0]
 [4 8 3 9 5 7 2 6 0 1]
 [1 7 4 2 5 9 6 8 0 3]
 [0 7 5 2 8 6 3 4 1 9]
 [5 9 1 4 7 0 3 6 8 2]]
int32


In [111]:
g = np.genfromtxt('data.txt', skip_header=1, delimiter=',', dtype=np.int32)
print(g)

[[9 3 8 7 6 1 0 4 2 5]
 [1 7 4 9 2 6 8 3 5 0]
 [4 8 3 9 5 7 2 6 0 1]
 [1 7 4 2 5 9 6 8 0 3]
 [0 7 5 2 8 6 3 4 1 9]
 [5 9 1 4 7 0 3 6 8 2]]
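
Since data.txt is not included here, the following self-contained sketch writes its own small file to a temporary directory and reads it back, then shows how genfromtxt treats a missing field. The file name, header, and sample values are invented for the example.

```python
import io
import os
import tempfile

import numpy as np

data = np.array([[9, 3, 8], [1, 7, 4]], dtype=np.int32)

# Round trip through a temporary file (the file name here is arbitrary).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    np.savetxt(path, data, delimiter=",", fmt="%d", header="a,b,c", comments="")
    loaded = np.loadtxt(path, skiprows=1, delimiter=",", dtype=np.int32)

print(np.array_equal(data, loaded))  # True

# genfromtxt tolerates missing fields: by default they become nan,
# or you can supply a replacement with filling_values.
text = "a,b,c\n1,,3\n4,5,6"
with_nan = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)
print(np.isnan(with_nan[0, 1]))  # True

filled = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1, filling_values=0)
print(filled[0, 1])  # 0.0
```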


### Mathematical Functions
Numpy has an extensive list of [math and scientific functions](https://docs.scipy.org/doc/numpy/reference/routines.html). 
The best part is that you don't have to iterate. You can apply an operation to the entire array or a slice of an array at once.

In [105]:
print(g > 4)
print(g ** 2 - 1)

[[ True False True True True False False False False True]
 [False True False True False True True False True False]
 [False True False True True True False True False False]
 [False True False False True True True True False False]
 [False True True False True True False False False True]
 [ True True False False True False False True True False]]
[[80 8 63 48 35 0 -1 15 3 24]
 [ 0 48 15 80 3 35 63 8 24 -1]
 [15 63 8 80 24 48 3 35 -1 0]
 [ 0 48 15 3 24 80 35 63 -1 8]
 [-1 48 24 3 63 35 8 15 0 80]
 [24 80 0 15 48 -1 8 35 63 3]]
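
The boolean array produced by a comparison is useful beyond printing: it can select or replace elements. A minimal sketch on a smaller stand-in array (the values are arbitrary, chosen so the snippet runs without data.txt):

```python
import numpy as np

g = np.array([[9, 3, 8], [1, 7, 4]])

# A comparison produces a boolean array that can be used as a mask.
mask = g > 4
print(g[mask])               # [9 8 7]

# Counting True values: booleans sum as 0 and 1.
print(mask.sum())            # 3

# np.where picks elementwise between two alternatives.
print(np.where(mask, g, 0))
# [[9 0 8]
#  [0 7 0]]

# The vectorized expression matches an explicit loop.
looped = [[x ** 2 - 1 for x in row] for row in g.tolist()]
print((g ** 2 - 1).tolist() == looped)  # True
```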


In [106]:
print(g.min())
print(g.max())
print(g.sum())
print(g.mean())
print(g.var()) # variance
print(g.std()) # standard deviation

print(g.sum(axis=1))
print(g.min(axis=0))

print(g.argmin()) # index of min element in the flattened array
print(g.argmax()) # index of max element in the flattened array
print(g.argsort()) # returns array of indices that would sort each row (the last axis)

0
9
270
4.5
8.25
2.8722813232690143
[45 45 45 45 45 45]
[0 3 1 2 2 0 0 3 0 0]
6
0
[[6 5 8 1 7 9 4 3 2 0]
 [9 0 4 7 2 8 5 1 6 3]
 [8 9 6 2 0 4 7 5 1 3]
 [8 0 3 9 2 4 6 1 7 5]
 [0 8 3 6 7 2 5 1 4 9]
 [5 2 9 6 3 0 7 4 8 1]]
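
The axis argument and the flat indices returned by argmin are the two parts that tend to confuse, so here is a small sketch on a stand-in 2x3 array (values arbitrary):

```python
import numpy as np

g = np.array([[9, 3, 8], [1, 7, 4]])

# axis=0 collapses the rows (one result per column);
# axis=1 collapses the columns (one result per row).
print(g.sum(axis=0))  # [10 10 12]
print(g.sum(axis=1))  # [20 12]

# argmin() without an axis indexes into the flattened array.
flat_idx = g.argmin()
print(flat_idx)                             # 3
row, col = np.unravel_index(flat_idx, g.shape)
print(int(row), int(col))                   # 1 0

# argsort() works along the last axis; take_along_axis applies those indices.
order = g.argsort()
print(np.take_along_axis(g, order, axis=1))
# [[3 8 9]
#  [1 4 7]]
```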


### Column Operations
Apply functions only to specific columns by slicing, or create a new array from the columns you want, then work on them.
But beware: a slice is a view onto the same data, not a copy, so modifying the slice in place also modifies the original array.

In [113]:
print(g[:, 2:3])
print(g[:, 2:3].max())

col3 = g[:, 3:4] # not a copy, just a view onto a slice of g
print(round(col3.std(), 4))

col3 *= 100 # Beware: this is applied to g data
print(g)

[[8]
 [4]
 [3]
 [4]
 [5]
 [1]]
8
2.9861
[[ 9 3 8 700 6 1 0 4 2 5]
 [ 1 7 4 900 2 6 8 3 5 0]
 [ 4 8 3 900 5 7 2 6 0 1]
 [ 1 7 4 200 5 9 6 8 0 3]
 [ 0 7 5 200 8 6 3 4 1 9]
 [ 5 9 1 400 7 0 3 6 8 2]]
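
If you want to work on a column without touching the original, take an explicit copy. A minimal sketch on a small stand-in array; the names view and safe are made up for the example:

```python
import numpy as np

g = np.array([[9, 3, 8], [1, 7, 4]])

view = g[:, 1:2]         # basic slicing returns a view
safe = g[:, 1:2].copy()  # an explicit copy owns its own data

print(np.shares_memory(g, view))  # True
print(np.shares_memory(g, safe))  # False

safe *= 100              # g is untouched
print(g[:, 1].tolist())  # [3, 7]

view *= 100              # g changes too
print(g[:, 1].tolist())  # [300, 700]
```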


### Numpy Random Functions

In [None]:
np.set_printoptions(precision=5, suppress=True) # show 5 decimal places, suppress scientific notation
h = np.random.random(6)
print(h)

h = np.random.randint(10, 99, 8) # (low, high exclusive, size)
print(h)

np.random.shuffle(h) # in-place shuffle
print(h)

print(np.random.choice(h))

h.sort() # in-place sort
print(h)
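
The calls above use Numpy's global random functions, so the output differs on every run. Numpy 1.17 and later also provide a Generator object that can be seeded for reproducible results. The following is a minimal sketch of that alternative, not part of the original notebook; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # the seed value is arbitrary

# integers() excludes the upper bound unless endpoint=True is passed.
h = rng.integers(10, 99, size=8)
print(h.min() >= 10 and h.max() < 99)   # True

dice = rng.integers(1, 6, size=1000, endpoint=True)
print(dice.min() >= 1 and dice.max() <= 6)  # True

# Reusing the same seed reproduces the same sequence.
again = np.random.default_rng(42).integers(10, 99, size=8)
print(np.array_equal(h, again))  # True

# shuffle() rearranges in place; the sorted contents are unchanged.
before = np.sort(h)
rng.shuffle(h)
print(np.array_equal(np.sort(h), before))  # True
```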