This blog post is written as a Jupyter notebook, so you can experiment and learn by editing the notebook.
Click here for notebook.
Just change the input and check the output.
Learning by experiment and hands-on exercises is always better.
The purpose of this notebook is just to revise Python basics, specifically NumPy and pandas.
Let's get started.
1. NUMPY BASICS
NumPy is a numerical computing library for Python built around multidimensional arrays and linear algebra routines.
NumPy brings the best of two worlds:
- the computational efficiency of C/Fortran,
- the easy syntax of the Python language.
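To make the "two worlds" point concrete, here is a minimal sketch that computes the same sum of squares twice: once with an explicit Python loop and once with a single vectorized NumPy expression. The variable names are made up for this illustration, and the array is kept small on purpose.

```python
import numpy as np

values = np.arange(1000, dtype=np.float64)

# Plain Python: visit every element in an explicit loop
loop_total = 0.0
for v in values:
    loop_total += v * v

# NumPy: the same computation as one vectorized expression;
# the loop over elements runs inside NumPy's compiled code
vectorized_total = np.sum(values ** 2)

print(loop_total, vectorized_total)
```

Both approaches give the same number. On large arrays the vectorized form is usually much faster, which you can check yourself with `%timeit` in the notebook.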
import numpy as np
# Let's define a one-dimensional array
my_list = [10, 20, 30, 40, 50, 60, 70, 80]
my_list
[10, 20, 30, 40, 50, 60, 70, 80]
Let's create a numpy array from the list "my_list"
x = np.array(my_list)
x
array([10, 20, 30, 40, 50, 60, 70, 80])
Get the shape of the array
x.shape
(8,)
Let's create a multi-dimensional NumPy array from a nested list (a list of lists)
matrix = np.array([[5, 8], [9, 13]])
matrix
array([[ 5,  8],
       [ 9, 13]])
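The same attributes used on the 1D array also work on a matrix. A small sketch, rebuilding the 2x2 matrix so the cell runs on its own:

```python
import numpy as np

matrix = np.array([[5, 8], [9, 13]])

n_dims = matrix.ndim      # number of axes
shape = matrix.shape      # (rows, columns)
n_elements = matrix.size  # total number of elements

print(n_dims, shape, n_elements)
```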
# "rand()" draws samples from a uniform distribution over [0, 1)
xy = np.random.rand(7)
xy
array([0.40408966, 0.12527144, 0.04465052, 0.39450693, 0.93339664,
       0.14009694, 0.94461679])
You can also create a matrix of random numbers by passing the dimensions to rand()
xy = np.random.rand(2, 2)
xy
array([[0.86152202, 0.22526627],
       [0.41562272, 0.33467273]])
# "randn()" draws samples from the standard normal distribution (mean 0, standard deviation 1)
xy = np.random.randn(7)
xy
array([-1.27678101,  1.20667812,  0.7945132 ,  0.62421099, -0.44447512,
       -0.57038096,  2.19949273])
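Note that randn() values are not limited to the range 0 to 1: the 0 and 1 refer to the mean and standard deviation. A quick sketch to check this on a large sample (the seed value and sample size are arbitrary choices for this illustration):

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable
samples = np.random.randn(100_000)

sample_mean = samples.mean()
sample_std = samples.std()
print(sample_mean, sample_std)

# Shift and scale to get a normal distribution with mean 5 and standard deviation 2
shifted = 5 + 2 * samples
print(shifted.mean(), shifted.std())
```

The sample mean should land close to 0 and the sample standard deviation close to 1, while individual values can be negative or larger than 1.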
"randint" is used to generate random integers between upper and lower bounds
xy = np.random.randint(1, 10)
xy
9
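With randint, the lower bound is included and the upper bound is excluded, so randint(1, 10) returns values from 1 to 9. You can also ask for many integers at once with the size argument. A short sketch (the seed and the number of draws are arbitrary):

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable

# 1000 random integers, each >= 1 and < 10
draws = np.random.randint(1, 10, size=1000)

lowest = draws.min()
highest = draws.max()
print(lowest, highest)
```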
Create evenly spaced values from 1 up to (but not including) 50 with a step of 7
xy = np.arange(1, 50, 7)
xy
array([ 1, 8, 15, 22, 29, 36, 43])
# Array of ones
xy = np.ones(7)
xy
array([1., 1., 1., 1., 1., 1., 1.])
# Matrix of ones
xy = np.ones((2, 2))
xy
array([[1., 1.],
       [1., 1.]])
# Array of zeros
xy = np.zeros(5)
xy
array([0., 0., 0., 0., 0.])
Reshape the 1D array into a 2x4 matrix (the total number of elements must stay the same)
z = x.reshape(2,4)
print(x)
print(z)
[10 20 30 40 50 60 70 80]
[[10 20 30 40]
 [50 60 70 80]]
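Two details about reshape are worth trying out. First, you can pass -1 for one dimension and NumPy infers it. Second, for a simple array like this one, the reshaped result is a view on the same data, so editing it also edits the original. A minimal sketch, rebuilding x so the cell runs on its own:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# -1 tells NumPy to work out this dimension itself (8 elements / 2 rows = 4 columns)
z = x.reshape(2, -1)

# Check whether x and z share the same underlying memory
shares = np.shares_memory(x, z)

# Writing through the reshaped array also changes x
z[0, 0] = 99
print(x)
print(z)
```

If you need an independent matrix, call `.copy()` on the result.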
Obtain the maximum element (value)
x.max()
80
Obtain the minimum element (value)
x.min()
10
Obtain the location of the max element
x.argmax()
7
# Obtain the location of the min element
x.argmin()
0
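On a matrix, argmax() returns the position in the flattened array, not a (row, column) pair. The sketch below uses np.unravel_index to convert it, and the axis argument to search column by column:

```python
import numpy as np

matrix = np.array([[5, 8], [9, 13]])

# Position of the maximum in the flattened array [5, 8, 9, 13]
flat_index = matrix.argmax()

# Convert the flat position into (row, column)
row, col = np.unravel_index(flat_index, matrix.shape)

# Row index of the maximum within each column
rows_of_column_max = matrix.argmax(axis=0)

print(flat_index, row, col, rows_of_column_max)
```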
# Access specific index from the numpy array
x[0]
10
# Slice from index 0 up to, but NOT including, index 3
x[0:3]
array([10, 20, 30])
# Broadcasting: assign one value to several elements of a NumPy array at once
x[0:2] = 10
x
array([10, 10, 30, 40, 50, 60, 70, 80])
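One thing to keep in mind when assigning through slices: a NumPy slice is a view on the original array, not a copy. A small sketch showing the difference (the names view and safe are made up for this illustration):

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# A slice is a view: writing to it changes x as well
view = x[0:3]
view[:] = 0

# .copy() gives an independent array: x is not affected
safe = x[3:6].copy()
safe[:] = -1

print(x)
print(safe)
```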
2. PANDAS BASICS
Pandas is a data manipulation and analysis library built on top of NumPy.
Pandas uses a data structure known as a DataFrame (think of it as Microsoft Excel in Python).
DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).
Series vs. DataFrame: a Series can be thought of as a single column of a DataFrame.
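You can see the Series/DataFrame relationship directly by selecting a column. A minimal sketch with a made-up two-column DataFrame:

```python
import pandas as pd

# Hypothetical example data, used only for this illustration
df = pd.DataFrame({'price': [100, 200, 300], 'volume': [10, 20, 30]})

# Single brackets with one column name return a Series
one_column = df['price']

# Double brackets (a list of column names) return a DataFrame
one_column_frame = df[['price']]

print(type(one_column))
print(type(one_column_frame))
```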
import pandas as pd
# Let's define two lists as shown below:
stock_list = ['Reliance', 'AMZN', 'facebook']
stock_list
['Reliance', 'AMZN', 'facebook']
label = ['stock#1', 'stock#2', 'stock#3']
label
['stock#1', 'stock#2', 'stock#3']
Let's create a one-dimensional pandas "Series"
Note that a Series is made up of data and associated labels (the index)
x_series = pd.Series(data = stock_list, index = label)
# Let's view the series
x_series
stock#1 Reliance
stock#2 AMZN
stock#3 facebook
dtype: object
Let's check the type of the object
type(x_series)
pandas.core.series.Series
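Because a Series carries labels, you can look up values either by label or by position. A short sketch, rebuilding the Series so the cell runs on its own:

```python
import pandas as pd

x_series = pd.Series(data=['Reliance', 'AMZN', 'facebook'],
                     index=['stock#1', 'stock#2', 'stock#3'])

# Look up by label with .loc
by_label = x_series.loc['stock#2']

# Look up by integer position with .iloc (positions start at 0)
by_position = x_series.iloc[1]

print(by_label, by_position)
```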
Let's define a two-dimensional Pandas DataFrame
Note that you can create a pandas DataFrame from a Python dictionary: each key becomes a column name and each list becomes that column's values
bank_client_df = pd.DataFrame({'Bank client ID':[1111, 2222, 3333, 4444],
'Bank Client Name':['Kiran', 'Chaitanya', 'dheeraj', 'shreyas'],
'Net worth [$]':[3500, 29000, 10000, 2000],
'Years with bank':[3, 4, 9, 5]})
bank_client_df
| | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
---|---|---|---|---|
0 | 1111 | Kiran | 3500 | 3 |
1 | 2222 | Chaitanya | 29000 | 4 |
2 | 3333 | dheeraj | 10000 | 9 |
3 | 4444 | shreyas | 2000 | 5 |
Let's check the type of the object
type(bank_client_df)
pandas.core.frame.DataFrame
You can view only the first few rows using .head() (here, the first 2)
bank_client_df.head(2)
| | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
---|---|---|---|---|
0 | 1111 | Kiran | 3500 | 3 |
1 | 2222 | Chaitanya | 29000 | 4 |
You can view only the last few rows using .tail() (here, the last 1)
bank_client_df.tail(1)
| | Bank client ID | Bank Client Name | Net worth [$] | Years with bank |
---|---|---|---|---|
3 | 4444 | shreyas | 2000 | 5 |
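If you call .head() or .tail() without an argument, pandas returns 5 rows. A quick sketch with a made-up 10-row DataFrame:

```python
import pandas as pd

# Hypothetical example data: one column holding 0..9
df = pd.DataFrame({'n': range(10)})

first_default = df.head()   # no argument: first 5 rows
last_three = df.tail(3)     # last 3 rows

print(first_default)
print(last_three)
```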
Pandas can read a CSV file and store its data in a DataFrame
bank_df = pd.read_csv('sample.csv')
Write the DataFrame to a CSV file without the index column
bank_df.to_csv('sample_output.csv', index = False)
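To see what index = False changes, here is a sketch that writes a small DataFrame both ways and reads it back. Since sample.csv is not included here, the data below is made up, and the file is written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data in sample.csv
bank_df = pd.DataFrame({'Bank client ID': [1111, 2222],
                        'Net worth [$]': [3500, 29000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_output.csv')

    # Without the index: the file holds only the two data columns
    bank_df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    # With the index (the default): an extra unnamed column is written
    bank_df.to_csv(path)
    with_index = pd.read_csv(path)

print(list(without_index.columns))
print(list(with_index.columns))
```

When the index is written, reading the file back gives an extra column named 'Unnamed: 0', which is usually not what you want.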
CONCATENATING AND MERGING WITH PANDAS
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df1
| | A | B | C | D |
---|---|---|---|---|
0 | A0 | B0 | C0 | D0 |
1 | A1 | B1 | C1 | D1 |
2 | A2 | B2 | C2 | D2 |
3 | A3 | B3 | C3 | D3 |
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']},
index=[4, 5, 6, 7])
df2
| | A | B | C | D |
---|---|---|---|---|
4 | A4 | B4 | C4 | D4 |
5 | A5 | B5 | C5 | D5 |
6 | A6 | B6 | C6 | D6 |
7 | A7 | B7 | C7 | D7 |
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
'B': ['B8', 'B9', 'B10', 'B11'],
'C': ['C8', 'C9', 'C10', 'C11'],
'D': ['D8', 'D9', 'D10', 'D11']},
index=[8, 9, 10, 11])
df3
| | A | B | C | D |
---|---|---|---|---|
8 | A8 | B8 | C8 | D8 |
9 | A9 | B9 | C9 | D9 |
10 | A10 | B10 | C10 | D10 |
11 | A11 | B11 | C11 | D11 |
pd.concat([df1, df2, df3])
| | A | B | C | D |
---|---|---|---|---|
0 | A0 | B0 | C0 | D0 |
1 | A1 | B1 | C1 | D1 |
2 | A2 | B2 | C2 | D2 |
3 | A3 | B3 | C3 | D3 |
4 | A4 | B4 | C4 | D4 |
5 | A5 | B5 | C5 | D5 |
6 | A6 | B6 | C6 | D6 |
7 | A7 | B7 | C7 | D7 |
8 | A8 | B8 | C8 | D8 |
9 | A9 | B9 | C9 | D9 |
10 | A10 | B10 | C10 | D10 |
11 | A11 | B11 | C11 | D11 |
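pd.concat stacks DataFrames on top of each other and keeps their original index labels. Merging is a different operation: it joins DataFrames side by side on a shared key column. Here is a minimal sketch of both, using small made-up DataFrames (the names top, bottom, left, right and the column 'key' are assumptions for this illustration):

```python
import pandas as pd

top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Both frames have index 0, 1; ignore_index=True renumbers the result 0..3
stacked = pd.concat([top, bottom], ignore_index=True)

left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})

# Join rows from left and right that have the same value in 'key'
merged = pd.merge(left, right, on='key')

print(stacked)
print(merged)
```

Use concat when the DataFrames have the same columns and you want more rows. Use merge when they describe the same items and you want more columns.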