Pandas represents tabular data as DataFrames and provides many helper functions for reading data from files, such as read_csv.
import pandas as pd
import numpy as np
df = pd.read_csv('grades.csv')
df
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed |
---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | true |
1 | Mary | 13 | 36 | 24 | 53 | false |
2 | Alice | 90 | 67 | 87 | 31 | true |
3 | Bob | 78 | 76 | 68 | 89 | true |
4 | Claire | 54 | 32 | 21 | 11 | false |
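Since grades.csv itself is not shown, here is a minimal sketch that rebuilds the same table from an inline string. The CSV text and the dtype setting are assumptions chosen to match the output above; without an explicit dtype, read_csv would normally parse a column of true/false values as booleans rather than text.

```python
import io
import pandas as pd

# Hypothetical contents of grades.csv, reconstructed from the table above.
csv_text = """name,Ex 1,Ex 2,Ex 3,Ex 4,passed
John,86,57,45,32,true
Mary,13,36,24,53,false
Alice,90,67,87,31,true
Bob,78,76,68,89,true
Claire,54,32,21,11,false
"""

# read_csv accepts a path or any file-like object.
# dtype keeps the 'passed' column as plain strings.
df = pd.read_csv(io.StringIO(csv_text), dtype={"passed": str})

print(df.shape)   # number of rows and columns
print(df.dtypes)  # type inferred for each column
```

Checking df.dtypes right after loading is a cheap way to catch columns that were parsed differently from what was expected.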
Common operations on a DataFrame:
Drop a row (this returns a new DataFrame and leaves df unchanged):
df.drop(2)
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed |
---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | true |
1 | Mary | 13 | 36 | 24 | 53 | false |
3 | Bob | 78 | 76 | 68 | 89 | true |
4 | Claire | 54 | 32 | 21 | 11 | false |
Drop several rows:
df.drop([2,3,4])
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed |
---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | true |
1 | Mary | 13 | 36 | 24 | 53 | false |
Drop a column:
df.drop('passed', axis=1)
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 |
---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 |
1 | Mary | 13 | 36 | 24 | 53 |
2 | Alice | 90 | 67 | 87 | 31 |
3 | Bob | 78 | 76 | 68 | 89 |
4 | Claire | 54 | 32 | 21 | 11 |
Drop several columns:
df.drop(['Ex 2', 'Ex 3'], axis=1)
| | name | Ex 1 | Ex 4 | passed |
---|---|---|---|---|
0 | John | 86 | 32 | true |
1 | Mary | 13 | 53 | false |
2 | Alice | 90 | 31 | true |
3 | Bob | 78 | 89 | true |
4 | Claire | 54 | 11 | false |
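The drop calls above pick rows or columns through the axis argument. As an alternative sketch, drop also accepts the index= and columns= keywords, which some readers find easier to read. The small DataFrame below is a shortened stand-in for the grades table, built inline so the example runs on its own.

```python
import pandas as pd

# Shortened stand-in for the grades table (assumption: only two exercises kept).
df = pd.DataFrame({
    "name": ["John", "Mary", "Alice", "Bob", "Claire"],
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
    "passed": ["true", "false", "true", "true", "false"],
})

# Same as df.drop([2, 3, 4]): remove rows by index label.
fewer_rows = df.drop(index=[2, 3, 4])

# Same as df.drop(['Ex 2', 'passed'], axis=1): remove columns by name.
fewer_cols = df.drop(columns=["Ex 2", "passed"])

# Rows and columns can be dropped in a single call.
rows_and_cols = df.drop(index=2, columns="passed")

# df itself is untouched; every call returned a new DataFrame.
print(df.shape, fewer_rows.shape, fewer_cols.shape, rows_and_cols.shape)
```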
Transform columns:
df['passed'].apply(lambda x: x=='true')
0     True
1    False
2     True
3     True
4    False
Name: passed, dtype: bool
The DataFrame is not modified, because apply returns a new Series:
df
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed |
---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | true |
1 | Mary | 13 | 36 | 24 | 53 | false |
2 | Alice | 90 | 67 | 87 | 31 | true |
3 | Bob | 78 | 76 | 68 | 89 | true |
4 | Claire | 54 | 32 | 21 | 11 | false |
To modify it, assign the transformed column back to itself:
df['passed'] = df['passed'].apply(lambda x: x=='true')
df
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed |
---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | True |
1 | Mary | 13 | 36 | 24 | 53 | False |
2 | Alice | 90 | 67 | 87 | 31 | True |
3 | Bob | 78 | 76 | 68 | 89 | True |
4 | Claire | 54 | 32 | 21 | 11 | False |
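For a simple comparison like this one, apply with a lambda is not strictly needed. The sketch below shows two alternatives that work on the whole column at once: a direct comparison and a dictionary lookup with map. The Series is built inline as a stand-in for df['passed'].

```python
import pandas as pd

# Stand-in for the original string-valued 'passed' column.
passed = pd.Series(["true", "false", "true", "true", "false"], name="passed")

# Element-wise comparison against a scalar returns a boolean Series.
as_bool = passed == "true"

# map looks each value up in the dictionary; values missing from the dict become NaN.
mapped = passed.map({"true": True, "false": False})

print(as_bool.tolist())
print(mapped.tolist())
```

As with apply, neither call changes passed; the result has to be assigned back to the column to take effect.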
Create new columns:
df['average'] = (df['Ex 1'] + df['Ex 2'] + df['Ex 3'] + df['Ex 4']) / 4
df
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed | average |
---|---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | True | 55.00 |
1 | Mary | 13 | 36 | 24 | 53 | False | 31.50 |
2 | Alice | 90 | 67 | 87 | 31 | True | 68.75 |
3 | Bob | 78 | 76 | 68 | 89 | True | 77.75 |
4 | Claire | 54 | 32 | 21 | 11 | False | 29.50 |
df['mention'] = df['average'] > 70
df
| | name | Ex 1 | Ex 2 | Ex 3 | Ex 4 | passed | average | mention |
---|---|---|---|---|---|---|---|---|
0 | John | 86 | 57 | 45 | 32 | True | 55.00 | False |
1 | Mary | 13 | 36 | 24 | 53 | False | 31.50 | False |
2 | Alice | 90 | 67 | 87 | 31 | True | 68.75 | False |
3 | Bob | 78 | 76 | 68 | 89 | True | 77.75 | True |
4 | Claire | 54 | 32 | 21 | 11 | False | 29.50 | False |
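Adding the four exercise columns by hand works, but it gets tedious as the number of columns grows. A possible alternative, sketched here on an inline copy of the grades, is to select the columns and take the row-wise mean with mean(axis=1). The name exercise_cols is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Mary", "Alice", "Bob", "Claire"],
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
    "Ex 3": [45, 24, 87, 68, 21],
    "Ex 4": [32, 53, 31, 89, 11],
})

# Hypothetical helper list naming the columns to average.
exercise_cols = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]

# axis=1 averages across the columns of each row.
df["average"] = df[exercise_cols].mean(axis=1)

# A comparison on a column yields a boolean column.
df["mention"] = df["average"] > 70

print(df[["name", "average", "mention"]])
```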
DataFrames can be passed directly as input to scikit-learn (sklearn) estimators.
from sklearn.preprocessing import StandardScaler
sScaler = StandardScaler()
firstTerm = df[['Ex 1', 'Ex 2']]
sScaler.fit(firstTerm)
StandardScaler()
sScaler.transform(firstTerm)
array([[ 0.76533169,  0.19834601],
       [-1.79747626, -1.02673227],
       [ 0.90575952,  0.78171661],
       [ 0.48447602,  1.30675016],
       [-0.35809097, -1.26008051]])
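To see what the scaler computed, the same numbers can be reproduced with plain pandas. With its default settings, StandardScaler subtracts each column's mean and divides by the population standard deviation (ddof=0). The sketch below applies that formula to an inline copy of the two columns.

```python
import numpy as np
import pandas as pd

# Inline copy of the 'Ex 1' and 'Ex 2' columns.
firstTerm = pd.DataFrame({
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
})

# Standardize column by column: (x - mean) / population standard deviation.
# pandas defaults to ddof=1, so ddof=0 must be requested explicitly.
scaled = (firstTerm - firstTerm.mean()) / firstTerm.std(ddof=0)

print(scaled.round(8))
print(np.allclose(scaled.mean(), 0.0))  # each column is now centered on zero
```

Unlike transform, which returns a NumPy array, this version keeps the column names and the index.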
from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier(n_neighbors=2)
kn.fit(df[['Ex 1', 'Ex 2', 'Ex 3', 'Ex 4']], df['passed'])
KNeighborsClassifier(n_neighbors=2)
kn.predict([
[10,10,10,10],
[50,60,70,80]
])
array([False, True])
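Recent scikit-learn versions emit a warning when a model fitted on a DataFrame with column names is later given plain lists without names. One way to avoid this, sketched below on an inline copy of the data, is to wrap the new samples in a DataFrame with the same columns. The names features and new_students are introduced only for this illustration.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical helper list naming the feature columns.
features = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]

df = pd.DataFrame({
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
    "Ex 3": [45, 24, 87, 68, 21],
    "Ex 4": [32, 53, 31, 89, 11],
    "passed": [True, False, True, True, False],
})

kn = KNeighborsClassifier(n_neighbors=2)
kn.fit(df[features], df["passed"])

# New samples carry the same column names as the training data.
new_students = pd.DataFrame(
    [[10, 10, 10, 10], [50, 60, 70, 80]],
    columns=features,
)
predictions = kn.predict(new_students)
print(predictions)
```

With only five training rows the predictions are purely illustrative; each one depends on just the two nearest students.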