Using Pandas

Pandas represents tabular data as DataFrames and provides many helper functions for reading data, such as read_csv.

import pandas as pd
import numpy as np
df = pd.read_csv('grades.csv')
df

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed
0    John    86    57    45    32    true
1    Mary    13    36    24    53   false
2   Alice    90    67    87    31    true
3     Bob    78    76    68    89    true
4  Claire    54    32    21    11   false
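Since grades.csv itself is not shown, here is a minimal sketch that rebuilds the same table from an in-memory string. The CSV text is a hypothetical stand-in for the file. It assumes the passed column is stored as text, so dtype is set explicitly; read_csv would otherwise infer a boolean column from plain true/false values.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of grades.csv (the real file is not shown).
csv_text = """name,Ex 1,Ex 2,Ex 3,Ex 4,passed
John,86,57,45,32,true
Mary,13,36,24,53,false
Alice,90,67,87,31,true
Bob,78,76,68,89,true
Claire,54,32,21,11,false
"""

# dtype keeps 'passed' as text; without it read_csv would infer booleans.
df = pd.read_csv(io.StringIO(csv_text), dtype={"passed": str})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column
```

read_csv accepts a file path or any file-like object, which is why io.StringIO works here.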

Common operations on a DataFrame:

Drop a row (returns a new DataFrame):

df.drop(2)

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed
0    John    86    57    45    32    true
1    Mary    13    36    24    53   false
3     Bob    78    76    68    89    true
4  Claire    54    32    21    11   false

Drop several rows:

df.drop([2,3,4])

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed
0    John    86    57    45    32    true
1    Mary    13    36    24    53   false
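Because drop returns a new DataFrame, the original keeps all its rows unless the result is assigned back. A small sketch of that behaviour, using a shortened hypothetical table and an optional reset_index call to renumber the remaining rows:

```python
import pandas as pd

# Shortened, hypothetical version of the grades table.
df = pd.DataFrame({"name": ["John", "Mary", "Alice"], "Ex 1": [86, 13, 90]})

dropped = df.drop([1, 2])        # new DataFrame without the rows labelled 1 and 2
print(len(df), len(dropped))     # the original still has every row

df = df.drop(1)                  # assign back to keep the change
df = df.reset_index(drop=True)   # optional: renumber the remaining rows from 0
print(df)
```

Note that drop works on index labels, not positions, so after dropping rows the labels may have gaps until the index is reset.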

Drop a column:

df.drop('passed', axis=1)

     name  Ex 1  Ex 2  Ex 3  Ex 4
0    John    86    57    45    32
1    Mary    13    36    24    53
2   Alice    90    67    87    31
3     Bob    78    76    68    89
4  Claire    54    32    21    11

Drop several columns:

df.drop(['Ex 2', 'Ex 3'], axis=1)

     name  Ex 1  Ex 4  passed
0    John    86    32    true
1    Mary    13    53   false
2   Alice    90    31    true
3     Bob    78    89    true
4  Claire    54    11   false
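As an alternative to axis=1, drop also takes a columns keyword, which some readers find easier to read. A short sketch on a reduced, hypothetical table showing that both spellings give the same result:

```python
import pandas as pd

# Reduced, hypothetical version of the grades table.
df = pd.DataFrame({
    "name": ["John", "Mary"],
    "Ex 1": [86, 13],
    "Ex 2": [57, 36],
    "passed": ["true", "false"],
})

by_axis = df.drop(["Ex 2", "passed"], axis=1)       # axis=1 means "columns"
by_keyword = df.drop(columns=["Ex 2", "passed"])    # same operation, named explicitly

print(by_axis.equals(by_keyword))
```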

Transform columns:

df['passed'].apply(lambda x: x=='true')

0     True
1    False
2     True
3     True
4    False
Name: passed, dtype: bool

apply returns a new Series; the DataFrame itself is not modified:

df

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed
0    John    86    57    45    32    true
1    Mary    13    36    24    53   false
2   Alice    90    67    87    31    true
3     Bob    78    76    68    89    true
4  Claire    54    32    21    11   false

To modify it, assign the transformed column back to itself:

df['passed'] = df['passed'].apply(lambda x: x=='true')
df

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed
0    John    86    57    45    32    True
1    Mary    13    36    24    53   False
2   Alice    90    67    87    31    True
3     Bob    78    76    68    89    True
4  Claire    54    32    21    11   False
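For a simple comparison like this one, the lambda is not strictly needed. The sketch below, on a small hypothetical Series, shows two alternatives that should give the same values: comparing the whole column at once, and mapping each string through a dictionary.

```python
import pandas as pd

# Hypothetical text column, like 'passed' before conversion.
passed = pd.Series(["true", "false", "true"], name="passed")

with_apply = passed.apply(lambda x: x == "true")       # element by element, as above
vectorized = passed == "true"                          # whole-column comparison
mapped = passed.map({"true": True, "false": False})    # explicit lookup table

print(vectorized.tolist())
print(mapped.tolist())
```

The dictionary version has one difference worth knowing: a value missing from the dictionary becomes NaN instead of False, which can help spot unexpected entries.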

Create new columns:

df['average'] = (df['Ex 1'] + df['Ex 2'] + df['Ex 3'] + df['Ex 4']) / 4
df

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed  average
0    John    86    57    45    32    True    55.00
1    Mary    13    36    24    53   False    31.50
2   Alice    90    67    87    31    True    68.75
3     Bob    78    76    68    89    True    77.75
4  Claire    54    32    21    11   False    29.50

df['mention'] = df['average'] > 70
df

     name  Ex 1  Ex 2  Ex 3  Ex 4  passed  average  mention
0    John    86    57    45    32    True    55.00    False
1    Mary    13    36    24    53   False    31.50    False
2   Alice    90    67    87    31    True    68.75    False
3     Bob    78    76    68    89    True    77.75     True
4  Claire    54    32    21    11   False    29.50    False
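The average can also be computed without spelling out the sum, by selecting the exercise columns and taking the mean across each row. A minimal sketch on the first two rows of the table; exercise_cols is a name introduced here for illustration:

```python
import pandas as pd

# First two rows of the grades table (John and Mary), exercise columns only.
df = pd.DataFrame({
    "Ex 1": [86, 13],
    "Ex 2": [57, 36],
    "Ex 3": [45, 24],
    "Ex 4": [32, 53],
})

exercise_cols = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]   # hypothetical helper list
df["average"] = df[exercise_cols].mean(axis=1)     # axis=1: mean across columns, per row
df["mention"] = df["average"] > 70                 # boolean column from a comparison

print(df)
```

This version keeps working unchanged if an exercise is added, as long as the list of columns is updated.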

DataFrames can be passed directly to scikit-learn (sklearn) estimators.

from sklearn.preprocessing import StandardScaler
sScaler = StandardScaler()
firstTerm = df[['Ex 1', 'Ex 2']]
sScaler.fit(firstTerm)

StandardScaler()

sScaler.transform(firstTerm)

array([[ 0.76533169,  0.19834601],
       [-1.79747626, -1.02673227],
       [ 0.90575952,  0.78171661],
       [ 0.48447602,  1.30675016],
       [-0.35809097, -1.26008051]])
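To see what the scaler computes, the sketch below repeats the same two columns and compares the result with a manual calculation: subtract each column's mean and divide by its population standard deviation (ddof=0). It uses fit_transform, which combines the fit and transform steps; first_term and manual are names chosen for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The 'Ex 1' and 'Ex 2' columns from the grades table.
first_term = pd.DataFrame({
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
})

# fit and transform in a single call; returns a NumPy array.
scaled = StandardScaler().fit_transform(first_term)

# Same computation by hand: centre on the mean, divide by the population std.
manual = (first_term - first_term.mean()) / first_term.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))
```

Note that the pandas default for std is ddof=1 (sample standard deviation), so the ddof=0 argument matters for the two results to agree.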
from sklearn.neighbors import KNeighborsClassifier
kn = KNeighborsClassifier(n_neighbors=2)
kn.fit(df[['Ex 1','Ex 2', 'Ex 3', 'Ex 4']],df['passed'])

KNeighborsClassifier(n_neighbors=2)

kn.predict([
    [10,10,10,10],
    [50,60,70,80]
])

array([False,  True])
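When a model is fitted on a DataFrame and then given a plain list, recent scikit-learn versions may warn that the input has no feature names. One way around this, sketched here, is to wrap the new rows in a DataFrame with the same columns. The names features and new_students are introduced for this example.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

features = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]

# The grades table, with 'passed' already converted to booleans.
df = pd.DataFrame({
    "Ex 1": [86, 13, 90, 78, 54],
    "Ex 2": [57, 36, 67, 76, 32],
    "Ex 3": [45, 24, 87, 68, 21],
    "Ex 4": [32, 53, 31, 89, 11],
    "passed": [True, False, True, True, False],
})

kn = KNeighborsClassifier(n_neighbors=2)
kn.fit(df[features], df["passed"])

# Hypothetical new students, given the same column names used for fitting.
new_students = pd.DataFrame([[10, 10, 10, 10], [50, 60, 70, 80]], columns=features)
predictions = kn.predict(new_students)
print(predictions)
```

The predictions match the list-based call above; only the way the input is labelled changes.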