Commit 62b77bc2 authored by Simon Clarke

Final version of Imputation notebook.

parent 1adb9ff8
......@@ -10,10 +10,26 @@
Previously, we have dealt with missing data by deleting that entry. However, that means losing valuable data which contributes to the training of your model. A better approach is to impute the data, i.e., infer the missing data from the existing observations.
We will concentrate here on Scikit-Learn's imputation routines, although some of the techniques, such as replacement of values with the mean or mode, can be easily implemented in Pandas.
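The mean and mode replacement mentioned above can be sketched directly in Pandas. This is a minimal illustration on a made-up frame; the `toy` data and its column names are assumptions for the example, not part of the notebook's dataset.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only
toy = pd.DataFrame({
    "height": [1.0, np.nan, 3.0, 4.0],
    "colour": ["red", "blue", None, "red"],
})

filled = toy.copy()
# Numeric column: replace missing entries with the mean of the observed values
filled["height"] = filled["height"].fillna(filled["height"].mean())
# Categorical column: replace missing entries with the most frequent value
filled["colour"] = filled["colour"].fillna(filled["colour"].mode()[0])
print(filled)
```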
%% Cell type:markdown id: tags:
## Contents
%% Cell type:markdown id: tags:
* Introduction
* Cross-validation analysis
* Exercises
%% Cell type:markdown id: tags:
## Introduction
%% Cell type:markdown id: tags:
We first import the standard libraries and the csv file.
%% Cell type:code id: tags:
``` python
......@@ -38,56 +54,56 @@
```
%%%% Output: execute_result
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
480 3 158 70 30 328 35.5
652 5 123 74 40 77 34.1
219 5 112 66 0 0 37.8
205 5 111 72 28 0 23.9
359 1 196 76 36 249 36.5
672 10 68 106 23 49 35.5
402 5 136 84 41 88 35.0
626 0 125 68 0 0 24.7
564 0 91 80 0 0 32.4
34 10 122 78 31 0 27.6
68 1 95 66 13 38 19.6
639 1 100 74 12 46 19.5
400 4 95 64 0 0 32.0
314 7 109 80 31 0 35.9
508 2 84 50 23 76 30.4
105 1 126 56 29 152 28.7
35 4 103 60 33 192 24.0
164 0 131 88 0 0 31.6
244 2 146 76 35 194 38.2
220 0 177 60 29 478 34.6
11 10 168 74 0 0 38.0
111 8 155 62 26 495 34.0
186 8 181 68 36 495 30.1
396 3 96 56 34 115 24.7
258 1 193 50 16 375 25.9
26 7 147 76 0 0 39.4
679 2 101 58 17 265 24.2
139 5 105 72 29 325 36.9
269 2 146 0 0 0 27.5
589 0 73 0 0 0 21.1
547 4 131 68 21 166 33.1
401 6 137 61 0 0 24.2
168 4 110 66 0 0 31.9
12 10 139 80 0 0 27.1
412 1 143 84 23 310 42.4
6 3 78 50 32 88 31.0
153 1 153 82 42 485 40.6
71 5 139 64 35 140 28.6
632 2 111 60 0 0 26.2
DiabetesPedigreeFunction Age Outcome
480 0.344 35 1
652 0.269 28 0
219 0.261 41 1
205 0.407 27 0
359 0.875 29 1
672 0.285 47 0
402 0.286 35 1
626 0.206 21 0
564 0.601 27 0
34 0.512 45 0
68 0.334 25 0
639 0.149 28 0
400 0.161 31 1
314 1.127 43 1
508 0.968 21 0
105 0.801 21 0
35 0.966 33 0
164 0.743 32 1
244 0.329 29 0
220 1.072 21 1
11 0.537 34 1
111 0.543 46 1
186 0.615 60 1
396 0.944 39 0
258 0.655 24 0
26 0.257 43 1
679 0.614 23 0
139 0.159 28 0
269 0.240 28 1
589 0.342 25 0
547 0.160 28 0
401 0.151 55 0
168 0.471 29 0
12 1.441 57 0
412 1.076 22 0
6 0.248 26 1
153 0.687 23 0
71 0.411 26 0
632 0.343 23 0
%% Cell type:markdown id: tags:
-This can be investigated further by displaying the descriptive statistics, from which it is apparent that `Glucose` and `BMI` also have unrealistic values of 0. A value of `Pregnancies` of 0, is a physically realistic value.
+This can be investigated further by displaying the descriptive statistics, from which it is apparent that `Glucose` and `BMI` also have unrealistic values of 0. A value of 0 for `Pregnancies` is a physically realistic value.
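Before any imputer can act on these entries, the unrealistic zeros have to be marked as missing. The notebook's own cell for this step is not shown in this diff, so the following is only a sketch on a hypothetical miniature frame (`pima_toy`); the choice of columns in `zero_is_missing` is an assumption based on the text above.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Pima data; the values are illustrative only
pima_toy = pd.DataFrame({
    "Pregnancies": [0, 3, 5],
    "Glucose": [0, 158, 123],
    "BMI": [35.5, 0.0, 34.1],
    "Outcome": [1, 0, 1],
})

# Columns in which 0 cannot be a real measurement (assumed for this sketch)
zero_is_missing = ["Glucose", "BMI"]

pima_nan = pima_toy.copy()
# Replace 0 by NaN only in the selected columns; Pregnancies is left untouched
pima_nan[zero_is_missing] = pima_nan[zero_is_missing].replace(0, np.nan)
print(pima_nan.isna().sum())
```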
%% Cell type:code id: tags:
``` python
pima.describe()
......@@ -150,13 +166,13 @@
def rf_model(pimadf):
Xf = pimadf.drop(columns=['Outcome'])
Yf = np.ravel(pimadf[['Outcome']])
-X_train, X_test, Y_train, Y_test = train_test_split(Xf,Yf,test_size=0.8,random_state=0)
+X_train, X_test, Y_train, Y_test = train_test_split(Xf, Yf, test_size=0.2, random_state=0)
rfc = RandomForestClassifier()
-rfc.fit(X_train,Y_train) # fit the data to the model
+rfc.fit(X_train, Y_train) # fit the data to the model
Y_pred = rfc.predict(X_test)
acc = accuracy_score(Y_test,Y_pred)
print("Testing score is %5.3f" % acc)
feature_importances = pd.DataFrame(rfc.feature_importances_,
......@@ -280,11 +296,11 @@
The second method we consider is the sklearn `IterativeImputer`. This is an experimental addition to sklearn, so needs to be enabled as well as imported. As it is experimental, it may change in future versions.
`IterativeImputer` works by marking the missing values, and then repeating the imputation process N times or until the data converges. Initially the missing values are set using a simple scheme, such as being replaced by the mean or median. Then on each iteration a machine learning algorithm is used as a regressor to update each column which is marked as having missing values. The non-missing values are used to train the model, and then the model is used to predict the missing values. Any regression technique could be used to predict the missing values. Common ones that are used are BayesianRidge, k-Nearest Neighbours and Random Forest Regression. Using this algorithm with Random Forest Regression is equivalent to the R routine `missForest`. The routine `KNNImputer` can be seen as `IterativeImputer` with one iteration.
-In this example, we use the default algorithm, BayesianRidge. This changes the testing score slightly; however, the feature importance is consistent with the original dataset and the results of ``KNNImputer`.
+In this example, we use the default algorithm, BayesianRidge. This changes the testing score slightly; however, the feature importance is consistent with the original dataset and the results of `KNNImputer`.
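The iterative scheme described above can be illustrated on a small synthetic array before it is applied to the real data. This is a sketch under stated assumptions: the arrays `X_toy` and `X_holes` are made up for the example, and the imputer settings (`initial_strategy="mean"`, `max_iter=10`) are chosen only to mirror the description.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the import below)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
# Synthetic data: the second column is roughly twice the first
x = rng.normal(size=50)
X_toy = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=50)])

X_holes = X_toy.copy()
X_holes[::5, 1] = np.nan  # mark every fifth value of column 1 as missing

imputer = IterativeImputer(
    estimator=BayesianRidge(),  # regressor used to predict the missing values
    initial_strategy="mean",    # simple first guess before iterating
    max_iter=10,
    random_state=0,
)
X_filled = imputer.fit_transform(X_holes)

# Compare the imputed entries with the values that were removed
missing = np.isnan(X_holes)
print(np.abs(X_filled[missing] - X_toy[missing]).mean())
```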
%% Cell type:code id: tags:
``` python
from sklearn.experimental import enable_iterative_imputer
......@@ -341,10 +357,14 @@
![]()
%% Cell type:markdown id: tags:
## Cross-validation analysis
%% Cell type:markdown id: tags:
For all the examples so far we have only considered one realisation of the Random Forest classifier. To understand the effectiveness of the various imputation algorithms we need to combine this with cross-validation. The following code considers the variation of the F1 score using Logistic Regression for the imputation strategies:
* Drop rows with missing values.
* Simple imputation using the mean.
* Simple imputation using the median.
* k-Nearest Neighbours imputation.
......@@ -359,25 +379,25 @@
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold
-N_SPLITS = 10
+N_SPLITS = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
+classifier = LogisticRegression(solver='newton-cg', C=1.e3)
+score = 'f1'
X_full = pima_drop.drop(columns=['Outcome'])
Y_full = np.ravel(pima_drop[['Outcome']])
-lr_estimator = LogisticRegression(solver='newton-cg', C=1.e3)
score_drop = pd.DataFrame(
cross_val_score(
-lr_estimator, X_full, Y_full, scoring='f1', cv=N_SPLITS
+classifier, X_full, Y_full, scoring=score, cv=N_SPLITS
),
columns=['Drop Data']
)
```
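%% Cell type:markdown id: tags:
When `cv` is a `RepeatedKFold(n_splits=5, n_repeats=3)` object, `cross_val_score` returns one score per fold per repeat, i.e. 15 values, which is why the `count` row of the summary table reads 15 rather than 10. The sketch below checks this on synthetic data; `X_demo` and `y_demo` are generated for the illustration and are not the Pima data.
%% Cell type:code id: tags:
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic binary classification problem (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# 5 folds repeated 3 times with different shuffles gives 15 train/test splits
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, scoring="f1", cv=cv
)
print(len(scores), scores.mean(), scores.std())
```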
%% Cell type:code id: tags:
``` python
......@@ -390,15 +410,14 @@
score_simple_imputer = pd.DataFrame()
for strategy in ('mean', 'median'):
estimator = make_pipeline(
SimpleImputer(missing_values=np.nan, strategy=strategy),
-lr_estimator
+classifier
)
score_simple_imputer[strategy] = cross_val_score(
-estimator, X_missing, Y_missing, scoring='f1',
-cv=N_SPLITS
+estimator, X_missing, Y_missing, scoring=score, cv=N_SPLITS
)
```
%% Cell type:code id: tags:
......@@ -407,15 +426,14 @@
from sklearn.impute import KNNImputer
score_knn_imputer = pd.DataFrame()
estimator = make_pipeline(
KNNImputer(n_neighbors=15),
-rf_estimator
+classifier
)
score_knn_imputer['KNeighborsRegressor'] = cross_val_score(
-estimator, X_missing, Y_missing, scoring='f1',
-cv=N_SPLITS
+estimator, X_missing, Y_missing, scoring=score, cv=N_SPLITS
)
```
%% Cell type:code id: tags:
......@@ -434,15 +452,15 @@
]
score_iterative_imputer = pd.DataFrame()
for impute_estimator in estimators:
estimator = make_pipeline(
IterativeImputer(random_state=0, estimator=impute_estimator, max_iter=10),
-lr_estimator
+classifier
)
score_iterative_imputer[impute_estimator.__class__.__name__] = \
cross_val_score(
-estimator, X_missing, Y_missing, scoring='f1', cv=N_SPLITS
+estimator, X_missing, Y_missing, scoring=score, cv=N_SPLITS
)
```
%% Cell type:markdown id: tags:
......@@ -472,40 +490,40 @@
%%%% Output: execute_result
Original SimpleImputer KNN \
Drop Data mean median KNeighborsRegressor
-count 10.000000 10.000000 10.000000 10.000000
-mean 0.621762 0.626776 0.632280 0.626371
-std 0.132392 0.062190 0.055559 0.059190
-min 0.352941 0.520000 0.541667 0.520000
-25% 0.543478 0.605662 0.605662 0.605662
-50% 0.641026 0.632874 0.640256 0.626225
-75% 0.718531 0.663265 0.666667 0.663265
-max 0.782609 0.727273 0.727273 0.711111
+count 15.000000 15.000000 15.000000 15.000000
+mean 0.625292 0.632006 0.630530 0.634507
+std 0.093487 0.028496 0.028269 0.029221
+min 0.387097 0.589474 0.589474 0.589474
+25% 0.588040 0.606338 0.604167 0.605114
+50% 0.641509 0.639175 0.633663 0.640777
+75% 0.673333 0.653870 0.648134 0.659567
+max 0.740741 0.675000 0.675000 0.674157
IterativeImputer \
BayesianRidge DecisionTreeRegressor RandomForestRegressor
-count 10.000000 10.000000 10.000000
-mean 0.635335 0.629131 0.633366
-std 0.056181 0.059494 0.049522
-min 0.541667 0.541667 0.541667
-25% 0.613384 0.588889 0.612077
-50% 0.637331 0.625000 0.645680
-75% 0.663265 0.663462 0.663462
-max 0.727273 0.739130 0.711111
+count 15.000000 15.000000 15.000000
+mean 0.637322 0.626273 0.633008
+std 0.032623 0.033283 0.028420
+min 0.589474 0.571429 0.589474
+25% 0.608206 0.600885 0.610594
+50% 0.633663 0.631579 0.628571
+75% 0.663522 0.654536 0.659471
+max 0.688889 0.673913 0.674157
KNeighborsRegressor
-count 10.000000
-mean 0.627487
-std 0.057144
-min 0.541667
-25% 0.584496
-50% 0.626225
-75% 0.666667
-max 0.711111
+count 15.000000
+mean 0.627713
+std 0.024185
+min 0.589474
+25% 0.605454
+50% 0.629213
+75% 0.649111
+max 0.659091
%% Cell type:code id: tags:
``` python
# plot results
......@@ -516,16 +534,111 @@
plt.show()
```
%%%% Output: display_data
![]()
![]()
%% Cell type:markdown id: tags:
## Exercises
%% Cell type:markdown id: tags:
For the exercises we will use the [Abalone Dataset](https://archive.ics.uci.edu/ml/datasets/Abalone), which can be downloaded from [Monash Gitlab](https://gitlab.erc.monash.edu.au/bads/data-challenges-resources/-/tree/main/Machine-Learning/Imputation/abalone.csv). This consists of physical measurements of abalones from the Tasmanian coast in the 1990s, collected in an effort to determine their age. Previously the age would need to be determined in the laboratory by counting the number of rings in the shell; the age in years is then $Age = Rings + 1.5$. This is a complete dataset; however, we will randomly remove entries in two columns to perform imputation.
First we load the dataset.
%% Cell type:code id: tags:
``` python
abalone = pd.read_csv("abalone.csv")
abalone.head()
```
%% Cell type:markdown id: tags:
The `Sex` field has three categorical entries: Male (M), Female (F) and Infant (I). So we need to one-hot encode this field to create three binary columns.
%% Cell type:code id: tags:
``` python
dummy = pd.get_dummies(abalone['Sex'])
abalone = pd.concat([abalone, dummy], axis=1)
abalone.drop(columns=['Sex'], inplace=True)
abalone.head()
```
%%%% Output: execute_result
Length Diameter Height Whole weight Shucked weight Viscera weight \
0 0.455 0.365 0.095 0.5140 0.2245 0.1010
1 0.350 0.265 0.090 0.2255 0.0995 0.0485
2 0.530 0.420 0.135 0.6770 0.2565 0.1415
3 0.440 0.365 0.125 0.5160 0.2155 0.1140
4 0.330 0.255 0.080 0.2050 0.0895 0.0395
Shell weight Rings F I M
0 0.150 15 0 0 1
1 0.070 7 0 0 1
2 0.210 9 1 0 0
3 0.155 10 0 0 1
4 0.055 7 0 1 0
%% Cell type:markdown id: tags:
Finally, we create a features array (`Xf`) and a label array (`Yf`). Then we randomly remove 33% of the `Height` samples and 25% of the `Shell weight` samples from the features array.
%% Cell type:code id: tags:
``` python
Xf = abalone.drop(columns=['Rings'])
Yf = abalone[['Rings']]
X = Xf.copy()
X['Height'] = X['Height'].sample(frac=0.67)
X['Shell weight'] = X['Shell weight'].sample(frac=0.75)
X.describe()
```
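%% Cell type:markdown id: tags:
The removal step relies on index alignment: `sample(frac=...)` returns a subset of rows with their original index labels, and assigning that subset back to the column leaves `NaN` in every row that was not sampled. A minimal sketch on a made-up 100-row frame (`toy` and `holes` are hypothetical names, and `random_state=0` is added here only for reproducibility):
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Hypothetical 100-row column standing in for abalone['Height']
toy = pd.DataFrame({"Height": np.linspace(0.05, 0.20, 100)})

holes = toy.copy()
# sample() keeps 75 of the 100 rows together with their index labels;
# the assignment aligns on the index, so the 25 unsampled rows become NaN
holes["Height"] = holes["Height"].sample(frac=0.75, random_state=0)

print(holes["Height"].isna().sum())
```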
%% Cell type:markdown id: tags:
### Exercise 1 (2 marks)
%% Cell type:markdown id: tags:
Create a Random Forest Regressor model for the full dataset and determine the accuracy of this model. Use an 80:20 split for training and testing.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
### Exercise 2 (3 marks)
%% Cell type:markdown id: tags:
Fill in the missing values of X using `IterativeImputer` with 10 iterations and using the `BayesianRidge` regressor. Calculate the accuracy of the Random Forest Regressor using this imputed dataset.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
### Exercise 3 (5 marks)
%% Cell type:markdown id: tags:
Fill in the missing values of X using `IterativeImputer` with 10 iterations and using `KNeighborsRegressor` for 5, 10, 15 and 20 neighbours. Calculate the accuracy of the Random Forest Regressor using each of these imputed datasets.
%% Cell type:code id: tags:
``` python
```
......
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20
F,0.545,0.425,0.125,0.768,0.294,0.1495,0.26,16
M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
F,0.525,0.38,0.14,0.6065,0.194,0.1475,0.21,14
M,0.43,0.35,0.11,0.406,0.1675,0.081,0.135,10
M,0.49,0.38,0.135,0.5415,0.2175,0.095,0.19,11
F,0.535,0.405,0.145,0.6845,0.2725,0.171,0.205,10
F,0.47,0.355,0.1,0.4755,0.1675,0.0805,0.185,10
M,0.5,0.4,0.13,0.6645,0.258,0.133,0.24,12
I,0.355,0.28,0.085,0.2905,0.095,0.0395,0.115,7
F,0.44,0.34,0.1,0.451,0.188,0.087,0.13,10
M,0.365,0.295,0.08,0.2555,0.097,0.043,0.1,7
M,0.45,0.32,0.1,0.381,0.1705,0.075,0.115,9
M,0.355,0.28,0.095,0.2455,0.0955,0.062,0.075,11
I,0.38,0.275,0.1,0.2255,0.08,0.049,0.085,10
F,0.565,0.44,0.155,0.9395,0.4275,0.214,0.27,12
F,0.55,0.415,0.135,0.7635,0.318,0.21,0.2,9
F,0.615,0.48,0.165,1.1615,0.513,0.301,0.305,10
F,0.56,0.44,0.14,0.9285,0.3825,0.188,0.3,11
F,0.58,0.45,0.185,0.9955,0.3945,0.272,0.285,11
M,0.59,0.445,0.14,0.931,0.356,0.234,0.28,12
M,0.605,0.475,0.18,0.9365,0.394,0.219,0.295,15
M,0.575,0.425,0.14,0.8635,0.393,0.227,0.2,11
M,0.58,0.47,0.165,0.9975,0.3935,0.242,0.33,10
F,0.68,0.56,0.165,1.639,0.6055,0.2805,0.46,15
M,0.665,0.525,0.165,1.338,0.5515,0.3575,0.35,18
F,0.68,0.55,0.175,1.798,0.815,0.3925,0.455,19
F,0.705,0.55,0.2,1.7095,0.633,0.4115,0.49,13
M,0.465,0.355,0.105,0.4795,0.227,0.124,0.125,8
F,0.54,0.475,0.155,1.217,0.5305,0.3075,0.34,16
F,0.45,0.355,0.105,0.5225,0.237,0.1165,0.145,8
F,0.575,0.445,0.135,0.883,0.381,0.2035,0.26,11
M,0.355,0.29,0.09,0.3275,0.134,0.086,0.09,9
F,0.45,0.335,0.105,0.425,0.1865,0.091,0.115,9
F,0.55,0.425,0.135,0.8515,0.362,0.196,0.27,14
I,0.24,0.175,0.045,0.07,0.0315,0.0235,0.02,5
I,0.205,0.15,0.055,0.042,0.0255,0.015,0.012,5
I,0.21,0.15,0.05,0.042,0.0175,0.0125,0.015,4
I,0.39,0.295,0.095,0.203,0.0875,0.045,0.075,7
M,0.47,0.37,0.12,0.5795,0.293,0.227,0.14,9
F,0.46,0.375,0.12,0.4605,0.1775,0.11,0.15,7
I,0.325,0.245,0.07,0.161,0.0755,0.0255,0.045,6
F,0.525,0.425,0.16,0.8355,0.3545,0.2135,0.245,9
I,0.52,0.41,0.12,0.595,0.2385,0.111,0.19,8
M,0.4,0.32,0.095,0.303,0.1335,0.06,0.1,7
M,0.485,0.36,0.13,0.5415,0.2595,0.096,0.16,10
F,0.47,0.36,0.12,0.4775,0.2105,0.1055,0.15,10
M,0.405,0.31,0.1,0.385,0.173,0.0915,0.11,7
F,0.5,0.4,0.14,0.6615,0.2565,0.1755,0.22,8
M,0.445,0.35,0.12,0.4425,0.192,0.0955,0.135,8
M,0.47,0.385,0.135,0.5895,0.2765,0.12,0.17,8
I,0.245,0.19,0.06,0.086,0.042,0.014,0.025,4
F,0.505,0.4,0.125,0.583,0.246,0.13,0.175,7
M,0.45,0.345,0.105,0.4115,0.18,0.1125,0.135,7
M,0.505,0.405,0.11,0.625,0.305,0.16,0.175,9
F,0.53,0.41,0.13,0.6965,0.302,0.1935,0.2,10
M,0.425,0.325,0.095,0.3785,0.1705,0.08,0.1,7
M,0.52,0.4,0.12,0.58,0.234,0.1315,0.185,8
M,0.475,0.355,0.12,0.48,0.234,0.1015,0.135,8
F,0.565,0.44,0.16,0.915,0.354,0.1935,0.32,12
F,0.595,0.495,0.185,1.285,0.416,0.224,0.485,13
F,0.475,0.39,0.12,0.5305,0.2135,0.1155,0.17,10
I,0.31,0.235,0.07,0.151,0.063,0.0405,0.045,6
M,0.555,0.425,0.13,0.7665,0.264,0.168,0.275,13
F,0.4,0.32,0.11,0.353,0.1405,0.0985,0.1,8
F,0.595,0.475,0.17,1.247,0.48,0.225,0.425,20
M,0.57,0.48,0.175,1.185,0.474,0.261,0.38,11
F,0.605,0.45,0.195,1.098,0.481,0.2895,0.315,13
F,0.6,0.475,0.15,1.0075,0.4425,0.221,0.28,15
M,0.595,0.475,0.14,0.944,0.3625,0.189,0.315,9
F,0.6,0.47,0.15,0.922,0.363,0.194,0.305,10
F,0.555,0.425,0.14,0.788,0.282,0.1595,0.285,11
F,0.615,0.475,0.17,1.1025,0.4695,0.2355,0.345,14
F,0.575,0.445,0.14,0.941,0.3845,0.252,0.285,9
M,0.62,0.51,0.175,1.615,0.5105,0.192,0.675,12
F,0.52,0.425,0.165,0.9885,0.396,0.225,0.32,16
M,0.595,0.475,0.16,1.3175,0.408,0.234,0.58,21
M,0.58,0.45,0.14,1.013,0.38,0.216,0.36,14
F,0.57,0.465,0.18,1.295,0.339,0.2225,0.44,12
M,0.625,0.465,0.14,1.195,0.4825,0.205,0.4,13
M,0.56,0.44,0.16,0.8645,0.3305,0.2075,0.26,10
F,0.46,0.355,0.13,0.517,0.2205,0.114,0.165,9
F,0.575,0.45,0.16,0.9775,0.3135,0.231,0.33,12
M,0.565,0.425,0.135,0.8115,0.341,0.1675,0.255,15
M,0.555,0.44,0.15,0.755,0.307,0.1525,0.26,12
M,0.595,0.465,0.175,1.115,0.4015,0.254,0.39,13
F,0.625,0.495,0.165,1.262,0.507,0.318,0.39,10
M,0.695,0.56,0.19,1.494,0.588,0.3425,0.485,15
M,0.665,0.535,0.195,1.606,0.5755,0.388,0.48,14
M,0.535,0.435,0.15,0.725,0.269,0.1385,0.25,9
M,0.47,0.375,0.13,0.523,0.214,0.132,0.145,8
M,0.47,0.37,0.13,0.5225,0.201,0.133,0.165,7
F,0.475,0.375,0.125,0.5785,0.2775,0.085,0.155,10
I,0.36,0.265,0.095,0.2315,0.105,0.046,0.075,7
M,0.55,0.435,0.145,0.843,0.328,0.1915,0.255,15
M,0.53,0.435,0.16,0.883,0.316,0.164,0.335,15
M,0.53,0.415,0.14,0.724,0.3105,0.1675,0.205,10
M,0.605,0.47,0.16,1.1735,0.4975,0.2405,0.345,12
F,0.52,0.41,0.155,0.727,0.291,0.1835,0.235,12
F,0.545,0.43,0.165,0.802,0.2935,0.183,0.28,11
F,0.5,0.4,0.125,0.6675,0.261,0.1315,0.22,10
F,0.51,0.39,0.135,0.6335,0.231,0.179,0.2,9
F,0.435,0.395,0.105,0.3635,0.136,0.098,0.13,9
M,0.495,0.395,0.125,0.5415,0.2375,0.1345,0.155,9
M,0.465,0.36,0.105,0.431,0.172,0.107,0.175,9
I,0.435,0.32,0.08,0.3325,0.1485,0.0635,0.105,9
M,0.425,0.35,0.105,0.393,0.13,0.063,0.165,9
F,0.545,0.41,0.125,0.6935,0.2975,0.146,0.21,11
F,0.53,0.415,0.115,0.5915,0.233,0.1585,0.18,11
F,0.49,0.375,0.135,0.6125,0.2555,0.102,0.22,11
M,0.44,0.34,0.105,0.402,0.1305,0.0955,0.165,10
F,0.56,0.43,0.15,0.8825,0.3465,0.172,0.31,9
M,0.405,0.305,0.085,0.2605,0.1145,0.0595,0.085,8
F,0.47,0.365,0.105,0.4205,0.163,0.1035,0.14,9
I,0.385,0.295,0.085,0.2535,0.103,0.0575,0.085,7
F,0.515,0.425,0.14,0.766,0.304,0.1725,0.255,14
M,0.37,0.265,0.075,0.214,0.09,0.051,0.07,6
I,0.36,0.28,0.08,0.1755,0.081,0.0505,0.07,6
I,0.27,0.195,0.06,0.073,0.0285,0.0235,0.03,5
I,0.375,0.275,0.09,0.238,0.1075,0.0545,0.07,6
I,0.385,0.29,0.085,0.2505,0.112,0.061,0.08,8