APPLYING RANDOM FOREST (CLASSIFICATION): A MACHINE LEARNING ALGORITHM FROM SCRATCH WITH REAL DATASETS
1. Understanding the datasets
Dataset information:
The dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
Attribute information:
X1: Age of the patient at the time of the operation (numerical)
X2: Year of the operation (year minus 1900, numerical)
X3: Number of positive axillary lymph nodes detected (numerical)
Y: Survival status (class attribute)
    1 = the patient survived 5 years or longer
    2 = the patient died within 5 years
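The encoding described in the attribute list can be made concrete with a few rows. The following is a minimal sketch, assuming the column names X1, X2, X3 and Y; the three rows are copied from the dataset, while the names `operation_year`, `status` and the label strings are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the dataset (rows 0, 7 and 305).
sample = pd.DataFrame(
    {"X1": [30, 34, 83], "X2": [64, 59, 58], "X3": [1, 0, 2], "Y": [1, 2, 2]}
)

# X2 is stored as (year - 1900), so adding 1900 recovers the calendar year.
sample["operation_year"] = sample["X2"] + 1900

# Map the numeric class codes to readable labels (label strings are illustrative).
status_labels = {1: "survived 5+ years", 2: "died within 5 years"}
sample["status"] = sample["Y"].map(status_labels)

print(sample)
```

Keeping the numeric codes in Y and adding a separate readable column leaves the data in the form the classifier expects.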
2. Importing Datasets
In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv("survival.csv")
print(df)

     X1  X2  X3  Y
0    30  64   1  1
1    30  62   3  1
2    30  65   0  1
3    31  59   2  1
4    31  65   4  1
5    33  58  10  1
6    33  60   0  1
7    34  59   0  2
8    34  66   9  2
9    34  58  30  1
10   34  60   1  1
11   34  61  10  1
12   34  67   7  1
13   34  60   0  1
14   35  64  13  1
15   35  63   0  1
16   36  60   1  1
17   36  69   0  1
18   37  60   0  1
19   37  63   0  1
20   37  58   0  1
21   37  59   6  1
22   37  60  15  1
23   37  63   0  1
24   38  69  21  2
25   38  59   2  1
26   38  60   0  1
27   38  60   0  1
28   38  62   3  1
29   38  64   1  1
..   ..  ..  .. ..
276  67  66   0  1
277  67  61   0  1
278  67  65   0  1
279  68  67   0  1
280  68  68   0  1
281  69  67   8  2
282  69  60   0  1
283  69  65   0  1
284  69  66   0  1
285  70  58   0  2
286  70  58   4  2
287  70  66  14  1
288  70  67   0  1
289  70  68   0  1
290  70  59   8  1
291  70  63   0  1
292  71  68   2  1
293  72  63   0  2
294  72  58   0  1
295  72  64   0  1
296  72  67   3  1
297  73  62   0  1
298  73  68   0  1
299  74  65   3  2
300  74  63   0  1
301  75  62   1  1
302  76  67   0  1
303  77  65   3  1
304  78  65   1  2
305  83  58   2  2

[306 rows x 4 columns]
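After loading, a few quick checks can confirm that the file was parsed as expected. The sketch below assumes that `survival.csv` has a header row `X1,X2,X3,Y`, which matches the printed columns but is not shown in the notebook. To stay self-contained it reads the first five rows of the listing from an in-memory string rather than from the file, and the variable names are illustrative.

```python
import io

import pandas as pd

# First five rows of the printed DataFrame, written as CSV text
# (the header row is an assumption about the file layout).
csv_text = """X1,X2,X3,Y
30,64,1,1
30,62,3,1
30,65,0,1
31,59,2,1
31,65,4,1
"""

df_head = pd.read_csv(io.StringIO(csv_text))

# Check the column names, count missing values, and look at the class balance.
expected_columns = ["X1", "X2", "X3", "Y"]
columns_ok = list(df_head.columns) == expected_columns
missing = int(df_head.isnull().sum().sum())
class_counts = df_head["Y"].value_counts().to_dict()

print(df_head.shape, columns_ok, missing, class_counts)
```

The same three checks can be run on the full `df`, where the class counts show how unbalanced the two survival classes are.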
3. Preparing the data for training
In [2]:
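# All 306 rows are used for training; no rows are held out for testing.
# Features X1..X3 become a (306, 3) array, the label Y a (306, 1) column.
X_train = df[['X1', 'X2', 'X3']][:306].values.reshape(306, 3)
y_train = df[['Y']][:306].values.reshape(306, 1)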
In [3]:
print("Training data - Input")
print(X_train)
print("\n\nTraining data - Output")
print(y_train)

Training data - Input
[[30 64 1] [30 62 3] [30 65 0] [31 59 2] [31 65 4] [33 58 10] [33 60 0] [34 59 0] [34 66 9]
 [34 58 30] [34 60 1] [34 61 10] [34 67 7] [34 60 0] [35 64 13] [35 63 0] [36 60 1] [36 69 0]
 [37 60 0] [37 63 0] [37 58 0] [37 59 6] [37 60 15] [37 63 0] [38 69 21] [38 59 2] [38 60 0]
 [38 60 0] [38 62 3] [38 64 1] [38 66 0] [38 66 11] [38 60 1] [38 67 5] [39 66 0] [39 63 0]
 [39 67 0] [39 58 0] [39 59 2] [39 63 4] [40 58 2] [40 58 0] [40 65 0] [41 60 23] [41 64 0]
 [41 67 0] [41 58 0] [41 59 8] [41 59 0] [41 64 0] [41 69 8] [41 65 0] [41 65 0] [42 69 1]
 [42 59 0] [42 58 0] [42 60 1] [42 59 2] [42 61 4] [42 62 20] [42 65 0] [42 63 1] [43 58 52]
 [43 59 2] [43 64 0] [43 64 0] [43 63 14] [43 64 2] [43 64 3] [43 60 0] [43 63 2] [43 65 0]
 [43 66 4] [44 64 6] [44 58 9] [44 63 19] [44 61 0] [44 63 1] [44 61 0] [44 67 16] [45 65 6]
 [45 66 0] [45 67 1] [45 60 0] [45 67 0] [45 59 14] [45 64 0] [45 68 0] [45 67 1] [46 58 2]
 [46 69 3] [46 62 5] [46 65 20] [46 62 0] [46 58 3] [46 63 0] [47 63 23] [47 62 0] [47 65 0]
 [47 61 0] [47 63 6] [47 66 0] [47 67 0] [47 58 3] [47 60 4] [47 68 4] [47 66 12] [48 58 11]
 [48 58 11] [48 67 7] [48 61 8] [48 62 2] [48 64 0] [48 66 0] [49 63 0] [49 64 10] [49 61 1]
 [49 62 0] [49 66 0] [49 60 1] [49 62 1] [49 63 3] [49 61 0] [49 67 1] [50 63 13] [50 64 0]
 [50 59 0] [50 61 6] [50 61 0] [50 63 1] [50 58 1] [50 59 2] [50 61 0] [50 64 0] [50 65 4]
 [50 66 1] [51 59 13] [51 59 3] [51 64 7] [51 59 1] [51 65 0] [51 66 1] [52 69 3] [52 59 2]
 [52 62 3] [52 66 4] [52 61 0] [52 63 4] [52 69 0] [52 60 4] [52 60 5] [52 62 0] [52 62 1]
 [52 64 0] [52 65 0] [52 68 0] [53 58 4] [53 65 1] [53 59 3] [53 60 9] [53 63 24] [53 65 12]
 [53 58 1] [53 60 1] [53 60 2] [53 61 1] [53 63 0] [54 60 11] [54 65 23] [54 65 5] [54 68 7]
 [54 59 7] [54 60 3] [54 66 0] [54 67 46] [54 62 0] [54 69 7] [54 63 19] [54 58 1] [54 62 0]
 [55 63 6] [55 68 15] [55 58 1] [55 58 0] [55 58 1] [55 66 18] [55 66 0] [55 69 3] [55 69 22]
 [55 67 1] [56 65 9] [56 66 3] [56 60 0] [56 66 2] [56 66 1] [56 67 0] [56 60 0] [57 61 5]
 [57 62 14] [57 64 1] [57 64 9] [57 69 0] [57 61 0] [57 62 0] [57 63 0] [57 64 0] [57 64 0]
 [57 67 0] [58 59 0] [58 60 3] [58 61 1] [58 67 0] [58 58 0] [58 58 3] [58 61 2] [59 62 35]
 [59 60 0] [59 63 0] [59 64 1] [59 64 4] [59 64 0] [59 64 7] [59 67 3] [60 59 17] [60 65 0]
 [60 61 1] [60 67 2] [60 61 25] [60 64 0] [61 62 5] [61 65 0] [61 68 1] [61 59 0] [61 59 0]
 [61 64 0] [61 65 8] [61 68 0] [61 59 0] [62 59 13] [62 58 0] [62 65 19] [62 62 6] [62 66 0]
 [62 66 0] [62 58 0] [63 60 1] [63 61 0] [63 62 0] [63 63 0] [63 63 0] [63 66 0] [63 61 9]
 [63 61 28] [64 58 0] [64 65 22] [64 66 0] [64 61 0] [64 68 0] [65 58 0] [65 61 2] [65 62 22]
 [65 66 15] [65 58 0] [65 64 0] [65 67 0] [65 59 2] [65 64 0] [65 67 1] [66 58 0] [66 61 13]
 [66 58 0] [66 58 1] [66 68 0] [67 64 8] [67 63 1] [67 66 0] [67 66 0] [67 61 0] [67 65 0]
 [68 67 0] [68 68 0] [69 67 8] [69 60 0] [69 65 0] [69 66 0] [70 58 0] [70 58 4] [70 66 14]
 [70 67 0] [70 68 0] [70 59 8] [70 63 0] [71 68 2] [72 63 0] [72 58 0] [72 64 0] [72 67 3]
 [73 62 0] [73 68 0] [74 65 3] [74 63 0] [75 62 1] [76 67 0] [77 65 3] [78 65 1] [83 58 2]]


Training data - Output
[[1] [1] [1] [1] [1] [1] [1] [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1]
 [1] [1] [1] [1] [1] [1] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1] [2] [1]
 [1] [1] [1] [1] [1] [1] [1] [2] [2] [2] [1] [1] [1] [1] [1] [1] [1] [2]
 [2] [1] [1] [1] [1] [1] [1] [1] [2] [2] [2] [2] [1] [1] [1] [1] [1] [1]
 [1] [2] [2] [2] [1] [1] [1] [1] [2] [2] [2] [1] [1] [1] [1] [1] [1] [2]
 [2] [2] [2] [1] [1] [1] [2] [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [2]
 [2] [2] [1] [1] [1] [1] [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [2] [2]
 [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [2] [2] [1] [1] [1] [1] [2] [2]
 [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [2] [2] [2] [2] [2] [2]
 [1] [1] [1] [1] [1] [2] [2] [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1]
 [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [2] [2] [1] [1] [1] [1] [1] [2]
 [2] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [2]
 [1] [1] [1] [1] [1] [1] [1] [2] [2] [1] [1] [1] [1] [2] [2] [2] [1] [1]
 [1] [1] [1] [1] [2] [2] [2] [1] [1] [1] [1] [2] [1] [1] [1] [1] [1] [1]
 [1] [1] [1] [1] [1] [1] [2] [2] [2] [2] [1] [1] [1] [1] [1] [1] [2] [2]
 [1] [1] [1] [2] [2] [1] [1] [1] [1] [1] [1] [2] [1] [1] [1] [2] [2] [1]
 [1] [1] [1] [1] [1] [2] [1] [1] [1] [1] [1] [2] [1] [1] [1] [1] [2] [2]]
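The cells above use all 306 rows for training, so no rows remain for an independent check. A hold-out split is one common alternative. The sketch below is not what the notebook does: it uses scikit-learn's `train_test_split` on randomly generated stand-in features (not the real patient data), with class sizes 225 and 81 taken from the support column of the classification report. The 25% test share and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same shape as the real dataset (values are synthetic).
rng = np.random.default_rng(0)
n_rows = 306
X = np.column_stack([
    rng.integers(30, 84, size=n_rows),   # age-like column
    rng.integers(58, 70, size=n_rows),   # year-like column (year - 1900)
    rng.integers(0, 53, size=n_rows),    # node-count-like column
])
y = np.array([1] * 225 + [2] * 81)       # class sizes as in the report

# Hold out a quarter of the rows; stratify keeps the class ratio similar
# in both parts, random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_tr.shape, X_te.shape)
```

With the real data, `X` and `y` would be replaced by `X_train` and `y_train.ravel()`, and the classifier would be fitted on `X_tr` only and scored on `X_te`.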
4. Implementing RANDOM FOREST CLASSIFICATION
In [4]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
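`RandomForestClassifier()` with no arguments relies on the library defaults, which differ between scikit-learn releases, and on an unseeded random state, so repeated runs can give different forests. A hedged variant that fixes both is sketched below; the values 100 and 42 and the name `clf_explicit` are illustrative choices, not settings taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Set the number of trees and the random seed explicitly so that the
# model does not depend on version-specific defaults or on run-to-run
# randomness (both values are illustrative assumptions).
clf_explicit = RandomForestClassifier(n_estimators=100, random_state=42)

# get_params() returns the full hyperparameter dictionary of the estimator.
params = clf_explicit.get_params()
print(params["n_estimators"], params["random_state"])
```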
5. Fitting the model to the data
In [5]:
clf.fit(X_train,y_train.ravel())
Out[5]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1, oob_score=False, random_state=None, verbose=0, warm_start=False)
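The settings `bootstrap=True` and `n_estimators=10` in the output above point to the core idea of a random forest: several decision trees, each trained on a bootstrap sample of the rows, combined into one prediction. The sketch below illustrates that idea in simplified form. It uses hard majority voting, whereas scikit-learn's forest averages the trees' predicted class probabilities. The functions `fit_simple_forest` and `predict_simple_forest` are hypothetical helpers, and the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_simple_forest(X, y, n_trees=10, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    n_rows = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n_rows row indices with replacement.
        idx = rng.integers(0, n_rows, size=n_rows)
        # max_features="sqrt" makes each split consider a random feature subset.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(0, 2**31 - 1))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_simple_forest(trees, X):
    """Combine the trees by majority vote (ties go to the smaller label)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_rows)
    classes = np.unique(votes)
    counts = np.stack([(votes == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]


# Synthetic stand-in data: the label depends on the third column only.
rng = np.random.default_rng(0)
X_toy = np.column_stack([
    rng.integers(30, 84, size=306),
    rng.integers(58, 70, size=306),
    rng.integers(0, 53, size=306),
])
y_toy = np.where(X_toy[:, 2] > 10, 2, 1)

forest = fit_simple_forest(X_toy, y_toy)
y_hat_toy = predict_simple_forest(forest, X_toy)
print((y_hat_toy == y_toy).mean())
```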
6. Predicting Sample Data
In [6]:
print(clf.predict([[83,58,2]]))

[2]
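`predict` returns only the winning class. When the confidence of a prediction matters, `predict_proba` returns one probability per class, in the order given by `classes_`. The sketch below fits a separate forest on synthetic stand-in data so that it runs on its own; the names `clf_demo`, `X_demo` and `y_demo` are illustrative, and the probabilities it prints say nothing about the real patient data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with the same three-column layout.
rng = np.random.default_rng(1)
X_demo = np.column_stack([
    rng.integers(30, 84, size=306),
    rng.integers(58, 70, size=306),
    rng.integers(0, 53, size=306),
])
y_demo = np.where(X_demo[:, 2] > 10, 2, 1)

clf_demo = RandomForestClassifier(n_estimators=50, random_state=0)
clf_demo.fit(X_demo, y_demo)

# One sample in the same format as the notebook's example: [X1, X2, X3].
patient = np.array([[83, 58, 2]])
proba = clf_demo.predict_proba(patient)

# Each row of proba sums to 1; column i belongs to clf_demo.classes_[i].
print(clf_demo.classes_, proba)
```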
7. Predicting on the training data
In [7]:
# Predict on the same rows the model was trained on;
# these predictions are used to evaluate the result.
y_pred = clf.predict(X_train)
print(y_pred)

[1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1
 1 2 2 2 1 1 1 1 2 2 2 1 1 1 1 1 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1 1 1 1 2
 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1
 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1
 1 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2 1 1 1 2 2 1 1 1 1 1 1
 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 1 1 2 2 1 1 1 1 1 1 2 1 1 1 2 2 1
 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 2]
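When comparing predictions with labels by hand, the array shapes matter: `y_train` is a column of shape (306, 1), while `y_pred` is flat with shape (306,). Comparing them directly broadcasts to a 306 x 306 matrix instead of 306 element-wise results. The sketch below shows the effect on a five-element toy example; the arrays and names are made up for illustration.

```python
import numpy as np

y_true_col = np.array([[1], [1], [2], [2], [1]])  # shape (5, 1), like y_train
y_hat = np.array([1, 2, 2, 2, 1])                 # shape (5,), like y_pred

# Pitfall: (5, 1) == (5,) broadcasts to a (5, 5) matrix of comparisons.
wrong = y_true_col == y_hat

# Flattening the label column first gives one comparison per sample.
right = y_true_col.ravel() == y_hat
accuracy = right.mean()

print(wrong.shape, right.shape, accuracy)  # (5, 5) (5,) 0.8
```

This is also why the fitting step passes `y_train.ravel()` rather than `y_train` to `clf.fit`.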
8. Report Generation
In [8]:
from sklearn.metrics import classification_report

report = classification_report(y_train, y_pred)
print(report)

             precision    recall  f1-score   support

          1       0.97      0.98      0.98       225
          2       0.95      0.91      0.93        81

avg / total       0.96      0.96      0.96       306
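The columns of the report follow the standard definitions of precision, recall, F1 and support. The sketch below computes them for a single class with plain NumPy on a small made-up example. The function `per_class_metrics` is a hypothetical helper, not scikit-learn's implementation, and the toy labels are not taken from the dataset. Because the report above scores the model on the rows it was trained on, its numbers are likely to be more optimistic than results on unseen data.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = int(np.sum((y_pred == label) & (y_true == label)))  # true positives
    fp = int(np.sum((y_pred == label) & (y_true != label)))  # false positives
    fn = int(np.sum((y_pred != label) & (y_true == label)))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Toy labels (made up): five samples of class 1, three of class 2.
y_true_toy = [1, 1, 1, 2, 2, 1, 2, 1]
y_pred_toy = [1, 1, 2, 2, 1, 1, 2, 1]

metrics_1 = per_class_metrics(y_true_toy, y_pred_toy, label=1)
metrics_2 = per_class_metrics(y_true_toy, y_pred_toy, label=2)
print(metrics_1)
print(metrics_2)
```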