Machine Learning: Breast Cancer Classification with Keras, Scoring 98.18%


Dataset

The data is available here (https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/).

Importing the Python libraries

import numpy as np

from sklearn import preprocessing

import pandas as pd

Reading the data

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None)  # the file has no header row
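The breast-cancer-wisconsin.data file has no header row, so by default pandas.read_csv treats its first line as column names and one sample silently disappears. A minimal sketch with two made-up rows in the same comma-separated layout (the values are illustrative, not taken from the dataset) shows the difference `header=None` makes:

```python
import io

import pandas as pd

# Two made-up rows in the same header-less, comma-separated layout as the .data file
raw = "1000001,5,1,1,1,2,1,3,1,1,2\n1000002,5,4,4,5,7,10,3,2,1,4\n"

# Default behaviour: the first data row is consumed as column names
df_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is kept as a sample, columns are numbered 0..10
df_no_header = pd.read_csv(io.StringIO(raw), header=None)

print(len(df_default), len(df_no_header))  # 1 2
```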

Reshaping

Assign the column names (the ID, the nine features, and the class) to the dataframe:

df.columns = ['id','clump_thickness','unif_cell_size','unif_cell_shape','marg_adhesion','single_epith_size','bare_nuclei','bland_chrom','norm_nucleoli','mitoses','class']

Drop the id column, since it is only a sample identifier and carries no information about the class:

df.drop(['id'], inplace=True, axis=1)

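Replace the missing values (marked with '?') with -99999 so they are treated as outliers, then convert the affected column to numbers:

df.replace('?', -99999, inplace=True)

df['bare_nuclei'] = pd.to_numeric(df['bare_nuclei'])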
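One side effect worth checking: min-max scaling, applied later in this post, uses each column's minimum, so a -99999 sentinel stretches the range of bare_nuclei and squeezes every real value into a tiny band just below 1. The sketch below uses hypothetical values to show this, then shows median imputation with scikit-learn's SimpleImputer as one possible alternative (an assumption for illustration, not what this post does):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical bare_nuclei values on the 1-10 scale, plus one missing entry as -99999
with_sentinel = np.array([[1.0], [5.0], [10.0], [-99999.0]])
scaled = MinMaxScaler().fit_transform(with_sentinel)
# min = -99999 and max = 10, so the real values 1, 5, 10 all land between 0.9999 and 1.0
print(scaled.ravel())

# Alternative: mark the missing entry as NaN, impute the column median, then scale
with_nan = np.array([[1.0], [5.0], [10.0], [np.nan]])
imputed = SimpleImputer(strategy="median").fit_transform(with_nan)
imputed_scaled = MinMaxScaler().fit_transform(imputed)
print(imputed_scaled.ravel())  # the values 1, 5, 10, 5 now spread over [0, 1]
```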

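Map the class values to binary. In this data they are 2 and 4 (2 = benign, 4 = malignant), so malignant becomes 1 and benign becomes 0: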

df['class'] = df['class'].map(lambda x: 1 if x == 4 else 0)

[Figure: the final dataframe]

Scaling the data

Create X (features) and y (class labels):

X = np.array(df.drop(['class'], axis=1))

y = np.array(df['class'])

Create a scaler instance:

scaler = preprocessing.MinMaxScaler()

Finally, scale the data:

X = scaler.fit_transform(X)
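MinMaxScaler maps each feature column to [0, 1] with x' = (x − min) / (max − min), where min and max are taken per column. A small check with a hypothetical 3×2 matrix that the hand-written formula gives the same result as the scaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: 3 samples, 2 features on a 1-10 scale
X_demo = np.array([[1.0, 10.0], [4.0, 5.0], [10.0, 1.0]])

# Min-max scaling by hand: each column is rescaled independently
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)
X_manual = (X_demo - col_min) / (col_max - col_min)

X_sklearn = MinMaxScaler().fit_transform(X_demo)
print(np.allclose(X_manual, X_sklearn))  # True
```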

Splitting the data (80% training, 20% test)

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
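In this post the scaler is fit on all rows before splitting, so the test rows also influence the scaling range. A sketch of the more common order, using random stand-in data (an assumption) in place of the real features: split first, with `stratify` so the benign/malignant ratio is the same in both parts, then fit the scaler on the training split only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in: 200 samples with 9 integer features (1-10) and binary labels
X_raw = rng.integers(1, 11, size=(200, 9)).astype(float)
y_raw = np.array([0] * 130 + [1] * 70)

# stratify keeps the class ratio (here 65% / 35%) identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, stratify=y_raw, random_state=42
)

# Fit the scaling range on the training split only, then apply it to both splits
scaler = MinMaxScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # values may fall slightly outside [0, 1]
```

The test split never influences the min and max used for scaling, so its evaluation better reflects unseen data.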

Building and training the model

Importing the Python libraries

import keras

from keras.models import Sequential

from keras.layers import Dense, Dropout

Creating the model

Create a model instance:

model = Sequential()

Add the layers to the model:

model.add(Dense(9, activation='sigmoid', input_shape=(9,)))

model.add(Dense(27, activation='sigmoid'))

model.add(Dropout(0.25))

model.add(Dense(54, activation='sigmoid'))

model.add(Dropout(0.25))

model.add(Dense(27, activation='sigmoid'))

model.add(Dropout(0.25))

model.add(Dense(1, activation='sigmoid'))
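Each Dense layer has n_in × n_out weights plus n_out biases, and Dropout layers add no parameters. As a worked check of the model's size, the arithmetic below (plain Python, no Keras needed) should match the trainable-parameter total that model.summary() reports for this architecture:

```python
# Layer widths of the model above: 9 inputs, then Dense layers of 9, 27, 54, 27 and 1 units
widths = [9, 9, 27, 54, 27, 1]


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
print(per_layer)       # [90, 270, 1512, 1485, 28]
print(sum(per_layer))  # 3385
```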

Compiling the model

model.compile(optimizer=keras.optimizers.Adam(), loss=keras.losses.mean_squared_logarithmic_error)

I used Adam as the optimizer and mean squared logarithmic error (MSLE) as the loss function.

Training the model
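For reference, mean squared logarithmic error averages (log(1 + y) − log(1 + ŷ))² over the samples. A NumPy sketch of that formula (Keras additionally clips its inputs below a small epsilon, which this version omits) also shows why the loss values in this post are small: with 0/1 labels and sigmoid outputs in (0, 1), a single sample's loss can never exceed (ln 2)² ≈ 0.48.

```python
import numpy as np


def msle(y_true, y_pred):
    """Mean squared logarithmic error: mean of (log(1 + y) - log(1 + y_hat))^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))


print(msle([1, 0], [1, 0]))      # 0.0, perfect predictions
print(msle([1], [0]))            # (ln 2)^2, about 0.4805, the worst case for 0/1 labels
print(msle([1, 0], [0.9, 0.1]))  # a small loss for confident, correct predictions
```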

model.fit(X_train, y_train, batch_size=30, epochs=2000, verbose=1, validation_data=(X_test, y_test))

Output:

Epoch 2000/2000

558/558 [==============================] - 0s 320us/step - loss: 0.0104 - val_loss: 0.0182

Evaluating the results

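loss = model.evaluate(X_test, y_test, verbose=1, batch_size=30)

# evaluate() returns only the loss here (no metrics were compiled), so this prints 100 * (1 - MSLE), not an accuracy
print("Final result is {}".format(100 - loss*100))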

Output:

Final result is 98.18395614690546

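The final result is 98.18. Note that this figure is 100 × (1 − test loss), where the loss is the mean squared logarithmic error (val_loss = 0.0182 gives 100 − 1.82 = 98.18); it is not the percentage of correctly classified test samples.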
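To report an actual classification accuracy, the sigmoid outputs first have to be turned into class labels, for example by thresholding at 0.5, and then compared with the true labels. A sketch with hypothetical probabilities standing in for `model.predict(X_test).ravel()`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical sigmoid outputs and true labels (stand-ins for model.predict(X_test) and y_test)
probs = np.array([0.97, 0.03, 0.62, 0.45, 0.10, 0.88])
y_true = np.array([1, 0, 1, 1, 0, 0])

# Threshold at 0.5: probability >= 0.5 means malignant (1), otherwise benign (0)
y_pred = (probs >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class 0/1, columns: predicted class 0/1
print(acc)  # 4 of 6 correct
print(cm)
```

Alternatively, passing `metrics=['accuracy']` to `model.compile` makes `model.evaluate` return the loss and the accuracy together as a list.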
