TensorFlow学习笔记（4）：基于MNIST数据的softmax regression

WSNjiang

2019-06-21

前言

本文基于TensorFlow官网的Tutorial写成。输入数据是MNIST，全称是Modified National Institute of Standards and Technology，是一组由这个机构搜集的手写数字扫描文件和每个文件对应标签的数据集，经过一定的修改使其适合机器学习算法读取。这个数据集可以从牛的不行的Yann LeCun教授的网站获取。

本文首先使用sklearn的LogisticRegression()进行训练，得到的参数绘制效果如下（红色表示参数估计结果为负，蓝色表示参数估计结果为正，绿色代表参数估计结果为零）：

TensorFlow学习笔记（4）：基于MNIST数据的softmax regression

从图形效果看，我们发现蓝色点组成的轮廓与对应的数字轮廓还是比较接近的。

然后本文使用tensorflow对同样的数据集进行了softmax regression的训练，得到的参数绘制效果如下：

TensorFlow学习笔记（4）：基于MNIST数据的softmax regression

蓝色点组成的轮廓与对应的数字轮廓比较接近。但是对比上下两幅截图，感觉tensorflow的效果更平滑一些。不过从测试集的准确率来看，二者都在92%左右，sklearn稍微好一点。注意，92%的准确率看起来不错，但其实是一个很低的准确率，按照官网教程的说法，应该要感到羞愧。

代码

#!/usr/bin/env python
# -*- coding=utf-8 -*-
# @author: 陈水平
# @date: 2017-01-10
# @description: implement a softmax regression model upon MNIST handwritten digits
# @ref: http://yann.lecun.com/exdb/mnist/

import gzip
import struct
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
import tensorflow as tf

# MNIST data is stored in binary format, 
# and we transform them into numpy ndarray objects by the following two utility functions
def read_image(file_name):
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
        index = 0
        magic, images, rows, columns = struct.unpack_from('>IIII' , buf , index)
        index += struct.calcsize('>IIII')

        image_size = '>' + str(images*rows*columns) + 'B'
        ims = struct.unpack_from(image_size, buf, index)
        
        im_array = np.array(ims).reshape(images, rows, columns)
        return im_array

def read_label(file_name):
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
        index = 0
        magic, labels = struct.unpack_from('>II', buf, index)
        index += struct.calcsize('>II')
        
        label_size = '>' + str(labels) + 'B'
        labels = struct.unpack_from(label_size, buf, index)

        label_array = np.array(labels)
        return label_array

print "Start processing MNIST handwritten digits data..."
train_x_data = read_image("MNIST_data/train-images-idx3-ubyte.gz")
train_x_data = train_x_data.reshape(train_x_data.shape[0], -1).astype(np.float32)
train_y_data = read_label("MNIST_data/train-labels-idx1-ubyte.gz")
test_x_data = read_image("MNIST_data/t10k-images-idx3-ubyte.gz")
test_x_data = test_x_data.reshape(test_x_data.shape[0], -1).astype(np.float32)
test_y_data = read_label("MNIST_data/t10k-labels-idx1-ubyte.gz")

train_x_minmax = train_x_data / 255.0
test_x_minmax = test_x_data / 255.0

# Of course you can also use the utility function to read in MNIST provided by tensorflow
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
# train_x_minmax = mnist.train.images
# train_y_data = mnist.train.labels
# test_x_minmax = mnist.test.images
# test_y_data = mnist.test.labels

# We evaluate the softmax regression model by sklearn first
eval_sklearn = False
if eval_sklearn:
    print "Start evaluating softmax regression model by sklearn..."
    reg = LogisticRegression(solver="lbfgs", multi_class="multinomial")
    reg.fit(train_x_minmax, train_y_data)
    np.savetxt('coef_softmax_sklearn.txt', reg.coef_, fmt='%.6f')  # Save coefficients to a text file
    test_y_predict = reg.predict(test_x_minmax)
    print "Accuracy of test set: %f" % accuracy_score(test_y_data, test_y_predict)

eval_tensorflow = True
batch_gradient = False
if eval_tensorflow:
    print "Start evaluating softmax regression model by tensorflow..."
    # reformat y into one-hot encoding style
    lb = preprocessing.LabelBinarizer()
    lb.fit(train_y_data)
    train_y_data_trans = lb.transform(train_y_data)
    test_y_data_trans = lb.transform(test_y_data)

    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    V = tf.matmul(x, W) + b
    y = tf.nn.softmax(V)

    y_ = tf.placeholder(tf.float32, [None, 10])

    loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)

    init = tf.initialize_all_variables()

    sess = tf.Session()
    sess.run(init)

    if batch_gradient:
        for step in range(300):
            sess.run(train, feed_dict={x: train_x_minmax, y_: train_y_data_trans})
            if step % 10 == 0:
                print "Batch Gradient Descent processing step %d" % step
        print "Finally we got the estimated results, take such a long time..."
    else:
        for step in range(1000):
            sample_index = np.random.choice(train_x_minmax.shape[0], 100)
            batch_xs = train_x_minmax[sample_index, :]
            batch_ys = train_y_data_trans[sample_index, :]
            sess.run(train, feed_dict={x: batch_xs, y_: batch_ys})
            if step % 100 == 0:
                print "Stochastic Gradient Descent processing step %d" % step
    np.savetxt('coef_softmax_tf.txt', np.transpose(sess.run(W)), fmt='%.6f')  # Save coefficients to a text file
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print "Accuracy of test set: %f" % sess.run(accuracy, feed_dict={x: test_x_minmax, y_: test_y_data_trans})

输出如下：

Start processing MNIST handwritten digits data...
Start evaluating softmax regression model by sklearn...
Accuracy of test set: 0.926300
Start evaluating softmax regression model by tensorflow...
Stochastic Gradient Descent processing step 0
Stochastic Gradient Descent processing step 100
Stochastic Gradient Descent processing step 200
Stochastic Gradient Descent processing step 300
Stochastic Gradient Descent processing step 400
Stochastic Gradient Descent processing step 500
Stochastic Gradient Descent processing step 600
Stochastic Gradient Descent processing step 700
Stochastic Gradient Descent processing step 800
Stochastic Gradient Descent processing step 900
Accuracy of test set: 0.917400

思考

sklearn的估计时间有点长，因为每一轮参数更新都是基于全量的训练集数据算出损失，再算出梯度，然后再改进结果的。
tensorflow采用batch gradient descent估计算法时，时间也比较长，原因同上。
tensorflow采用stochastic gradient descent估计算法时间短，最后的估计结果也挺好，相当于每轮迭代只用到了部分数据集算出损失和梯度，速度变快，但可能bias增加；所以把迭代次数增多，这样可以降低variance，总体上的误差相比batch gradient descent并没有差多少。

附录

参数效果的绘图采用R实现，示例代码如下：

library(dplyr)
library(tidyr)
library(ggplot2)

t <- read.table("coef_softmax_tf.txt")

n <- t %>% 
  tibble::rownames_to_column("digit") %>%
  gather(var_name, var_value, -digit) %>%
  mutate(var_name=stringr::str_sub(var_name, 2))
n$var_name <- as.numeric(n$var_name)
n$digit <- as.numeric(n$digit)
n <- n %>% 
  mutate(digit=digit-1, var_name=var_name-1, y=28 - floor(var_name/28), x=var_name %% 28, v=ifelse(var_value>0, 1, ifelse(var_value<0, -1, 0)))

ggplot(n) + geom_point(aes(x=x,y=y,color=as.factor(v))) + facet_wrap(~digit)

mnist softmax tensorflow

安科网

TensorFlow学习笔记（4）：基于MNIST数据的softmax regression

WSNjiang

前言

代码

思考

附录

WSNjiang

相关推荐

如何在PyTorch和TensorFlow中训练图像分类模型

Ｍnist手写数字识别 Tensorflow

TensorFlow中超大的30个机器学习数据集

python中Keras下载mnist数据集

6.keras-基于CNN网络的Mnist数据集分类

Keras训练神经网络DEMO——全连接神经网络训练MNIST

你在打王者农药，有人却用iPhone来训练神经网络

体验中国自主知识产权天元深度学习引擎与TensorFlow,PyTorch的对比

python读取mnist文件

tensorflow 全连接神经网络识别mnist数据

tensorflow识别Mnist时，训练集与验证集精度acc高，但是测试集精度低的比较隐蔽的原因

[Pytorch数据集下载] 下载MNIST数据缓慢的方案

tensorflow 2.0 学习（十一）卷积神经网络（一）MNIST数据集训练与预测 LeNet-5网络

Ternsorflow 学习：006-MNIST进阶深入MNIST

tensorflow(五)

keras使用AutoEncoder对mnist数据降维

机器学习分享——手把手带你写一个GAN

机器学习 | CNN卷积神经网络

机器学习实战_分类（一）

Tensorflow快餐教程(10) - 循环神经网络

WSNjiang