Implementing Backpropagation in a Neural Network

There are several steps to take when building a neural network, and the two most important are implementing forward and backward propagation. In this tutorial, we will focus on backpropagation and the intuition behind each of its steps.

What is backpropagation?

Backpropagation is the technique used when training a neural network that allows us to compute the gradients of its parameters, so that we can perform gradient descent and minimize the cost function. We will walk through every part of backpropagation in this tutorial.


Implementing backpropagation

Assume a simple two-layer neural network: one hidden layer and one output layer. We can carry out backpropagation as follows.

Initialize the weights and biases of the neural network: this involves randomly initializing the network's weights and biases. The gradients of these parameters are obtained from backpropagation and used to update the parameters during gradient descent.

The Python code is as follows:

#Import the NumPy library
import numpy as np
#Set seed for reproducibility
np.random.seed(100)
#We will first initialize the weights and biases needed and store them in a dictionary called W_B
def initialize(num_f, num_h, num_out):
 
 '''
 Description: This function randomly initializes the weights and biases of each layer of the neural network
 
 Input Arguments:
 num_f - the number of training features
 num_h - the number of nodes in the hidden layer
 num_out - the number of nodes in the output layer
 
 Output: 
 
 W_B - A dictionary of the initialized parameters.
 
 '''
 
 #randomly initialize weights and biases, and proceed to store in a dictionary
 W_B = {
 'W1': np.random.randn(num_h, num_f),
 'b1': np.zeros((num_h, 1)),
 'W2': np.random.randn(num_out, num_h),
 'b2': np.zeros((num_out, 1))
 }
 return W_B
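
As a quick sanity check, the function can be called like this (the layer sizes used here, 4 input features, 5 hidden nodes and 1 output node, are purely illustrative):

#Example with illustrative sizes: 4 input features, 5 hidden nodes, 1 output node
W_B = initialize(num_f=4, num_h=5, num_out=1)
print(W_B['W1'].shape) #(5, 4)
print(W_B['b1'].shape) #(5, 1)
print(W_B['W2'].shape) #(1, 5)
print(W_B['b2'].shape) #(1, 1)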


Perform forward propagation: this involves computing the linear and activation outputs of both the hidden layer and the output layer.

For the hidden layer:

We will use the ReLU activation function. The Python code is shown below:

#We will now proceed to create functions for each of our activation functions
def relu(Z):
 
 '''
 Description: This function performs the relu activation function on a given number or matrix. 
 
 Input Arguments:
 Z - matrix or integer
 
 Output: 
 
 relu_Z - matrix or integer with relu performed on it
 
 '''
 relu_Z = np.maximum(Z,0)
 
 return relu_Z
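
A quick check on a small array (the values are arbitrary) shows that negative entries are clipped to zero while positive entries pass through unchanged:

print(relu(np.array([-2.0, 0.0, 3.0]))) #[0. 0. 3.]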


For the output layer:

We will use the sigmoid activation function. The Python implementation is shown below:

def sigmoid(Z):
 
 '''
 Description: This function performs the sigmoid activation function on a given number or matrix. 
 
 Input Arguments:
 Z - matrix or integer
 
 Output: 
 
 sigmoid_Z - matrix or integer with sigmoid performed on it
 
 '''
 sigmoid_Z = 1 / (1 + (np.exp(-Z)))
 
 return sigmoid_Z
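
Again, a quick check on arbitrary values confirms that the outputs are squashed into the range (0, 1), which is what lets us read them as probabilities for binary classification:

print(sigmoid(np.array([-2.0, 0.0, 2.0]))) #approximately [0.119 0.5 0.881]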


Perform forward propagation. The Python implementation is as follows:

#We will now proceed to perform forward propagation
def forward_propagation(X, W_B): 
 '''
 Description: This function performs the forward propagation in a vectorized form 
 
 Input Arguments:
 X - input training examples, arranged with one example per column (shape: num_f by no_examples)
 W_B - initialized weights and biases
 
 Output: 
 
 forward_results - A dictionary containing the linear and activation outputs
 
 '''
 
 #Calculate the linear Z for the hidden layer
 Z1 = np.dot(W_B['W1'], X) + W_B['b1']
 
 #Calculate the activation output for the hidden layer
 A = relu(Z1)
 
 #Calculate the linear Z for the output layer
 Z2 = np.dot(W_B['W2'], A) + W_B['b2']
 
 #Calculate the activation output for the output layer
 Y_pred = sigmoid(Z2)
 
 #Save all the results in a dictionary
 forward_results = {"Z1": Z1,
 "A": A,
 "Z2": Z2,
 "Y_pred": Y_pred}
 
 return forward_results
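
To see the forward pass run end to end, here is a small example on randomly generated toy data (the sizes, 4 features and 10 examples, are purely illustrative; note that the examples are arranged one per column):

#Toy example: 4 features, 10 training examples (one example per column)
X = np.random.randn(4, 10)
W_B = initialize(num_f=4, num_h=5, num_out=1)
forward_results = forward_propagation(X, W_B)
print(forward_results['Y_pred'].shape) #(1, 10)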


Perform backward propagation:

Compute the gradients of the cost with respect to the parameters needed for gradient descent, in this case dLdZ2, dLdW2, dLdb2, dLdZ1, dLdW1 and dLdb1. Because the output layer pairs a sigmoid activation with the binary cross-entropy loss, the gradient with respect to Z2 simplifies to Y_pred - Y_true, and the hidden-layer gradient is masked by the ReLU derivative, which is 1 where Z1 > 0 and 0 elsewhere. These gradients are then combined with a learning rate to carry out the gradient descent updates.

A step-by-step guide follows:

  • Obtain the results from forward propagation, as follows:
forward_results = forward_propagation(X, W_B)
Z1 = forward_results['Z1']
A = forward_results['A']
Z2 = forward_results['Z2']
Y_pred = forward_results['Y_pred']
  • Obtain the number of training examples, as follows:
no_examples = X.shape[1]
  • Compute the loss (binary cross-entropy):
L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))
  • Compute the gradient of each parameter, as follows:
dLdZ2= Y_pred - Y_true
dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), np.where(Z1 > 0, 1.0, 0.0))
dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)
  • Store the computed gradients needed for gradient descent in a dictionary:
gradients = {"dLdW1": dLdW1,
 "dLdb1": dLdb1,
 "dLdW2": dLdW2,
 "dLdb2": dLdb2}
  • Return the loss and the stored gradients:
return gradients, L

Here is the complete backward propagation function:

The Python code is as follows:

def backward_propagation(X, W_B, Y_true):
 '''Description: This function performs the backward propagation in a vectorized form 
 
 Input Arguments:
 X - input training examples
 W_B - initialized weights and biases
 Y_true - the true target values of the training examples
 
 Output: 
 
 gradients - the calculated gradients of each parameter
 L - the loss function
 
 '''
 
 # Obtain the forward results from the forward propagation 
 
 forward_results = forward_propagation(X, W_B)
 Z1 = forward_results['Z1']
 A = forward_results['A']
 Z2 = forward_results['Z2']
 Y_pred = forward_results['Y_pred']
 
 #Obtain the number of training samples 
 no_examples = X.shape[1]
 
 # Calculate loss 
 L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))
 
 #Calculate the gradients of each parameter needed for gradient descent 
 dLdZ2= Y_pred - Y_true
 dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
 dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
 dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), np.where(Z1 > 0, 1.0, 0.0))
 dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
 dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)
 
 #Store gradients for gradient descent in a dictionary 
 gradients = {"dLdW1": dLdW1,
 "dLdb1": dLdb1,
 "dLdW2": dLdW2,
 "dLdb2": dLdb2}
 
 return gradients, L
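
Finally, to close the loop: these gradients are combined with a learning rate to update the parameters. The update step itself is not part of the code above, so the loop below is only a minimal sketch of what it might look like (the function name train, the learning_rate value and the number of iterations are all illustrative choices, not taken from the tutorial):

#A minimal, illustrative gradient descent loop (not part of the tutorial code above)
def train(X, Y_true, W_B, learning_rate=0.01, num_iterations=1000):
    for i in range(num_iterations):
        #Compute the gradients and loss using the backward propagation defined above
        gradients, L = backward_propagation(X, W_B, Y_true)
        #Update each parameter by taking a small step against its gradient
        W_B['W1'] = W_B['W1'] - learning_rate * gradients['dLdW1']
        W_B['b1'] = W_B['b1'] - learning_rate * gradients['dLdb1']
        W_B['W2'] = W_B['W2'] - learning_rate * gradients['dLdW2']
        W_B['b2'] = W_B['b2'] - learning_rate * gradients['dLdb2']
        #Occasionally print the loss to monitor training
        if i % 100 == 0:
            print(f"Iteration {i}, loss: {L:.4f}")
    return W_B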


Many people assume backpropagation is difficult, but as you have seen in this tutorial, it is not. Understanding each step is essential to mastering the technique as a whole. It also pays to be comfortable with the underlying mathematics, linear algebra and calculus, in order to understand how the individual gradients of each function are computed. In practice, backpropagation is usually handled for you by whichever deep learning framework you are using. Still, understanding how the technique works under the hood is worthwhile, because it can sometimes help you see why your neural network is not training well.
