Multi-Task Learning in TensorFlow with the Head API

白话深度学习

2019-01-04

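A fundamental characteristic of human learning is that we learn many things at the same time. The equivalent idea in machine learning is called multi-task learning (MTL), and it is becoming increasingly useful in practice, particularly in reinforcement learning and natural language processing. In fact, even in a standard single-task setting, additional auxiliary tasks can be designed and included in the optimization process to help learning.

This post introduces the field by showing how to solve a simple multi-task problem on an image classification benchmark. It focuses on an experimental component of TensorFlow, the Head API, which helps in designing custom estimators for MTL by decoupling the shared components of a neural network from the task-specific ones. Along the way, we will also get a chance to discuss other features of the TensorFlow core, including tf.data, tf.image, and custom estimators.

Introduction

To make the tutorial more interesting, we consider a realistic use case by reimplementing part of a 2014 paper (Facial Landmark Detection by Deep Multi-task Learning). The problem is simple: given an image of a face, we need to locate a set of landmarks, i.e., points of interest on the image (nose, left eye, mouth, ...), and to predict some labels (including age and gender). Each landmark or label constitutes a separate task on the image, and the tasks are clearly correlated (e.g., the position of the left eye and that of the right eye).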

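Example images from the dataset

The green dots are the landmarks; each image is also associated with some additional labels, including age and gender.

We split the implementation into three parts: (i) loading the images (using tf.data and tf.image); (ii) implementing the convolutional network from the paper (using TF's custom estimators); (iii) adding the MTL logic with the Head API.

Step 0 - Loading the dataset

After downloading the dataset (http://mmlab.ie.cuhk.edu.hk/projects/TCDCN/data/MTFL.zip), a quick inspection shows that the images are split across three different folders (AFLW, lfw_5590, and net_7876). The train/test split is provided through separate text files, where each row holds the path to an image followed by its labels: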

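The first images and labels from the training dataset. The blue numbers are positions in the image (measured from the top-left corner); the red numbers are classes.

For simplicity, we will load the text files with Pandas, for example for the training split: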

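import pandas as pd

# Each row holds an image path followed by space-separated label values
train_data = pd.read_csv('training.txt', sep=' ', header=None, skipinitialspace=True, nrows=10000)
# The paths use backslashes as separators; convert them to forward slashes
train_data.iloc[:, 0] = train_data.iloc[:, 0].apply(lambda s: s.replace('\\', '/'))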

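Since the text files are not very large, using Pandas here is slightly easier and gives a bit more flexibility. For larger files, however, a better choice is to use the tf.data object TextLineDataset directly.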
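
To give a rough idea of what each row of these text files contains, the sketch below parses a single row by hand. The sample row and the helper name parse_row are made up for illustration; the layout (a path followed by 14 numbers) is assumed from the label shape printed in Step 1.

```python
def parse_row(row):
    """Split one row into a normalized image path and a list of float labels."""
    fields = row.split()
    # Convert backslash separators to forward slashes
    path = fields[0].replace('\\', '/')
    labels = [float(value) for value in fields[1:]]
    return path, labels

# Hypothetical row, not taken from the dataset
sample = (r" lfw_5590\Some_Person_0001.jpg 107.25 147.75 126.25 106.25 140.75 "
          "108.75 113.25 143.75 158.75 162.75 1 2 2 3")
path, labels = parse_row(sample)
print(path, len(labels))
```

This is only meant to make the file format tangible; the Pandas call above does the same job for the whole file.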

Step 1 - Using tf.data and Dataset objects

Now that we have our data, we can load it with tf.data. In the simplest case, we can build the dataset by slicing the Pandas DataFrame:

import tensorflow as tf

# Load filenames and labels
filenames = tf.constant(train_data.iloc[:, 0].tolist())
labels = tf.constant(train_data.iloc[:, 1:].values)
# Add to a dataset object
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

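Previously, one major problem with using tf.data together with Estimators was that debugging a dataset was rather complicated, since everything had to go through a tf.Session object. In recent versions, however, datasets can be debugged with eager execution enabled, even when working with estimators. For example, we can use the dataset to build batches of 4 elements, take the first batch, and print everything to the screen: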

# We can debug using eager execution
# (tf.enable_eager_execution() must be called at program startup, before the dataset is built)
for img, labels in dataset.batch(4).take(1):
    print(img)
    print(labels)

# tf.Tensor(
# [b'lfw_5590/Aaron_Eckhart_0001.jpg' b'lfw_5590/Aaron_Guiel_0001.jpg' ...
# 2. 3. ]], shape=(4, 14), dtype=float64)

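Now it is time to load the images starting from their paths! Note that, in general, this is not trivial, as images can come with many different extensions and sizes, some are black and white, and so on. Luckily, we can take inspiration from the TF tutorial (https://www.tensorflow.org/guide/datasets#preprocessing_data_with_datasetmap) to build a simple function that encapsulates all this logic, using the tools in the tf.image module: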

# Reads an image from a file, decodes it into a dense tensor, and resizes it
# to a fixed shape.
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    # channels=3 is needed because some test images are black and white
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [40, 40])
    # The decoded image has shape (height, width, channels)
    image_shape = tf.cast(tf.shape(image_decoded), tf.float32)
    # Normalize the landmarks to [0, 1]. Assumption: the first five values are
    # x-coordinates (divided by the width), the next five are y-coordinates (divided by the height)
    label = tf.concat([label[0:5] / image_shape[1], label[5:10] / image_shape[0], label[10:]], axis=0)
    return {"x": image_resized}, label

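This function takes care of most of the parsing issues:

The 'channels' argument lets us load both color and black-and-white images in a single line;
We resize all images to the desired format (40x40, as in the original paper);
In the line that builds the label, we also normalize the landmark labels so that they denote relative positions between 0 and 1 rather than absolute ones (we resize all images, and the original images may have different shapes).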
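
The normalization step can be illustrated without TensorFlow. The following sketch works on a plain Python list; the helper name normalize_landmarks and the sample values are hypothetical, and it assumes that the first five label values are x-coordinates and the next five are y-coordinates.

```python
def normalize_landmarks(label, height, width):
    """Scale 5 x-coordinates and 5 y-coordinates to [0, 1]; keep the other labels."""
    xs = [x / width for x in label[0:5]]
    ys = [y / height for y in label[5:10]]
    return xs + ys + list(label[10:])

# Hypothetical label row for an image that is 250 pixels high and 200 pixels wide
label = [50.0, 150.0, 100.0, 60.0, 140.0,
         100.0, 100.0, 125.0, 200.0, 200.0,
         1.0, 2.0, 2.0, 3.0]
normalized = normalize_landmarks(label, height=250.0, width=200.0)
print(normalized[2], normalized[7])  # relative x and y of the third landmark
```

After this step the landmark positions no longer depend on the original resolution, so they remain valid for the resized 40x40 image.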

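We can apply the parsing function to every element of the dataset with its built-in 'map' method. Combining this with some additional logic for training/testing, we obtain the final loading function: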

import numpy as np

# This snippet is adapted from here: https://www.tensorflow.org/guide/datasets
def input_fn(dataframe, is_eval=False):
    # Load the list of files
    filenames = tf.constant(dataframe.iloc[:, 0].tolist())
    # Load the labels
    labels = tf.constant(dataframe.iloc[:, 1:].values.astype(np.float32))
    # Build the dataset with image processing on top of it
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(_parse_function)
    # Add shuffling and repetition if training
    if is_eval:
        dataset = dataset.batch(64)
    else:
        dataset = dataset.repeat().shuffle(1000).batch(64)

    return dataset
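
To make the difference between the two branches more concrete, here is a framework-free sketch of the same pipeline on a list of integers. The helpers shuffle and batch are simplified stand-ins written for this illustration (they are not the tf.data implementation), and the buffer and batch sizes are chosen small on purpose.

```python
import itertools
import random

def shuffle(stream, buffer_size, rng):
    """Buffer-based shuffle: fill a buffer, then emit random elements from it."""
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == buffer_size:
            index = rng.randrange(buffer_size)
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Flush whatever is left when the input stream is finite
    rng.shuffle(buffer)
    yield from buffer

def batch(stream, batch_size):
    """Group consecutive elements into lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        chunk = list(itertools.islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

examples = list(range(10))

# Evaluation: a single pass in order; the last batch may be smaller
eval_batches = list(batch(examples, 4))

# Training: endless repetition, shuffled through a small buffer, then batched
rng = random.Random(0)
train_stream = batch(shuffle(itertools.cycle(examples), 5, rng), 4)
train_batches = list(itertools.islice(train_stream, 3))

print(eval_batches)
print([len(b) for b in train_batches])
```

Because the training stream repeats forever, it always yields full batches and must be stopped from the outside (for instance, by a fixed number of training steps).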

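A single image successfully loaded from the dataset

Step 2 - Building a convolutional network with custom estimators

As a next step, we want to replicate the convolutional neural network (CNN) from the original paper: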

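The logic of the CNN consists of two parts: the first is a generic feature extractor for the whole image (shared across all tasks), while for each task we have a separate, smaller model acting on the final feature embedding of the image. For reasons that will become clear below, we refer to each of these simpler models as a 'head'. All the heads are trained simultaneously by gradient descent.

Let us start from the feature extraction part. For this, we use the tf.layers module to build our main network: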

# Reimplement the feature extraction from the original paper
def extract_features(features):
    # Input layer
    input_layer = tf.reshape(features["x"], [-1, 40, 40, 3])
    # First convolutional layer
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=16, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    # Second convolutional layer
    conv2 = tf.layers.conv2d(inputs=pool1, filters=48, kernel_size=[3, 3], padding="same", activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # Third convolutional layer
    conv3 = tf.layers.conv2d(inputs=pool2, filters=64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[2, 2], strides=2)

    # Fourth convolutional layer
    conv4 = tf.layers.conv2d(inputs=pool3, filters=64, kernel_size=[2, 2], padding="same", activation=tf.nn.relu)

    # Dense layer (three 2x2 poolings reduce 40x40 to 5x5, with 64 channels)
    flat = tf.reshape(conv4, [-1, 5 * 5 * 64])
    dense = tf.layers.dense(inputs=flat, units=100, activation=tf.nn.relu)

    return dense
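
The size of the flattened tensor (5 * 5 * 64) follows from the layer settings above. The short sketch below redoes that arithmetic in plain Python; the helper names same_conv and max_pool are introduced only for this check.

```python
def same_conv(size):
    # With padding="same" and stride 1, the spatial size is unchanged
    return size

def max_pool(size, pool=2, stride=2):
    # Without padding: floor((size - pool) / stride) + 1
    return (size - pool) // stride + 1

size = 40
for _ in range(3):           # conv1/pool1, conv2/pool2, conv3/pool3
    size = max_pool(same_conv(size))
size = same_conv(size)       # conv4 is not followed by pooling
flat_units = size * size * 64
print(size, flat_units)
```

If the input resolution or the number of pooling layers changes, the reshape before the dense layer has to change accordingly.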

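For now, we will focus on a single head/task, namely estimating the position of the nose in the image. One way to do this is with a custom estimator, which lets us combine our own model implementation with all the functionality of the standard Estimator objects.

A drawback of custom estimators is that their code tends to be quite verbose, since we need to wrap the entire logic of the estimator (training, evaluation, and prediction) into a single function: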

# Adapted from here: https://www.tensorflow.org/tutorials/layers
def single_task_cnn_model_fn(features, labels, mode):

    # Get features
    dense = extract_features(features)

    # Make predictions
    predictions = tf.layers.dense(inputs=dense, units=2)
    outputs = {
        "predictions": predictions
    }
    # We just want the predictions
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=outputs)
    # If not in mode.PREDICT, compute the loss (mean squared error)
    # labels[:, 2:8:5] keeps columns 2 and 7, i.e., the nose position
    loss = tf.losses.mean_squared_error(labels=labels[:, 2:8:5], predictions=predictions)
    # Single optimization step
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer()
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    # If not PREDICT or TRAIN, then we are evaluating the model
    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            labels=labels[:, 2:8:5], predictions=outputs["predictions"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
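
The slice 2:8:5 used for the labels is easy to misread, so here is a small check on a plain Python list. The column names are hypothetical and assume a layout of five x-coordinates, five y-coordinates, and four other labels, with the nose as the third landmark.

```python
# Hypothetical names for the 14 label columns (assumed layout)
landmarks = ["left_eye", "right_eye", "nose", "left_mouth", "right_mouth"]
columns = (["x_" + name for name in landmarks]
           + ["y_" + name for name in landmarks]
           + ["other_%d" % i for i in range(4)])

# Start at index 2, stop before index 8, step 5: indices 2 and 7
selected = columns[2:8:5]
print(selected)
```

Under these assumptions the slice picks exactly the two coordinates of the nose, which is why the prediction layer has two units.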

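Roughly speaking, the model function receives a mode argument, which we can use to tell what kind of operation we should perform (e.g., training). In turn, the model function exchanges all information with the main estimator object through another custom object, the EstimatorSpec: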

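Schematic of a custom estimator

Not only does this make the code harder to read, but most of the code above tends to be 'boilerplate' that depends only on the specific task we are facing, for example, using the mean squared error for a regression problem. The Head API is an experimental feature designed to simplify writing code in such cases, and it is our next topic.

Step 3a - Rewriting our custom estimator with the Head API

The idea of the Head API is that the main prediction component (our model function) can be generated automatically once a few key items are specified: the feature extraction part, the loss, and our optimization algorithm: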

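In a sense, this is similar to the high-level interface of Keras, but it still leaves enough flexibility to define a range of more interesting heads, as we will see shortly.

For now, let us rewrite the previous code, this time using a 'regression head':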

def single_head_cnn_model_fn(features, labels, mode):

    # Extract the features
    dense = extract_features(features)

    # Predictions
    predictions = tf.layers.dense(inputs=dense, units=2)
    # Optimizer
    optimizer = tf.train.AdamOptimizer()

    # Labels are None in PREDICT mode; otherwise keep columns 2 and 7 (the nose position)
    nose_labels = None if labels is None else labels[:, 2:8:5]

    # Define the head
    regression_head = tf.contrib.estimator.regression_head(label_dimension=2)
    return regression_head.create_estimator_spec(features, mode, predictions, nose_labels, optimizer)

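For all intents and purposes, the two models are equivalent, but the latter is more readable and less error-prone, since most of the estimator-specific logic is now encapsulated inside the head. We can train either of the two models with the 'train' interface of the estimator and start getting our predictions: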

Example predictions from our single-task model

Please do not confuse the Head API (which lives in tf.contrib.estimator) with tf.contrib.learn.head, which is deprecated.
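
To make the division of labor concrete, here is a toy, framework-free sketch of what a regression head takes off our hands: choosing what to compute depending on the mode. The class ToyRegressionHead, its method create_spec, and the string-valued modes are invented for this illustration and are not part of TensorFlow.

```python
class ToyRegressionHead:
    """Toy stand-in for a regression head; NOT the TensorFlow class."""

    def create_spec(self, mode, predictions, labels=None):
        spec = {"mode": mode, "predictions": predictions}
        if mode == "predict":
            # No labels are available, so only the predictions are returned
            return spec
        # Mean squared error over all examples and output dimensions
        errors = [(p - y) ** 2
                  for pred, lab in zip(predictions, labels)
                  for p, y in zip(pred, lab)]
        spec["loss"] = sum(errors) / len(errors)
        if mode == "eval":
            spec["rmse"] = spec["loss"] ** 0.5
        return spec

head = ToyRegressionHead()
preds = [[0.5, 0.5], [0.25, 0.75]]
labels = [[0.5, 0.25], [0.25, 0.5]]
predict_spec = head.create_spec("predict", preds)
eval_spec = head.create_spec("eval", preds, labels)
print(predict_spec)
print(eval_spec)
```

The model function only has to provide the predictions; the loss and the metrics follow from the type of head.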

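Step 3b - Multi-task learning with a multi-head

We finally get to the more interesting part of this tutorial: the MTL logic. Remember that, in the simplest case, MTL amounts to having 'multiple heads' on top of the same feature extraction part, as sketched in the following figure: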

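Overview of multi-task learning in deep neural networks

Mathematically, we can jointly optimize all the tasks by minimizing the sum of the task-specific losses. For example, suppose we have a loss L1 for the regression part (the mean squared error on each landmark) and a loss L2 for the classification part (the different labels): we can then minimize L = L1 + L2 by gradient descent.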
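
As a small numerical illustration of L = L1 + L2, the sketch below computes both losses for a single example in plain Python. The values are made up, and the choice of softmax cross-entropy for the classification loss is an assumption made for this example.

```python
import math

def mse(pred, target):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def softmax_cross_entropy(logits, true_class):
    """Cross-entropy of the softmax of the logits, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[true_class]

l1 = mse([0.5, 0.25], [0.5, 0.75])            # regression task (a 2-D position)
l2 = softmax_cross_entropy([0.0] * 5, 2)      # classification task (5 classes)
total = l1 + l2
print(l1, l2, total)
```

Both terms depend on the shared features, so a gradient step on the sum updates the feature extractor using the signal from both tasks.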

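After this (rather lengthy) introduction, you will probably not be surprised to learn that the Head API has a specific head for this situation, called multi-head. In line with our earlier description, it linearly combines the losses coming from the different heads. At this point, I will let the code speak for itself: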

def multi_head_cnn_model_fn(features, labels, mode):

    # Extract the features
    dense = extract_features(features)

    # Predictions for each task
    predictions_nose = tf.layers.dense(inputs=dense, units=2)
    predictions_pose = tf.layers.dense(inputs=dense, units=5)
    logits = {'head_nose': predictions_nose, 'head_pose': predictions_pose}

    # The multi-head expects a dictionary of labels keyed by head name (labels are None in PREDICT mode).
    # Assumption: the last label column holds the pose as a class in 1..5, shifted here to 0..4.
    if labels is not None:
        labels = {'head_nose': labels[:, 2:8:5],
                  'head_pose': tf.cast(labels[:, -1:], tf.int32) - 1}

    # Optimizer (for both tasks simultaneously)
    optimizer = tf.train.AdamOptimizer()
    # Two heads
    regression_head = tf.contrib.estimator.regression_head(name='head_nose', label_dimension=2)
    classification_head = tf.contrib.estimator.multi_class_head(name='head_pose', n_classes=5)

    # Multi-head combining two single heads
    multi_head = tf.contrib.estimator.multi_head([regression_head, classification_head])

    # Return the final model
    return multi_head.create_estimator_spec(features, mode, logits, labels, optimizer)

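For simplicity, I only consider two tasks: predicting the nose position and the 'pose' of the face (left profile, left, frontal, right, right profile). We only need to define two separate heads (one for regression, one for classification) and combine them with the multi_head object. Adding more heads is now just a matter of a few lines of code!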
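
With several heads, each head needs its own labels, keyed by the head names used in the code ('head_nose' and 'head_pose'). The sketch below shows this split on a plain Python list; the helper split_labels and the sample row are hypothetical, and it assumes that the pose is stored in the last column as a class from 1 to 5.

```python
def split_labels(row):
    """Build one label entry per head from a single 14-value label row."""
    return {
        "head_nose": [row[2], row[7]],    # columns 2 and 7: the nose position
        "head_pose": int(row[-1]) - 1,    # classes 1..5 shifted to 0..4
    }

# Hypothetical, already normalized label row
row = [0.25, 0.75, 0.5, 0.3, 0.7, 0.4, 0.4, 0.5, 0.8, 0.8, 1.0, 2.0, 2.0, 3.0]
per_head = split_labels(row)
print(per_head)
```

Adding a task then means adding one more entry here, one more prediction layer, and one more head.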

At this point the estimator can be trained with the standard methods, and we obtain both predictions at the same time:

Predictions of the multi-task model: nose position and pose
