Monday, February 19, 2018

[TensorFlow] Batch Normalization





Introduction


We now know that some activation functions suffer from the vanishing gradient problem, and it becomes more severe as the neural network grows deeper.

Internal Covariate Shift (ICS), a term introduced in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, is defined as the change in the distribution of network activations caused by the change of network parameters during training. Put another way, the distributions of the source and target domains are related: if the distribution of the training samples differs too much from the distribution of the target data, the trained model will not generalize well.

PS. My master thesis, Under-Sampling Approaches for Improving Classification Accuracy of Minority Class in an Imbalanced Dataset, pointed out the same kind of distribution problem in data mining.

Batch Normalization (BN) standardizes the distribution of each layer's inputs, using the statistics of the mini-batch processed by stochastic gradient descent, right before each activation function. Training the network with BN layers mitigates both the vanishing gradient and the exploding gradient problems.
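
Conceptually, for each mini-batch BN normalizes every feature to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). Below is a minimal NumPy sketch of this computation (the batch data and the gamma/beta values are only illustrative):

import numpy as np

x = np.array([[1.0], [2.0], [3.0]])           # a mini-batch with one feature (illustrative data)
epsilon = 0.001                               # small constant to avoid division by zero

mean = x.mean(axis=0)                         # per-feature mean over the batch
var = x.var(axis=0)                           # per-feature variance over the batch
x_hat = (x - mean) / np.sqrt(var + epsilon)   # normalized to zero mean, unit variance

gamma, beta = 1.0, 0.0                        # learnable scale and shift (initial values)
y = gamma * x_hat + beta                      # BN output that is fed to the activation function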








Environment


Python 3.6.2
TensorFlow 1.5.0


Implementation




Source code: GitHub




Here is a sample of applying Batch Normalization before the activation function (Sigmoid).


import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

X = tf.placeholder_with_default([[[1.]]], shape=[1, 1, 1], name="X")

# Calculate the mean and variance over the batch dimension
fc_mean, fc_var = tf.nn.moments(X, axes=[0])

# Learnable shift (beta) and scale (gamma) parameters
shift = tf.Variable(tf.zeros([1]))
scale = tf.Variable(tf.ones([1]))
epsilon = 0.001  # small constant to avoid division by zero

# Normalize X, then apply the scale and shift (in practice X is the pre-activation Wx + b)
Wx_plus_b = tf.nn.batch_normalization(X, fc_mean, fc_var, shift, scale, epsilon)

with tf.name_scope('Sigmoid'):
    fx = tf.sigmoid(Wx_plus_b)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(fx))
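
For deeper networks it is usually more convenient to let TensorFlow create the scale/shift variables and maintain the moving averages needed at inference time. Here is a minimal sketch using the built-in tf.layers.batch_normalization wrapper (the layer size, loss, and learning rate are only illustrative):

import tensorflow as tf

is_training = tf.placeholder_with_default(True, shape=[], name="is_training")
x = tf.placeholder(tf.float32, shape=[None, 10], name="x")

# Linear transform, then Batch Normalization, then the activation function
h = tf.layers.dense(x, 32, activation=None)
h_bn = tf.layers.batch_normalization(h, training=is_training)
out = tf.sigmoid(h_bn)

# The moving mean/variance are updated through UPDATE_OPS,
# so they must be run together with the training step
loss = tf.reduce_mean(tf.square(out))  # dummy loss, only for illustration
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)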





Reference


Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of ICML 2015.
