TensorFlow Batch Normalization
▌Introduction
We now know that some activation functions suffer from the vanishing gradient problem, and it gets worse as the neural network grows deeper.
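To see why, here is a minimal NumPy sketch (my own illustration, with an arbitrary 10-layer stack and input value) of how the gradient factor shrinks as it passes through stacked sigmoids, since the sigmoid derivative is at most 0.25:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x = 0

# Hypothetical 10-layer stack of sigmoids (weights ignored for simplicity):
# the gradient factor is multiplied by at most 0.25 per layer.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_grad(0.0)
    print("after layer %d, gradient factor = %e" % (layer + 1, grad))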
Internal Covariate Shift (ICS), a term coined in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, refers to the change in the distribution of network activations caused by the change in network parameters during training. In other words, the distributions of the source and target domains are related: if the distribution of the training data differs too much from that of the target data, the trained model does not generalize well.
P.S. My master's thesis, Under-Sampling Approaches for Improving Classification Accuracy of Minority Class in an Imbalanced Dataset, pointed out the same problem in data mining.
Batch normalization (BN) standardizes the distribution of each layer's inputs before the activation function, using the statistics of each mini-batch during stochastic gradient descent. Training the network with BN mitigates both the vanishing gradient and the exploding gradient problems.
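In formula form, BN normalizes each input with the mini-batch mean and variance, then applies a learned scale (gamma) and shift (beta). Here is a minimal NumPy sketch of that transform; the mini-batch values and epsilon are just example assumptions:
import numpy as np

def batch_norm(x, gamma, beta, epsilon=0.001):
    # normalize with mini-batch statistics, then scale and shift
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

# hypothetical mini-batch of 4 samples with 1 feature
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = batch_norm(x, gamma=np.ones(1), beta=np.zeros(1))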
▌Environment
▋Python 3.6.2
▋TensorFlow 1.5.0
▌Implementation
API Document: tf.nn.batch_normalization
Source code: Github
Here is a sample of applying batch normalization before the activation function, sigmoid.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Input placeholder with a default value
X = tf.placeholder_with_default([[[1.]]], shape=[1, 1, 1], name="X")

# Calculate the mean and variance over the batch axis
fc_mean, fc_var = tf.nn.moments(X, axes=[0])

# Learnable shift (beta) and scale (gamma) parameters
shift = tf.Variable(tf.zeros([1]))
scale = tf.Variable(tf.ones([1]))
epsilon = 0.001

# Normalize with the batch statistics, then scale and shift
Wx_plus_b = tf.nn.batch_normalization(X, fc_mean, fc_var, shift, scale, epsilon)

# Apply the sigmoid activation after batch normalization
with tf.name_scope('Sigmoid'):
    fx = tf.sigmoid(Wx_plus_b)
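To actually evaluate the graph, a short usage sketch with the TF 1.x session API (not shown in the snippet above) could look like this:
# Run the graph with the placeholder's default input
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(fx))  # sigmoid of the batch-normalized input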
▌Reference