Monday, February 19, 2018

[TensorFlow] Batch Normalization





Introduction


We now know that some activation functions suffer from the vanishing gradient problem, and it becomes more severe as the neural network grows deeper.

Internal Covariate Shift (ICS), a term introduced in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, is defined as the change in the distribution of network activations caused by the change of network parameters during training. Put another way, the distributions of the source and target domains are related: if the distribution of the training samples differs too much from the distribution of the target data, the trained model will not generalize well.

PS. My master thesis, Under-Sampling Approaches for Improving Classification Accuracy of Minority Class in an Imbalanced Dataset, pointed out the same kind of distribution problem in data mining.

Batch Normalization (BN) standardizes the distribution of each layer's inputs, using the statistics of the mini-batch processed by stochastic gradient descent, right before each activation function. Training the network with BN layers mitigates both the vanishing gradient and the exploding gradient problems.
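
Conceptually, for each mini-batch BN normalizes every feature to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). Below is a minimal NumPy sketch of this computation (the batch data and the gamma/beta values are only illustrative):

import numpy as np

x = np.array([[1.0], [2.0], [3.0]])           # a mini-batch with one feature (illustrative data)
epsilon = 0.001                               # small constant to avoid division by zero

mean = x.mean(axis=0)                         # per-feature mean over the batch
var = x.var(axis=0)                           # per-feature variance over the batch
x_hat = (x - mean) / np.sqrt(var + epsilon)   # normalized to zero mean, unit variance

gamma, beta = 1.0, 0.0                        # learnable scale and shift (initial values)
y = gamma * x_hat + beta                      # BN output that is fed to the activation function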








Environment


Python 3.6.2
TensorFlow 1.5.0


Implementation




Source code: GitHub




Here is a sample of applying Batch Normalization before the activation function (Sigmoid).


import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

X = tf.placeholder_with_default([[[1.]]], shape=[1, 1, 1], name="X")

# Calculate the mean and variance over the batch dimension
fc_mean, fc_var = tf.nn.moments(X, axes=[0])

# Learnable shift (beta) and scale (gamma) parameters
shift = tf.Variable(tf.zeros([1]))
scale = tf.Variable(tf.ones([1]))
epsilon = 0.001  # small constant to avoid division by zero

# Normalize X, then apply the scale and shift (in practice X is the pre-activation Wx + b)
Wx_plus_b = tf.nn.batch_normalization(X, fc_mean, fc_var, shift, scale, epsilon)

with tf.name_scope('Sigmoid'):
    fx = tf.sigmoid(Wx_plus_b)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(fx))
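
For deeper networks it is usually more convenient to let TensorFlow create the scale/shift variables and maintain the moving averages needed at inference time. Here is a minimal sketch using the built-in tf.layers.batch_normalization wrapper (the layer size, loss, and learning rate are only illustrative):

import tensorflow as tf

is_training = tf.placeholder_with_default(True, shape=[], name="is_training")
x = tf.placeholder(tf.float32, shape=[None, 10], name="x")

# Linear transform, then Batch Normalization, then the activation function
h = tf.layers.dense(x, 32, activation=None)
h_bn = tf.layers.batch_normalization(h, training=is_training)
out = tf.sigmoid(h_bn)

# The moving mean/variance are updated through UPDATE_OPS,
# so they must be run together with the training step
loss = tf.reduce_mean(tf.square(out))  # dummy loss, only for illustration
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)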





Reference


Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of ICML 2015.
