Batch Normalization

What is Batch Normalization? 

Batch normalization is a technique used to improve the training of deep neural networks by normalizing the inputs to each layer, thereby reducing internal covariate shift. Internal covariate shift refers to the change in the distribution of network activations due to the updates in the model’s parameters during training. Batch normalization stabilizes and accelerates training by ensuring that the inputs to each layer have a consistent distribution.

How Does Batch Normalization Work? 

Batch normalization involves the following steps:

  1. Compute Mean and Variance: For each mini-batch, the mean (μ) and variance (σ²) of the inputs are computed.
  2. Normalize the Inputs: The inputs are normalized using the calculated mean and variance: x̂ = (x − μ) / √(σ² + ε), where x is the input, μ is the mean, σ² is the variance, and ε is a small constant added for numerical stability.
  3. Scale and Shift: After normalization, the inputs are scaled and shifted using learnable parameters γ and β: y = γx̂ + β. These parameters allow the model to recover the original distribution if necessary.
  4. Apply to Network: The batch-normalized outputs are used as inputs to the next layer of the network (a code sketch of these steps follows below).
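
The steps above can be expressed compactly in code. Below is a minimal NumPy sketch of the forward pass for a fully connected layer; the function name batch_norm_forward and the parameter names are illustrative, and the sketch omits the running statistics that a full implementation would track for use at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch_size, num_features)."""
    # Step 1: per-feature mean and variance over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize to zero mean and unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Step 3: scale and shift with the learnable parameters gamma and beta
    return gamma * x_hat + beta

# Example: a mini-batch of 4 samples with 3 features each
x = np.random.randn(4, 3) * 10 + 5   # inputs with an arbitrary scale and offset
gamma = np.ones(3)                   # initial scale (no scaling)
beta = np.zeros(3)                   # initial shift (no shift)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0))                # approximately 0 for each feature
print(y.var(axis=0))                 # approximately 1 for each feature
```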

Why is Batch Normalization Important?

  • Faster Training: Batch normalization allows for higher learning rates, leading to faster convergence during training.
  • Stability: By reducing internal covariate shift, batch normalization stabilizes the learning process and helps prevent the network from getting stuck in poor local minima.
  • Improved Generalization: Batch normalization has a regularizing effect, reducing the need for other regularization techniques like dropout, and can improve the model's ability to generalize to new data.
  • Reduced Sensitivity to Initialization: Models with batch normalization are less sensitive to the choice of initial parameter values, making the training process more robust (see the usage sketch after this list).
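
In practice these benefits come almost for free, because deep learning frameworks ship batch normalization as a ready-made layer. The snippet below is a small illustrative example assuming PyTorch, where nn.BatchNorm1d is inserted between a hidden linear layer and its activation; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A small fully connected network with batch normalization after the hidden layer.
# BatchNorm1d also tracks running statistics, which are used in place of
# mini-batch statistics once the model is switched to eval() mode for inference.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # normalizes the 64 hidden activations per mini-batch
    nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 128)   # a mini-batch of 32 samples
out = model(x)             # shape: (32, 10)
```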

Conclusion 

Batch normalization is a crucial technique in deep learning that normalizes the inputs to each layer, leading to faster and more stable training. By reducing internal covariate shift and allowing the use of higher learning rates, batch normalization enhances the performance and generalization of deep neural networks, making it a standard practice in modern neural network training.