Model pruning is a technique that reduces the size of a neural network by removing parameters (individual weights or entire neurons) that are deemed unimportant or redundant. It is typically applied after the model has been trained, and it reduces the model’s complexity and improves inference speed with little loss in accuracy.
Pruning works by scoring parameters for importance and removing the lowest-scoring ones; a common criterion is magnitude, where weights close to zero or neurons with consistently low activations are considered candidates for removal. The process is often applied iteratively: prune a fraction of the parameters, fine-tune the remaining network to recover accuracy, and repeat until the desired size is reached.
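The magnitude criterion described above can be sketched as follows. This is a minimal illustration using NumPy, not a production implementation: it zeroes out the fraction of weights with the smallest absolute values in a single matrix, returning both the pruned weights and the binary mask that deployment code would use to keep those entries at zero during fine-tuning. The function name and the 50% sparsity target are illustrative choices, not from the original text.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes.

    Returns the pruned weight matrix and the boolean keep-mask.
    """
    # Threshold below which a weight is considered unimportant.
    threshold = np.quantile(np.abs(weights), sparsity)
    # Keep only weights whose magnitude meets the threshold.
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Example: prune 50% of a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

In an iterative scheme, this step would alternate with fine-tuning, with the mask reapplied after each weight update so pruned connections stay removed.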
Pruned models are particularly beneficial when deploying neural networks on devices with limited resources, such as mobile phones or embedded systems.