An activation function is a mathematical function applied to the output of each neuron in a neural network to introduce non-linearity. This is essential because without non-linearity, a stack of layers collapses into a single linear transformation, so the network can only model linear relationships no matter how deep it is.
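A minimal sketch of this point, using NumPy with arbitrarily chosen layer sizes: two linear layers with no activation between them compute exactly the same function as one merged linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # batch of 4 inputs, 3 features each

W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two linear layers applied back to back, with no activation in between...
two_layers = (x @ W1 + b1) @ W2 + b2

# ...reduce to a single linear layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_layer = x @ W_merged + b_merged

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a non-linear activation between the two layers breaks this equivalence, which is what lets depth add expressive power.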
Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh. For example, ReLU outputs the input directly if it is positive and zero otherwise; because its gradient does not shrink for positive inputs, it helps mitigate the vanishing-gradient problem that saturating functions like sigmoid and tanh can cause.
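For reference, here is a small sketch of these three activations implemented with NumPy (the function names and sample values are illustrative only):

```python
import numpy as np

def relu(z):
    # Passes positive inputs through unchanged; zeroes out negative inputs.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes inputs into (0, 1); saturates (gradient near 0) for large |z|.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes inputs into (-1, 1); zero-centred but also saturates.
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))     # [0.  0.  0.  0.5 2. ]
print(sigmoid(z))  # values strictly between 0 and 1
print(tanh(z))     # values strictly between -1 and 1
```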
Activation functions enable neural networks to learn complex patterns and solve tasks like image recognition, speech processing, and natural language understanding.