Adversarial attacks exploit vulnerabilities in neural networks by introducing subtle, often imperceptible changes to input data, causing the model to make incorrect predictions. For example, adding a carefully crafted perturbation to an image (rather than random noise) can trick a classifier into misidentifying the object, often with high confidence.
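The sketch below illustrates this threat model in PyTorch: an adversarial example is the clean input plus a small bounded perturbation, and an attack succeeds when the prediction flips. The names `model`, `x`, `delta`, and `true_label` are hypothetical placeholders for a trained classifier and its data, and the budget `eps` is just an illustrative value.

```python
# Minimal sketch of the additive-perturbation threat model.
# `model`, `x`, `delta`, and `true_label` are assumed to be supplied by the caller.
import torch

def is_adversarial(model, x, delta, true_label, eps=8 / 255):
    """Check whether a small bounded perturbation flips the model's prediction."""
    # Restrict the perturbation to an L-infinity ball of radius eps,
    # keeping the change visually subtle.
    delta = torch.clamp(delta, -eps, eps)
    x_adv = torch.clamp(x + delta, 0.0, 1.0)  # keep pixels in a valid range
    clean_pred = model(x).argmax(dim=1)
    adv_pred = model(x_adv).argmax(dim=1)
    # The attack succeeds if the clean prediction was correct
    # but the perturbed prediction is not.
    return (clean_pred == true_label) & (adv_pred != true_label)
```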
Common attack methods include the Fast Gradient Sign Method (FGSM), which takes a single step along the sign of the loss gradient, and Projected Gradient Descent (PGD), which applies that step iteratively while projecting the input back into a bounded region. These attacks target the model's sensitivity to input variations, exposing weaknesses in its generalization.
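A hedged sketch of both attacks under an L-infinity bound follows, assuming a differentiable PyTorch classifier `model` and labeled inputs `x`, `y`; the values of `eps`, `alpha`, and `steps` are illustrative defaults, not prescribed by the text.

```python
# Sketches of FGSM and PGD under an L-infinity constraint.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Single-step attack: move along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterated FGSM with projection back into the eps-ball after each step."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the L-infinity ball around the clean input x,
        # then back into the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```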
Defending against adversarial attacks involves techniques like adversarial training (training on perturbed examples), defensive distillation, or using robust architectures. Adversarial robustness is critical in security-sensitive applications like facial recognition or autonomous vehicles.
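To make the adversarial-training idea concrete, here is a sketch of a single training epoch that generates PGD perturbations on the fly and updates the model on them. It assumes the hypothetical `pgd` function above plus a `model`, `optimizer`, and data `loader` supplied by the caller; it is an illustrative outline, not a complete training recipe.

```python
# Sketch of one adversarial-training epoch: train on perturbed examples.
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    model.train()
    for x, y in loader:
        # Generate worst-case perturbations against the current weights...
        x_adv = pgd(model, x, y, eps=eps)
        # ...then take a gradient step on those perturbed examples.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```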