Augmentation in machine learning refers to techniques that increase the size and diversity of a dataset without collecting new data. The approach differs between supervised and unsupervised learning because of how each uses labeled and unlabeled data. In supervised learning, augmentation typically creates new labeled examples by transforming existing ones while preserving their labels. In unsupervised learning, augmentation operates on unlabeled data, and the goal is to enrich the representation of the data itself rather than to produce new labels.
In supervised learning, a common approach is to apply label-preserving transformations to the training examples. In image classification, for instance, developers might rotate, flip, or crop images; a slightly rotated photo of a cat is still a photo of a cat, so the label remains valid while the dataset grows. Exposing the model to these variations helps it generalize: the objective is better performance on unseen data, achieved by teaching the model to recognize objects under a wider range of conditions.
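As a minimal sketch of these label-preserving transforms, the following NumPy snippet applies a random flip, rotation, or crop to an image represented as a 2-D array. The function name `augment` and the specific transform choices are illustrative, not taken from any particular library.

```python
import numpy as np

def augment(image, rng):
    """Apply one random label-preserving transform to a square image array."""
    choice = rng.integers(3)
    if choice == 0:
        return np.fliplr(image)   # horizontal flip
    if choice == 1:
        return np.rot90(image)    # 90-degree rotation
    # random crop to 3/4 size, padded back to the original shape
    h, w = image.shape
    ch, cw = (3 * h) // 4, (3 * w) // 4
    top = rng.integers(h - ch + 1)
    left = rng.integers(w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    return np.pad(crop, ((0, h - ch), (0, w - cw)))

rng = np.random.default_rng(0)
img = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for a cat photo
batch = [augment(img, rng) for _ in range(4)]    # 4 variants, same label
```

Because every transform leaves the depicted object recognizable, each augmented copy inherits the original image's label, which is exactly what makes this strategy safe in the supervised setting.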
Augmentation in unsupervised learning, by contrast, centers on enriching the model's understanding of the data itself. In clustering tasks, where no labels are available, augmentation might involve projecting the data into different feature spaces or perturbing the data points, for example by adding Gaussian noise, which makes the learned representation more robust to small variations. The focus is less on labels and more on building richer representations that reveal meaningful structure without predefined categories.
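The Gaussian-noise idea above can be sketched as follows: each unlabeled point is expanded into several perturbed "views," which a clustering or representation-learning objective can then be asked to treat consistently. The function name `noisy_views` and the parameter values are assumptions for illustration.

```python
import numpy as np

def noisy_views(x, n_views=2, sigma=0.1, seed=0):
    """Create Gaussian-noise-perturbed copies of unlabeled data points.

    A model trained to map all views of the same point to similar
    representations becomes robust to small input variations.
    """
    rng = np.random.default_rng(seed)
    return [x + rng.normal(0.0, sigma, size=x.shape) for _ in range(n_views)]

data = np.random.default_rng(1).normal(size=(100, 5))  # unlabeled points
views = noisy_views(data, n_views=3, sigma=0.05)       # 3 perturbed copies
```

Note that no labels appear anywhere: the augmentation only multiplies and perturbs the raw inputs, leaving it to the learning objective to extract structure from the agreement between views.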