Embeddings can be sensitive to noisy data because they capture patterns in the input, including irrelevant or erroneous ones. That said, their robustness to noise depends on how they are trained: embeddings learned from a large corpus pick up generalizable patterns, which helps smooth over some of the noise.
When working with noisy data, embedding training typically relies on regularization techniques or more advanced methods, such as data augmentation or dropout, to avoid overfitting to noise. Additionally, embedding pipelines often include mechanisms for filtering or weighting the input data to minimize the impact of noisy or irrelevant features. For example, in NLP, stopwords (common words that carry little meaning) are often removed during preprocessing to reduce noise.
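The stopword-removal step mentioned above can be sketched as a simple filter. This is a minimal illustration: the stopword set here is a small hand-picked subset for demonstration, not a standard list such as NLTK's.

```python
# Illustrative stopword subset; real pipelines use a curated list
# (e.g. NLTK's or spaCy's) rather than this hand-picked one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def remove_stopwords(text: str) -> list[str]:
    """Lowercase, tokenize on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

tokens = remove_stopwords("The cat is in the garden")
print(tokens)  # ['cat', 'garden']
```

Filtering this way shrinks the vocabulary the embedding model must fit, so less capacity is spent on high-frequency tokens that contribute little signal.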
Despite these techniques, noisy data can still degrade embedding quality and hurt performance on downstream tasks. Careful data cleaning and preprocessing, combined with robust model choices, can help mitigate the effects of noise and improve embedding quality.
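As a concrete example of the cleaning step, a preprocessing pass might strip markup remnants, punctuation, and stray whitespace before text reaches the embedding model. This is a hypothetical sketch of such a pass, not a prescribed pipeline; the specific regexes are assumptions chosen for illustration.

```python
import re

def clean_text(text: str) -> str:
    """Illustrative cleaning pass applied before embedding."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tag remnants
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation/symbols
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.lower()

print(clean_text("Hello, <b>World</b>!!  "))  # 'hello world'
```

Even lightweight cleaning like this removes artifacts that would otherwise become spurious tokens in the embedding vocabulary.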