How are sinusoidal embeddings implemented in diffusion models?

Diffusion models use sinusoidal embeddings to encode the time step, or another continuous variable, as a fixed-length vector that the network can condition on. Each dimension of the embedding is a sine or cosine of the time step evaluated at a different frequency, so nearby time steps receive similar vectors while distant ones remain distinguishable. Because the encoding is a smooth, deterministic function of its input, the model can learn how its behavior should change across time steps without learning a separate representation for each one.

To create a sinusoidal embedding for a time step \( t \), the formula typically looks like this:

\[ \text{emb}(t) = \left[ \sin\left(t / 10000^{2i/d}\right),\ \cos\left(t / 10000^{2i/d}\right) \right] \]

where \( i \) indexes the sine/cosine pair (\( i = 0, 1, \dots, d/2 - 1 \)) and \( d \) is the total number of embedding dimensions. Each pair oscillates at its own frequency: small \( i \) gives fast-varying components that separate adjacent time steps, while large \( i \) gives slow-varying components that track coarse position in the diffusion process. Together, the pairs give the network a distinct, smoothly varying signature for each time step.
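The formula can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation: the function name `sinusoidal_embedding`, the `max_period` parameter, and the choice to concatenate all sines before all cosines (instead of interleaving them) are assumptions made for this example.

```python
import numpy as np

def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Return sinusoidal embeddings of shape (len(t), dim).

    Hypothetical helper for illustration; layout is [all sines, all cosines].
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even so sines and cosines pair up")
    t = np.atleast_1d(np.asarray(t, dtype=np.float64))
    i = np.arange(dim // 2)                      # pair index: 0 .. d/2 - 1
    freqs = max_period ** (-2.0 * i / dim)       # 1 / 10000^(2i/d)
    angles = t[:, None] * freqs[None, :]         # shape (len(t), d/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

emb = sinusoidal_embedding([0, 1, 500], dim=8)
print(emb.shape)  # (3, 8)
```

At \( t = 0 \) every sine component is 0 and every cosine component is 1, which is a quick sanity check for an implementation like this.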

For example, in a diffusion model trained for image denoising, the sinusoidal embedding encodes the current step number of the diffusion process. Because the amount of noise in the image depends on the step, the embedding tells the network how noisy its input is likely to be, so it can adapt its prediction to each stage of noise addition and removal. This conditioning can improve the clarity and quality of the generated image. In short, sinusoidal embeddings supply the temporal context that a diffusion model relies on.
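One common way to feed the step number into a denoising network is to project the embedding and add it to intermediate features. The sketch below shows only that idea. The random matrix `W` and the `features` array are hypothetical stand-ins for learned weights and real activations, and `sinusoidal_embedding` is an illustrative helper rather than a library function.

```python
import numpy as np

def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Illustrative helper: embeddings of shape (len(t), dim)."""
    t = np.atleast_1d(np.asarray(t, dtype=np.float64))
    freqs = max_period ** (-2.0 * np.arange(dim // 2) / dim)
    angles = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

rng = np.random.default_rng(0)
batch, channels, emb_dim = 4, 16, 32

# Hypothetical stand-ins: a real model would learn W and compute features
# from the noisy image.
W = rng.normal(scale=0.1, size=(emb_dim, channels))
features = rng.normal(size=(batch, channels))

# One diffusion step number per sample in the batch.
timesteps = np.array([0, 10, 500, 999])
t_emb = sinusoidal_embedding(timesteps, emb_dim)   # shape (4, 32)

# Condition the features on the step by adding the projected embedding.
conditioned = features + t_emb @ W                 # shape (4, 16)
print(t_emb.shape, conditioned.shape)
```

Each sample in the batch can carry a different step number, so the same network weights handle every noise level.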