To condition a diffusion model on external inputs, you modify the model architecture so that those inputs can steer sample generation. In standard formulations the forward (noising) process is left unchanged; what changes is the reverse (denoising) process, where external data, such as class labels, images, or other variables, is fed to the denoising network at every step so that it can influence the generated samples.
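For reference, the fixed forward process can be sketched as follows. This is a minimal, illustrative NumPy implementation of a DDPM-style noising step; the function name `forward_diffuse` and the toy linear schedule are assumptions for the example, not part of any particular library.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for a DDPM-style forward process.

    alpha_bar: cumulative products of (1 - beta_t), one entry per timestep.
    Returns the noisy sample x_t and the noise eps that produced it.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Toy linear beta schedule over 10 timesteps (illustrative, not tuned).
betas = np.linspace(1e-4, 0.2, 10)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))  # a batch of 4 samples with 8 features each
xt, eps = forward_diffuse(x0, t=5, alpha_bar=alpha_bar, rng=rng)
print(xt.shape)  # (4, 8)
```

Note that nothing here depends on the external condition: the condition enters only through the network that learns to invert this process.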
A simple way to inject the condition is input concatenation. First convert the external input into a suitable format, for instance one-hot encoding for a categorical label such as the image's class, or a learned embedding for richer inputs. Then concatenate it with the noisy sample that is fed to the denoising network at each timestep. Note that the condition itself is not noised; it is appended after the forward process has produced the noisy sample. The network therefore receives the relevant context alongside the corrupted data and can learn to exploit it during denoising.
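The encoding-and-concatenation step above can be sketched as follows. This is a minimal NumPy example assuming a convolutional denoiser that takes channel-stacked inputs; the helper names `one_hot` and `concat_condition` are hypothetical.

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def concat_condition(x, label, num_classes):
    """Broadcast a one-hot class vector to spatial maps and concatenate it
    channel-wise with the noisy image x of shape (C, H, W), a common way to
    hand the condition to a convolutional denoiser."""
    c, h, w = x.shape
    cond = one_hot(label, num_classes)[:, None, None]       # (K, 1, 1)
    cond_maps = np.broadcast_to(cond, (num_classes, h, w))  # (K, H, W)
    return np.concatenate([x, cond_maps], axis=0)           # (C+K, H, W)

x = np.random.default_rng(1).standard_normal((3, 8, 8))  # noisy RGB sample
x_cond = concat_condition(x, label=2, num_classes=5)
print(x_cond.shape)  # (8, 8, 8): 3 image channels + 5 condition channels
```

The denoiser's first layer simply takes `C + K` input channels instead of `C`, so the condition is visible to the network at every denoising step.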
In the reverse diffusion process, the denoising network should explicitly use the external inputs to guide the generation of samples. One common approach is a cross-attention mechanism that lets the model attend to features of the conditioning input; another is to embed the condition and use it to modulate the network's intermediate features. For example, when generating images from a category label, conditioning the network on that label steers the samples toward that category, ensuring the output aligns with the given external context. In summary, conditioning a diffusion model on external inputs requires carefully integrating those inputs both at the denoiser's input and within its intermediate layers, so that every step of the reverse process is guided by them.
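As one concrete instance of feature modulation, here is a minimal FiLM-style conditioning sketch in NumPy: a label embedding predicts a per-feature scale and shift applied to the denoiser's hidden activations. The parameters are random stand-ins for learned weights, and the name `film_condition` is an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
num_classes, embed_dim, feat_dim = 5, 16, 32

# Stand-ins for learned parameters (random here, purely for illustration).
label_embedding = rng.standard_normal((num_classes, embed_dim)) * 0.1
W_gamma = rng.standard_normal((embed_dim, feat_dim)) * 0.1
W_beta = rng.standard_normal((embed_dim, feat_dim)) * 0.1

def film_condition(h, label):
    """FiLM-style conditioning: the label embedding predicts a per-feature
    scale (gamma) and shift (beta) applied to hidden features h."""
    e = label_embedding[label]
    gamma = e @ W_gamma
    beta = e @ W_beta
    return (1.0 + gamma) * h + beta

h = rng.standard_normal((4, feat_dim))  # hidden features for a batch of 4
h_cond = film_condition(h, label=3)
print(h_cond.shape)  # (4, 32)
```

In a real model, this modulation is typically applied after normalization layers in each block of the denoiser, so the condition influences every stage of the reverse process.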