Temperature is a sampling hyperparameter in LLMs that controls the randomness of generated text. Before a token is sampled, the model's logits are divided by the temperature and then passed through a softmax, which reshapes the probability distribution over possible next tokens and so determines how deterministic or varied the output is. A temperature close to 0 concentrates probability on the most likely tokens, yielding predictable, focused output. For example, at a temperature of 0.2, the model tends to produce concise, consistent responses to factual queries.
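The scaling described above can be sketched in a few lines. This is a minimal illustration, not any particular model's implementation; the logit values are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Apply temperature scaling, then a numerically stable softmax."""
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative next-token logits
print(softmax_with_temperature(logits, 0.2))  # probability piles onto the top token
print(softmax_with_temperature(logits, 1.0))  # the unscaled softmax distribution
```

At a temperature of 0.2 the top token absorbs almost all of the probability mass, which is why low-temperature decoding looks nearly deterministic.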
Higher temperatures flatten the distribution, making less probable tokens more likely to be selected. This increases randomness and produces more diverse output, which is useful for generating imaginative content or brainstorming ideas. For instance, a temperature of 1.0 or above can yield varied, surprising text suited to storytelling or poetry.
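The effect on diversity can be seen by repeatedly sampling from the same scaled distribution at two temperatures. This is a hedged toy demonstration with invented logits over a four-token vocabulary, not output from a real model.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from the temperature-scaled softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
logits = [3.0, 1.5, 1.0, 0.5]  # illustrative four-token vocabulary

# Draw 200 samples at a low and a high temperature and compare variety.
low_temp_tokens = {sample_token(logits, 0.2, rng) for _ in range(200)}
high_temp_tokens = {sample_token(logits, 1.5, rng) for _ in range(200)}
print("distinct tokens at T=0.2:", len(low_temp_tokens))
print("distinct tokens at T=1.5:", len(high_temp_tokens))
```

The high-temperature draws cover far more of the vocabulary, which is the statistical basis for the "more creative" behavior described above.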
The choice of temperature depends on the use case. Applications requiring precision, like code generation, benefit from low temperatures, while creative tasks thrive on higher values. Experimenting with different settings helps developers optimize outputs for specific goals.
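One simple way to experiment, as suggested above, is to sweep the temperature and measure the entropy of the resulting distribution: higher entropy means the model spreads probability over more tokens. The sweep values and logits below are arbitrary choices for illustration.

```python
import math

def entropy_at_temperature(logits, temperature):
    """Shannon entropy (in nats) of the temperature-scaled softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [3.0, 1.5, 1.0, 0.5]  # illustrative logits
for t in (0.2, 0.7, 1.0, 1.5):
    print(f"T={t}: entropy={entropy_at_temperature(logits, t):.3f} nats")
```

Entropy rises monotonically with temperature, so a sweep like this makes the precision-versus-diversity trade-off concrete before committing to a setting.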