How do large language models estimate their own confidence?
Large language models don't intrinsically "feel" confidence the way humans do. Instead, their confidence is typically estimated from the probabilities assigned to each generated token, or from the entropy of the output distribution. When an LLM generates text, it predicts the next token (a word or word fragment) from the preceding context and assigns a probability to every token in its vocabulary. Under greedy decoding the highest-probability token is chosen; sampling-based decoding instead draws a token at random in proportion to those probabilities.
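To make the two signals concrete, here is a minimal sketch in plain Python that turns a vector of raw scores (logits) into probabilities and reports the top-token probability and the entropy. The logit values and the function names (softmax, entropy, token_confidence) are made up for illustration; a real model produces one such vector per generated token over a vocabulary of many thousands of entries.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more peaked distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def token_confidence(logits):
    """Return (top-token probability, entropy) for one decoding step."""
    probs = softmax(logits)
    return max(probs), entropy(probs)


# Hypothetical logits over a toy 4-token vocabulary.
peaked_logits = [8.0, 1.0, 0.5, 0.0]  # one token clearly dominates
flat_logits = [1.0, 1.0, 1.0, 1.0]    # every token equally likely

peaked_top, peaked_entropy = token_confidence(peaked_logits)
flat_top, flat_entropy = token_confidence(flat_logits)

print(f"peaked: top p = {peaked_top:.4f}, entropy = {peaked_entropy:.4f}")
print(f"flat:   top p = {flat_top:.4f}, entropy = {flat_entropy:.4f}")
```

With the flat logits every token gets probability 0.25 and the entropy reaches its maximum for four outcomes, ln 4 (about 1.386 nats). The peaked logits give a top probability close to 1 and an entropy close to 0.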
A model is considered "more confident" when the probability of its chosen token is very high and the probabilities of the alternatives are very low, which corresponds to a low-entropy distribution. Calibration techniques attempt to align these raw probabilities with the true likelihood that the answer is correct, so that, for example, answers given with 80% confidence turn out to be right about 80% of the time. This improves the reliability of LLM responses, especially in critical applications where knowing when a model is uncertain is as important as the answer itself.
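One common way to check how well stated confidence matches actual correctness is the expected calibration error (ECE). The answer above does not name this metric, so treat the following as an illustrative sketch rather than the definitive method: it groups predictions into confidence bins and averages the gap between mean confidence and accuracy in each bin. The function name, the bin count, and the sample data are all hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between confidence and accuracy across bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin on [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Made-up data: five answers, each given with 90% confidence.
confidences = [0.9, 0.9, 0.9, 0.9, 0.9]
mostly_right = [True, True, True, True, False]    # 80% accurate
mostly_wrong = [True, False, False, False, False]  # 20% accurate

print(expected_calibration_error(confidences, mostly_right))
print(expected_calibration_error(confidences, mostly_wrong))
```

The second call yields a much larger error than the first, because the model claims 90% confidence while being right only 20% of the time. A perfectly calibrated model would score 0.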