Why do LLMs sometimes give inconsistent answers to the same logical question?
Ask an LLM the same logic puzzle twice and you might get two different answers. This happens because LLMs generate responses using probabilistic sampling — they don't compute a single correct answer, they sample from a distribution of likely next tokens. Small variations in phrasing or random sampling can push the model down a different reasoning path.
Structural uncertainty is the term researchers use to describe this: the model's reasoning process itself is unstable, not just its surface-level wording. A model might correctly solve a problem one way, then contradict itself when the same problem is reframed.
This is why consistency is increasingly used as a reliability metric. If a model truly understands a logical principle, it should reach the same conclusion regardless of how the question is framed. When it doesn't, that's a sign the model has learned patterns that mimic reasoning rather than the underlying logic itself.