How do large language models estimate their own confidence?