How do AI models learn what not to say or do?