What is pretraining in the context of large language models?