Wording errors on p145 and p146 #53

feng-1985 opened this issue Apr 8, 2025 · 0 comments

p145

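In terms of training method, instruction tuning is fairly similar to pre-training, and many settings, including the data organization format, can [sic: the original 都可以预训练阶段所采用的技术 has no verb after 可以] the techniques used in the pre-training stage (see Chapters 4 and 6). This section mainly introduces some training strategies specific to instruction tuning.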

p146

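In instruction tuning, the optimizer settings (AdamW or Adafactor), the training-stabilization techniques (weight decay and gradient clipping), and the training techniques (3D parallelism, ZeRO, and mixed-precision training) are all consistent with the pre-training stage [sic: the original reads 与预训练保持阶段一致, with 保持 and 阶段 transposed] and can be reused in full.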
