Adding Bells and Whistles to the Training Loop

The main chapter used a relatively simple training function to keep the code readable and to fit Chapter 5 within the page limits. Optionally, we can add a linear warm-up, a cosine decay schedule, and gradient clipping to improve training stability and convergence.

You can find the code for this more sophisticated training function in Appendix D: Adding Bells and Whistles to the Training Loop.
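
As a rough illustration of the three techniques named above, here is a minimal sketch in plain Python. It is not the code from Appendix D. The function names `lr_at_step` and `clip_by_global_norm`, and the parameters `peak_lr`, `initial_lr`, `min_lr`, and `max_norm`, are hypothetical names chosen for this example.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr,
               initial_lr=0.0, min_lr=0.0):
    """Learning rate for a 0-indexed step: linear warm-up, then cosine decay.

    All parameter names are assumptions made for this sketch.
    """
    if step < warmup_steps:
        # Warm-up: ramp linearly from initial_lr toward peak_lr.
        return initial_lr + (peak_lr - initial_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay: starts at peak_lr (progress 0), ends at min_lr (progress 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    # Already within the limit: return an unchanged copy.
    return list(grads)


if __name__ == "__main__":
    # Print the schedule at a few steps of a 100-step run with 10 warm-up steps.
    for step in (0, 5, 10, 55, 100):
        print(step, lr_at_step(step, total_steps=100, warmup_steps=10, peak_lr=1e-3))
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

In a PyTorch training loop, the same idea would typically mean writing the computed rate into each entry of `optimizer.param_groups` before `optimizer.step()`, and calling `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=...)` after `loss.backward()`. The exact placement and hyperparameters used in Appendix D may differ from this sketch.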