Skip to content

Fix gradient tests #801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 10, 2025
Merged

Fix gradient tests #801

merged 2 commits into from
May 10, 2025

Conversation

ngc92
Copy link
Contributor

@ngc92 ngc92 commented Apr 13, 2025

The order of tensors inside our buffer differs from the order in which we run our gradient checks.
As the layout of the reference and test tensor are the same, we're still comparing corresponding elements, but the mapping of (named) tensors and tolerances into the buffer is wrong; potentially, we didn't compare some elements, and compared others twice.

This PR reorders the tensor names and thresholds to correspond to our actual model definition, and adapts the thresholds accordingly (this is much better visible by looking at the diff of just the second commit). For layernorms, the thresholds had to be increased quite a lot (maybe it should have been suspicious that the last LN needed so much larger tolerances than the others; now they're more equal), but for some other tensors we could actually tighten the error bounds.

The error thresholds have been tested on an A6000.

@ngc92 ngc92 mentioned this pull request Apr 14, 2025
@karpathy
Copy link
Owner

good catch, agree on the order.

@karpathy karpathy merged commit f1e2ace into karpathy:master May 10, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants