LayerNormalization with rms_scaling documentation is different from implementation #21234

Open
mzhukova opened this issue Apr 30, 2025 · 1 comment
Labels
type:docs Need to modify the documentation

Comments

mzhukova commented Apr 30, 2025

The documentation for LayerNormalization states:

rms_scaling: If True, center and scale are ignored, and the inputs are scaled by gamma and the inverse square root of the square of all inputs. This is an approximate and faster approach that avoids ever computing the mean of the input.

However, in the implementation, it actually does the following:

        if self.rms_scaling:
            # Calculate outputs with only variance and gamma if rms scaling
            # is enabled
            # Calculate the variance along self.axis (layer activations).
            variance = ops.var(inputs, axis=self.axis, keepdims=True)
            inv = ops.rsqrt(variance + self.epsilon)

            outputs = (
                inputs * inv * ops.cast(_broadcast(self.gamma), inputs.dtype)
            )

So the mean is in fact used: ops.var subtracts the mean before squaring, so the code normalizes by the variance rather than by the RMS (root mean square) of the inputs.
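
To make the difference concrete, here is a minimal NumPy sketch. It is not the Keras code: the helper names variance_scaling and rms_scaling, the sample input, and the epsilon value are assumptions made for illustration. The first helper mirrors the snippet above, the second is what the documentation wording seems to describe.

```python
import numpy as np


def variance_scaling(x, gamma, epsilon=1e-3, axis=-1):
    # Mirrors the quoted snippet: np.var subtracts the mean before squaring,
    # so the mean of the input is computed implicitly.
    variance = np.var(x, axis=axis, keepdims=True)
    return x / np.sqrt(variance + epsilon) * gamma


def rms_scaling(x, gamma, epsilon=1e-3, axis=-1):
    # RMS-style scaling: mean of the squared inputs, no mean subtraction.
    mean_square = np.mean(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(mean_square + epsilon) * gamma


# Hypothetical input with a non-zero mean along the normalized axis.
x = np.array([[1.0, 2.0, 3.0, 4.0]])
gamma = np.ones(4)

by_variance = variance_scaling(x, gamma)
by_rms = rms_scaling(x, gamma)

# Centering the input first removes the discrepancy between the two.
zero_mean = x - x.mean(axis=-1, keepdims=True)
centered_by_variance = variance_scaling(zero_mean, gamma)
centered_by_rms = rms_scaling(zero_mean, gamma)

print(by_variance)
print(by_rms)
```

For this input the two helpers give different outputs, and they agree once the input is centered, which is why the distinction only shows up when the activations have a non-zero mean.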

There was also a discussion during the addition of RMS Normalization (#20911 (comment)) that confirms this behavior.

I think the docs could use an update to clarify this behavior. Right now, it sounds like the mean isn't used when rms_scaling is on, but the code suggests otherwise.

mzhukova (author) commented

Alternatively, the implementation could be adjusted to match the docs, though that might make RMSNormalization behave just like LayerNormalization with rms_scaling=True, correct?

@dhantule dhantule added the type:docs Need to modify the documentation label May 2, 2025