We first modify the OCP MXFP4 datatype to perform stochastic rounding. This involves scaling the post-MX, pre-quantization input to prevent clipping and then performing stochastic rounding. Then, we apply a blockwise random Hadamard transform (RHT) to the matrix multiplication operands, which allows us to bound the variance of the GEMM output. For more information, see the paper.
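As a rough illustration of the RHT step, the sketch below applies a blockwise random Hadamard transform to both GEMM operands along their shared inner dimension. The function name `blockwise_rht`, the block size, and the SciPy/PyTorch usage are ours for illustration and are not the repo's fused kernels; because the per-block transform is orthogonal, `a @ b` is unchanged before quantization while outlier magnitudes get spread across each block.

```python
# Minimal sketch of a blockwise random Hadamard transform (RHT) applied to the
# two GEMM operands; names and block size are illustrative, not the repo's kernels.
import torch
from scipy.linalg import hadamard

def blockwise_rht(a: torch.Tensor, b: torch.Tensor, block: int = 32, seed: int = 0):
    """Transform a [m, k] and b [k, n] along the shared k dimension so that
    a_t @ b_t == a @ b (the per-block transform is orthogonal), while spreading
    outlier magnitudes across each block before quantization."""
    k = a.shape[-1]
    assert k % block == 0 and b.shape[0] == k, "k must be a multiple of the block size"
    g = torch.Generator().manual_seed(seed)
    # Orthonormal Hadamard matrix H / sqrt(block) and a random +/-1 diagonal D.
    h = torch.tensor(hadamard(block), dtype=a.dtype) / block ** 0.5
    d = torch.randint(0, 2, (k,), generator=g).to(a.dtype) * 2 - 1
    # a_t = a @ blockdiag(D_i H),  b_t = blockdiag(H D_i) @ b  (H is symmetric).
    a_t = ((a.reshape(-1, k // block, block) * d.reshape(1, -1, block)) @ h).reshape(a.shape)
    b_t = (h @ (b.reshape(k // block, block, -1) * d.reshape(-1, block, 1))).reshape(b.shape)
    return a_t, b_t

# The product is preserved exactly in full precision:
a, b = torch.randn(8, 64, dtype=torch.float64), torch.randn(64, 4, dtype=torch.float64)
a_t, b_t = blockwise_rht(a, b)
assert torch.allclose(a_t @ b_t, a @ b)
```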
We provide two parallel implementations of the MXFP4 recipe in this repository:

- `microxcaling.patch` adds support for a `dither_scale` rounding mode to `microxcaling/mx/mx_ops.py`, which scales input matrices by $3/4$ before performing stochastic rounding and scales back by $4/3$ afterwards (see the sketch after this list).
- `Megatron-LM.patch` supports training with BF16 forward + MXFP4 backward, with the major changes in `megatron/core/tensor_parallel/layers.py`.
- `te1.5/` supports FP8 forward + MXFP4 backward by overriding layers defined in `transformer_engine/pytorch/module/*`.
This project is licensed under the Apache 2.0 license.

For patches and overrides to NVIDIA/Megatron-LM, microsoft/microxcaling, and NVIDIA/TransformerEngine, modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.