Description
The bulk of the GPTQ algorithm's runtime is spent in this helper file: src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py
Maybe there are some opportunities to compile portions of this code?
Please include runtime benchmarks (before and after) to validate any changes.
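
For the benchmarks, something along these lines could serve as a starting point. This is only a sketch using the Python standard library; `benchmark` and `speedup` are hypothetical helper names and not part of llmcompressor, and the callables passed in would need to wrap the actual quantization call with fixed inputs.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> Dict[str, float]:
    """Time fn() and return the median and minimum wall-clock seconds.

    Hypothetical helper for illustration; not part of llmcompressor.
    """
    # Warm-up runs absorb one-time costs (e.g. compilation, cache fills)
    # so they do not skew the measured runs.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {"median_s": statistics.median(timings), "min_s": min(timings)}


def speedup(baseline: Callable[[], object], candidate: Callable[[], object], **kwargs) -> float:
    """Return baseline median time divided by candidate median time (>1 means faster)."""
    base = benchmark(baseline, **kwargs)
    cand = benchmark(candidate, **kwargs)
    return base["median_s"] / cand["median_s"]
```

If the code under test runs on a GPU, the callable should end with `torch.cuda.synchronize()`, since CUDA kernels launch asynchronously and the timer would otherwise stop before the work finishes. Reporting the median alongside the minimum helps separate a real speedup from run-to-run noise.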