Use torch.compile to speed up GPTQ algorithm #1496

Open
@dsikka

Description

The bulk of the GPTQ algorithm's runtime is spent in this helper file: src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py

There may be opportunities to speed this up by compiling portions of this code with torch.compile.

Please include runtime benchmarks (before and after) to validate the changes.
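As a starting point, a benchmark could look something like the sketch below. Note that `fake_quantize`, `benchmark`, and `compare` are hypothetical names made up for illustration; `fake_quantize` is a simple round-to-nearest stand-in and is not the code in gptq_quantize.py. The only library calls assumed are `torch.compile`, `torch.cuda.synchronize`, and basic tensor ops.

```python
import time

import torch


def fake_quantize(W: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical stand-in for a hot path: symmetric 4-bit round-to-nearest.
    # This is NOT the GPTQ update from gptq_quantize.py.
    return torch.clamp(torch.round(W / scale), -8, 7) * scale


def benchmark(fn, *args, warmup: int = 3, iters: int = 10) -> float:
    # Warm-up calls absorb one-time costs such as compilation.
    for _ in range(warmup):
        fn(*args)
    # CUDA kernels launch asynchronously, so synchronize around the timed region.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    # Mean wall-clock seconds per call.
    return (time.perf_counter() - start) / iters


def compare(backend: str = "inductor", size: int = 1024) -> dict:
    # Time the eager function against its compiled version on the same input.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    W = torch.randn(size, size, device=device)
    compiled = torch.compile(fake_quantize, backend=backend)
    # Compilation should not change the numerical result.
    assert torch.allclose(fake_quantize(W, 0.1), compiled(W, 0.1))
    return {
        "eager_s": benchmark(fake_quantize, W, 0.1),
        "compiled_s": benchmark(compiled, W, 0.1),
    }
```

Calling `compare()` returns mean per-call times for both versions. For a real PR, the same `benchmark` helper could wrap the actual GPTQ quantization call on a representative layer, with the warm-up count raised if recompilations show up in the timings.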

Metadata

Assignees

No one assigned

Labels

enhancement, good first issue