
Updated Python APIs' `compile` docstrings to clearly reflect the QNN compilation path #426


Merged
merged 1 commit into from
Jun 9, 2025
QEfficient/base/modeling_qeff.py: 11 changes (9 additions, 2 deletions)
@@ -241,10 +241,12 @@ def _compile(
:mdp_ts_num_devices (int): Number of devices to partition to use Multi-Device Partitioning with tensor-slicing.
:num_speculative_tokens (int, optional): Number of speculative tokens to take as input for Speculative Decoding Target Language Model.
:enable_qnn (bool): Enables QNN Compilation. ``Defaults to False.``
:qnn_config (str): Path of QNN Config parameters file. ``Defaults to None.``
:compiler_options: Pass any compiler option as input. Any flag that is supported by `qaic-exec` can be passed. Params are converted to flags as below:
:qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file. ``Defaults to None.``
:compiler_options: Pass any compiler option as input.
Any flag that is supported by `qaic-exec` can be passed. Params are converted to flags as below:
- aic_num_cores=16 -> -aic-num-cores=16
- convert_to_fp16=True -> -convert-to-fp16
For QNN Compilation path, when enable_qnn is set to True, any parameter passed in compiler_options will be ignored.
"""
if onnx_path is None and self.onnx_path is None:
self.export()
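The conversion rule described in the `_compile` docstring (underscores become hyphens, a leading `-` is added, and `True` booleans become bare switches) can be sketched as follows. This is a minimal, hypothetical helper: `options_to_qaic_flags` is not part of QEfficient, and it assumes that `False` booleans are simply omitted, which the docstring does not state.

```python
from typing import Any, Dict, List


def options_to_qaic_flags(compiler_options: Dict[str, Any]) -> List[str]:
    """Hypothetical sketch: turn keyword options into qaic-exec style flags.

    Assumption: False booleans are dropped, non-boolean values use '=value'.
    """
    flags: List[str] = []
    for key, value in compiler_options.items():
        flag = "-" + key.replace("_", "-")
        # Check bool before anything else: bool is a subclass of int in Python.
        if isinstance(value, bool):
            if value:
                flags.append(flag)  # True -> bare switch
            continue  # False -> omitted (assumed behaviour)
        flags.append(f"{flag}={value}")
    return flags


print(options_to_qaic_flags({"aic_num_cores": 16, "convert_to_fp16": True}))
# ['-aic-num-cores=16', '-convert-to-fp16']
```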
@@ -256,6 +258,11 @@ def _compile(
raise FileNotFoundError(f"ONNX file not found at: {onnx_path}")

if enable_qnn:
if compiler_options:
logger.warning(
f"Extra arguments to QNN compilation are supported only via qnn_config file. Ignoring {compiler_options}"
)

self.qpc_path = qnn_compile(
onnx_path=onnx_path,
qpc_base_path=compile_dir,
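The new `enable_qnn` branch in `_compile` logs a warning and ignores any `compiler_options` instead of forwarding them. A minimal sketch of that control flow, assuming a standalone function: `select_compile_path` is hypothetical, and the returned dict stands in for the real `qnn_compile` and qaic-exec calls, which are not reproduced here.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def select_compile_path(
    enable_qnn: bool = False,
    qnn_config: Optional[str] = None,
    **compiler_options: Any,
) -> Dict[str, Any]:
    """Hypothetical sketch of the QNN vs. QAIC branching shown in the diff."""
    if enable_qnn:
        if compiler_options:
            # Mirrors the diff: extra options are reported, then dropped.
            logger.warning(
                "Extra arguments to QNN compilation are supported only via "
                "qnn_config file. Ignoring %s",
                compiler_options,
            )
        return {"path": "qnn", "qnn_config": qnn_config, "options": {}}
    # QAIC path: every option is kept and would be converted to qaic-exec flags.
    return {"path": "qaic", "options": dict(compiler_options)}


# "qnn_config.json" is a placeholder path, not a file shipped with QEfficient.
print(select_compile_path(enable_qnn=True, qnn_config="qnn_config.json", aic_num_cores=16))
# {'path': 'qnn', 'qnn_config': 'qnn_config.json', 'options': {}}
```

Because the warning goes through `logging`, it appears on stderr even without explicit logging configuration.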
QEfficient/transformers/models/modeling_auto.py: 29 changes (18 additions, 11 deletions)
@@ -291,8 +291,13 @@ def compile(
:num_devices (int): Number of devices the model needs to be compiled for. Defaults to 1.
:num_cores (int): Number of cores used to compile the model.
:mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
:aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
:allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
:compiler_options (dict, optional): Additional compiler options.
For QAIC Compiler: Extra arguments for qaic-exec can be passed.
:aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
:allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
For QNN Compiler: Following arguments can be passed.
:enable_qnn (bool): Enables QNN Compilation.
:qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.
Returns:
:str: Path of the compiled ``qpc`` package.
"""
@@ -1571,16 +1576,18 @@ def compile(
:mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
:mxint8_kv_cache (bool, optional): Whether to use ``mxint8`` compression for KV cache. ``Defaults to False``.
:num_speculative_tokens (int, optional): Number of speculative tokens to take as input for Speculative Decoding Target Language Model.
:mos (int, optional): Effort level to reduce on-chip memory. Defaults to -1, meaning no effort. ``Defaults to -1``.
:aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
:prefill_only (bool): if ``True`` compile for prefill only and if ``False`` compile for decode only. Defaults to None, which compiles for both ``prefill`` and ``decode``.
:compiler_options (dict, optional): Pass any compiler option as input. ``Defaults to None``.
Following flag can be passed in compiler_options to enable QNN Compilation path.
:enable_qnn (bool): Enables QNN Compilation. ``Defaults to False. if not passed.``
:qnn_config (str): Path of QNN Config parameters file. ``Defaults to None. if not passed``
for QAIC compilation path, any flag that is supported by ``qaic-exec`` can be passed. Params are converted to flags as below:
- aic_num_cores=16 -> -aic-num-cores=16
- convert_to_fp16=True -> -convert-to-fp16
:compiler_options (dict, optional): Additional compiler options. ``Defaults to None``.
For QAIC Compiler: Extra arguments for qaic-exec can be passed.
:mos (int, optional): Effort level to reduce on-chip memory. Defaults to -1, meaning no effort. ``Defaults to -1``.
:aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
:allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
Params are converted to flags as below:
- aic_num_cores=16 -> -aic-num-cores=16
- convert_to_fp16=True -> -convert-to-fp16
For QNN Compiler: Following arguments can be passed.
:enable_qnn (bool): Enables QNN Compilation.
:qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.

Returns:
:str: Path of the compiled ``qpc`` package.
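The `prefill_only` option in this docstring is effectively a tri-state. A hypothetical helper (`stages_to_compile` is illustrative only, not a QEfficient function) that maps each value to the stages the docstring says get compiled:

```python
from typing import List, Optional


def stages_to_compile(prefill_only: Optional[bool] = None) -> List[str]:
    """Hypothetical mapping of the documented prefill_only tri-state."""
    if prefill_only is None:
        return ["prefill", "decode"]  # default: compile both stages
    return ["prefill"] if prefill_only else ["decode"]


for value in (None, True, False):
    print(value, stages_to_compile(value))
# None ['prefill', 'decode']
# True ['prefill']
# False ['decode']
```

Testing `is None` first matters here: a plain truthiness check would treat `None` like `False` and compile decode only.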