
Add support for optimum-habana deepseek v3/r1 fp8 quantization #2164


Draft · skavulya wants to merge 1 commit into master
Conversation

@skavulya commented Apr 4, 2025


What does this PR do?

Support FP8 static quantization for optimum-habana DeepSeek v3/r1 models using Intel Neural Compressor (INC).

This feature depends on the optimum-habana DeepSeek-v3 changes in huggingface/optimum-habana#1907 (fetched in the steps below) in addition to the INC changes in this PR.

Steps for FP8 quantization

```bash
# install OH
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana
git fetch origin pull/1907/head:deepseek_v3_fp8
git checkout deepseek_v3_fp8
pip install -e .
pip install git+https://github.com/HabanaAI/[email protected]
pip install blobfile tiktoken
```
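
Before building INC, a quick import check (illustrative, not part of this PR) confirms the editable optimum-habana build and the HPU PyTorch bridge are both visible on a Gaudi machine:

```bash
# Optional sanity check (illustrative): both imports should succeed
# before building INC against this environment.
python3 -c "import optimum.habana; print(optimum.habana.__version__)"
python3 -c "import habana_frameworks.torch.core"
```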

```bash
# install INC PR with OH deepseek_v3 support
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git fetch origin pull/2164/head:oh_ds_r1
git checkout oh_ds_r1
pip uninstall -y neural_compressor_pt
pip install -r requirements.txt
pip install -r requirements_pt.txt
python setup.py develop pt
```
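
Since the build replaces a previously installed `neural_compressor_pt`, it is worth verifying (again, an illustrative check rather than part of the PR) that Python now resolves INC to the editable checkout:

```bash
# Illustrative: confirm the editable INC checkout is the one on sys.path.
python3 -c "import neural_compressor; print(neural_compressor.__version__, neural_compressor.__file__)"
```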

```bash
# Step 1: FP8 measurement (calibration) with the Moonlight model on 2 cards with expert parallelism
cd ../optimum-habana/examples/text-generation/
PT_HPU_LAZY_MODE=1 INC_DYNAMIC_MOE_EXPERTS=64 QUANT_CONFIG=quantization_config/maxabs_measure.json python3 ../gaudi_spawn.py --world_size 2 run_generation.py --model_name_or_path moonshotai/Moonlight-16B-A3B --bf16 --trim_logits --batch_size 1 --use_hpu_graphs --use_kv_cache --prompt "DeepSpeed is a machine learning framework" --parallel_strategy "ep" --trust_remote_code_tokenizer
```
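
For orientation, the measurement config referenced above ships with the optimum-habana text-generation example; its contents are roughly the following (the exact fields in the checked-out branch may differ, so treat this as illustrative). The key entries are `"mode": "MEASURE"` and `dump_stats_path`, which controls where the calibration stats land:

```bash
# Illustrative: approximate contents of quantization_config/maxabs_measure.json.
cat quantization_config/maxabs_measure.json
# {
#     "method": "HOOKS",
#     "mode": "MEASURE",
#     "observer": "maxabs",
#     "allowlist": {"types": [], "names": []},
#     "blocklist": {"types": [], "names": []},
#     "dump_stats_path": "./hqt_output/measure"
# }
```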

```bash
# Step 2: FP8 quantization. The FP8 dynamic MoE op segfaults if SLICE_MAX_EXPERT > 32, so cap it at 32.
SLICE_MAX_EXPERT=32 INC_DYNAMIC_MOE_EXPERTS=64 PT_HPU_LAZY_MODE=1 QUANT_CONFIG=quantization_config/maxabs_quant_mixtral.json python3 ../gaudi_spawn.py --world_size 2 run_generation.py --model_name_or_path moonshotai/Moonlight-16B-A3B --bf16 --trim_logits --batch_size 1 --use_hpu_graphs --use_kv_cache --prompt "DeepSpeed is a machine learning framework" --parallel_strategy "ep" --trust_remote_code_tokenizer
```
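
The quantize run reads the stats dumped by the measurement run (under the `dump_stats_path` set in the config), so a quick existence check (illustrative) avoids a confusing failure in step 2:

```bash
# Illustrative: the quantize run consumes the stats written by the measure run
# under dump_stats_path (./hqt_output/measure in the stock example config).
ls -lh hqt_output/
```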

@skavulya skavulya marked this pull request as draft April 4, 2025 00:44
@skavulya skavulya marked this pull request as ready for review April 4, 2025 01:47
@skavulya skavulya marked this pull request as draft April 7, 2025 19:50