refactor: move all llama code to models/llama out of meta reference #1887

Merged
merged 8 commits into from
Apr 7, 2025


@ashwinb commented Apr 7, 2025

What does this PR do?

Moves all Llama model code out of the meta-reference provider and into models/llama. This makes the copies from llama-models much easier to maintain and ensures we don't entangle meta-reference-specific tidbits with llama-models code, even by accident.

Also removes the meta-reference-quantized-gpu distro and rolls its quantization deps into meta-reference-gpu.

Test Plan

LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --env INFERENCE_CHECKPOINT_DIR=<DIR> \
  --env MODEL_PARALLEL_SIZE=4 \
  --env QUANTIZATION_TYPE=fp8_mixed

Start a server with and without quantization. Point integration tests to it using:

pytest -s -v tests/integration/inference/test_text_inference.py \
  --stack-config http://localhost:8321 \
  --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
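
The two server variants in the test plan can be sketched as a small dry-run helper that prints each command rather than running it. This is only an illustration under stated assumptions: `server_cmd` and the `/path/to/checkpoints` default are hypothetical names, the `with-proxy` wrapper is left out on the assumption that it is specific to the author's environment, and the "without quantization" run is assumed to be the same command with `QUANTIZATION_TYPE` simply omitted.

```shell
# Dry-run sketch: prints the server command for each variant instead of running it.
# CHECKPOINT_DIR is a placeholder; point it at your own checkpoint directory.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/path/to/checkpoints}"
MODEL="meta-llama/Llama-4-Scout-17B-16E-Instruct"

# server_cmd [QUANTIZATION_TYPE]
# Prints the `llama stack run` command from the test plan on one line.
server_cmd() {
  quant="${1:-}"
  cmd="LLAMA_MODELS_DEBUG=1 llama stack run meta-reference-gpu"
  cmd="$cmd --env INFERENCE_MODEL=$MODEL"
  cmd="$cmd --env INFERENCE_CHECKPOINT_DIR=$CHECKPOINT_DIR"
  cmd="$cmd --env MODEL_PARALLEL_SIZE=4"
  # Assumption: leaving QUANTIZATION_TYPE unset yields the unquantized server.
  if [ -n "$quant" ]; then
    cmd="$cmd --env QUANTIZATION_TYPE=$quant"
  fi
  printf '%s\n' "$cmd"
}

server_cmd            # variant without quantization
server_cmd fp8_mixed  # variant with quantization, as in the test plan
```

After starting either variant for real, the same pytest invocation above can be pointed at it, since the server address does not change between the two runs.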

@facebook-github-bot added the CLA Signed label Apr 7, 2025
@ashwinb force-pushed the refactor branch 2 times, most recently from 19c305b to 0cd551d, Apr 7, 2025 19:06
@ashwinb merged commit 530d4bd into main Apr 7, 2025
22 checks passed
@ashwinb deleted the refactor branch Apr 7, 2025 22:04