This change adds support for Intel Gaudi HPUs. #7275
base: main
Conversation
Several configuration files are provided in the examples directory for use with Gaudi. LLaMA-Factory features and optimizations, including inference, training (SFT, DPO, etc.), LoRA fine-tuning, and distributed training with DeepSpeed and DDP, are working. Please see the README for details.

Co-authored-by: Yaser Afshar [email protected]
Co-authored-by: Edward Mascarenhas [email protected]
Co-authored-by: Jianhong-Zhang [email protected]
Co-authored-by: Wenbin Chen [email protected]
Co-authored-by: Voas, Tanner [email protected]
Thanks for your contribution; please see the review comments.
Hello, I was wondering about the status of this. Is it safe to run a training job with this branch on a Gaudi cluster?
Yes, you can. Please use requirements-gaudi.txt to install the requirements in addition to this branch/PR. There are also YAML files in the examples/train_lora directory with _gaudi.yaml suffixes which should work out of the box.
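As an illustration, a Gaudi LoRA SFT config of that kind might look roughly like the sketch below. The file name and the Gaudi-specific keys (`use_habana`, `gaudi_config_name`) are assumptions, not taken from this PR; check the actual `examples/train_lora/*_gaudi.yaml` files in the branch for the authoritative keys.

```yaml
# Hypothetical: examples/train_lora/llama3_lora_sft_gaudi.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity
template: llama3
cutoff_len: 1024
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
num_train_epochs: 3.0
bf16: true
# Gaudi-specific settings (assumed; verify against the PR's example files)
use_habana: true
gaudi_config_name: Habana/llama
```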
I tried to run a job in the Docker image:

```bash
cd docker/docker-hpu/
docker compose up -d
docker compose exec llamafactory bash
```
@ehartford, could you use the code at this commit in this branch for now? I will be pushing a commit to fix this after doing some more testing. This set of instructions should work. Feel free to email me directly at [email protected] and we can also resolve other issues you may encounter. cd to the LLaMA-Factory directory
Force-pushed from 2e17b62 to a16e3d4