
Commit 554fb99

Add fine-tuning with Deepspeed example (#637)
* run glue with deepspeed

3 files changed: +784, -0 lines changed

@@ -0,0 +1,59 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Text classification examples with DeepSpeed

## DeepSpeed integration

This example shows how to integrate Hugging Face scripts with DeepSpeed for fine-tuning tasks.

Here are some tested features (a sketch of a DeepSpeed config exercising several of them follows the list):

* bf16 precision
* ZeRO stage 0/1/2/3
* ZeRO-Offload (optimizer/param)
* activation checkpointing
* LoRA
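The run command below passes no explicit DeepSpeed config file, so the configuration presumably lives inside `run_glue_deepspeed.py`, which this diff does not show. As an illustration only, a minimal config dict covering the bf16/ZeRO/offload features above might look like the following sketch; every value in it is an assumption, not the example's actual setting:

```python
import torch
import deepspeed

# Toy stand-in model; in the example this would be the
# sequence-classification model being fine-tuned.
model = torch.nn.Linear(128, 2)

# Illustrative settings only, mirroring the feature list above.
ds_config = {
    "train_micro_batch_size_per_gpu": 32,
    "bf16": {"enabled": True},                   # bf16 precision
    "zero_optimization": {
        "stage": 2,                              # any of ZeRO stage 0/1/2/3
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload (optimizer)
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
}

# deepspeed.initialize accepts the config as a dict or a JSON file path and
# returns an engine that drives forward/backward/step.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```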

## GLUE tasks

Based on the Hugging Face script [`run_glue_no_trainer.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue_no_trainer.py).

Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/). This script can fine-tune any of the models on the [hub](https://huggingface.co/models) and can also be used for a dataset hosted on our [hub](https://huggingface.co/datasets) or your own data in a CSV or a JSON file (the script might need some tweaks in that case; refer to the comments inside for help).

GLUE is made up of a total of 9 different tasks. Here is how to run the script on one of them:

```bash
export TASK_NAME=mrpc

deepspeed --num_gpus=12 run_glue_deepspeed.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir log/Llama/$TASK_NAME/
```
where the task name can be one of cola, sst2, mrpc, stsb, qqp, mnli, qnli, rte, or wnli.
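
LoRA is listed among the tested features and `peft` among the requirements, but the adapter wiring inside `run_glue_deepspeed.py` is not visible in this diff. As a hedged sketch of how a sequence-classification model is typically wrapped with peft, with hyperparameter values that are assumptions rather than the example's actual settings:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Assumed wiring: the actual LoRA hyperparameters used by
# run_glue_deepspeed.py are not shown in this diff.
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=2  # mrpc is a binary task
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # keeps the classification head trainable
    r=8,                         # adapter rank (assumed)
    lora_alpha=16,               # scaling factor (assumed)
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters + head remain trainable
```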
@@ -0,0 +1,12 @@
accelerate >= 0.12.0
datasets >= 1.8.0
sentencepiece != 0.1.92
scipy
scikit-learn
protobuf
torch >= 1.3
evaluate
deepspeed
peft
intel_extension_for_pytorch
transformers
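
The `intel_extension_for_pytorch` entry suggests the example also targets Intel CPU/XPU hardware; how the script invokes it is not shown in this diff. A minimal, assumed usage of its optimize pass:

```python
import torch
import intel_extension_for_pytorch as ipex

# Stand-in model/optimizer; in the example these would come from the
# fine-tuning setup. ipex.optimize applies Intel-specific optimizations
# and returns the (model, optimizer) pair when an optimizer is passed.
model = torch.nn.Linear(128, 2).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
```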
