<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Text classification examples with DeepSpeed

## DeepSpeed integration

This example shows how to integrate Hugging Face scripts with DeepSpeed for fine-tuning tasks.

The following features have been tested (a minimal config sketch covering them follows the list):

* bf16 precision
* ZeRO stage 0/1/2/3
* ZeRO-Offload (optimizer/param)
* activation checkpointing
* LoRA

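Since the repo's actual config isn't shown here, the following is only a minimal sketch of a DeepSpeed JSON config exercising the tested features, written as a bash heredoc to match the command style below. The file name `ds_config.json` and the batch size are assumptions, not this repo's confirmed setup. Note that `offload_param` is only honored under ZeRO stage 3, and the `activation_checkpointing` section only takes effect if the training code routes through DeepSpeed's checkpointing API.

```bash
# Minimal sketch (assumption, not shipped with this repo): write a DeepSpeed
# config enabling bf16, ZeRO stage 3, and optimizer/parameter offload to CPU.
# "offload_param" requires stage 3; drop it when using stages 0-2.
cat > ds_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": 32,
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu"
    },
    "offload_param": {
      "device": "cpu"
    }
  },
  "activation_checkpointing": {
    "partition_activations": true
  }
}
EOF
```

How the config reaches the engine depends on the script: scripts that build their argument parser with `deepspeed.add_config_arguments` accept it via `--deepspeed_config`, but check `run_glue_deepspeed.py`'s own arguments before relying on that flag.
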
## GLUE tasks

Based on the Hugging Face script [`run_glue_no_trainer.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue_no_trainer.py).

Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
Evaluation](https://gluebenchmark.com/). This script can fine-tune any of the models on the [hub](https://huggingface.co/models)
and can also be used for a dataset hosted on our [hub](https://huggingface.co/datasets) or your own data in a CSV or a JSON file
(the script might need some tweaks in that case; refer to the comments inside for help, and see the custom-data sketch after the GLUE command below).

GLUE is made up of a total of 9 different tasks. Here is how to run the script on one of them:

```bash
export TASK_NAME=mrpc

deepspeed --num_gpus=12 run_glue_deepspeed.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir log/Llama/$TASK_NAME/
```
where the task name can be one of `cola`, `sst2`, `mrpc`, `stsb`, `qqp`, `mnli`, `qnli`, `rte`, or `wnli`.
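
For your own CSV or JSON data (mentioned above), the upstream `run_glue_no_trainer.py` replaces `--task_name` with `--train_file`/`--validation_file`; assuming `run_glue_deepspeed.py` keeps that interface (unverified here), a run could look like this:

```bash
# Hypothetical invocation for custom data: --train_file/--validation_file are
# the upstream run_glue_no_trainer.py arguments and are assumed to carry over.
# The paths data/train.csv and data/validation.csv are placeholders; the CSV
# needs a label column plus one or two text columns, as in the upstream script.
deepspeed --num_gpus=12 run_glue_deepspeed.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --train_file data/train.csv \
  --validation_file data/validation.csv \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir log/Llama/custom/
```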