From 1e835ca6946e3dfec57ffe35f34dee455a40f0fe Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Thu, 10 Apr 2025 15:26:47 +0200 Subject: [PATCH 1/2] added mlp_tutorials --- docs/guides/mlp_tutorials.md | 11 + docs/guides/mlp_tutorials/llm-finetuning.md | 170 ++++++++++++ docs/guides/mlp_tutorials/llm-inference.md | 254 ++++++++++++++++++ .../mlp_tutorials/llm-nanotron-training.md | 254 ++++++++++++++++++ docs/platforms/mlp/index.md | 3 +- mkdocs.yml | 1 + 6 files changed, 691 insertions(+), 2 deletions(-) create mode 100644 docs/guides/mlp_tutorials.md create mode 100644 docs/guides/mlp_tutorials/llm-finetuning.md create mode 100644 docs/guides/mlp_tutorials/llm-inference.md create mode 100644 docs/guides/mlp_tutorials/llm-nanotron-training.md diff --git a/docs/guides/mlp_tutorials.md b/docs/guides/mlp_tutorials.md new file mode 100644 index 00000000..df143499 --- /dev/null +++ b/docs/guides/mlp_tutorials.md @@ -0,0 +1,11 @@ +[](){#ref-guides-mlp-tutorials} +# MLP Tutorials + +These tutorials solve simple MLP tasks using the [Container Engine][ref-container-engine] on the ML-Platform. + +1. [LLM Inference][ref-mlp-llm-inference-tutorial] +2. [LLM Finetuning][ref-mlp-llm-finetuning-tutorial] +3. [Nanotron Training][ref-mlp-llm-nanotron-tutorial] + + + diff --git a/docs/guides/mlp_tutorials/llm-finetuning.md b/docs/guides/mlp_tutorials/llm-finetuning.md new file mode 100644 index 00000000..7263def1 --- /dev/null +++ b/docs/guides/mlp_tutorials/llm-finetuning.md @@ -0,0 +1,170 @@ +[](){#ref-mlp-llm-finetuning-tutorial} + +# LLM Finetuning Tutorial + +This tutorial will take the model from the [LLM Inference][ref-mlp-llm-inference-tutorial] tutorial and show you how to perform finetuning. This means that we take the model and train it on some new custom data to change its behavior. + +To complete the tutorial, we set up some extra libraries that will help us to update the state of the machine learning model. We also write a script that will allow us to unlock more of the performance offered by the cluster, by running our fine-tuning task on two or more nodes. + +### Prerequisites + +This tutorial assumes you've already successfully completed the [LLM Inference][ref-mlp-llm-inference-tutorial] tutorial. For fine-tuning Gemma, we will rely on the NGC PyTorch container and the libraries we've already installed in the Python environment used previously. + +### Set up TRL + +We will use HuggingFace TRL to fine-tune Gemma-7B on the [OpenAssistant dataset](https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25). First, we need to update our Python environment with some extra libraries to support TRL. To do this, we can launch an interactive shell in the PyTorch container, just like we did in the previous tutorial. Then, we install `peft`: + +``` +[cluster][user@cluster-ln001 gemma-inference]$ cd $SCRATCH/gemma-inference +[cluster][user@cluster-ln001 gemma-inference]$ srun --environment=gemma-pytorch --container-workdir=$PWD --pty bash +user@nid001234:/bret/scratch/cscs/user/gemma-inference$ source ./gemma-venv/bin/activate +(gemma-venv) user@nid001234:/bret/scratch/cscs/user/gemma-inference$ python -m pip install peft==0.11.1 +# ... pip output ... +``` + +Next, we also need to clone and install the `trl` Git repository so that we have access to the fine-tuning scripts in it. For this purpose, we will install the package in editable mode in the virtual environment. 
This makes it available in Python scripts independent of the current working directory and without creating a redundant copy of the files.

```
[cluster][user@cluster-ln001 ~]$ git clone https://github.com/huggingface/trl -b v0.7.11
[cluster][user@cluster-ln001 ~]$ pip install -e ./trl # install in editable mode
```

When this step is complete, you can exit the shell by typing `exit`.

### Finetune Gemma-7B

At this point, we can set up a fine-tuning script and start training Gemma-7B. Use your favorite text editor to create the file `fine-tune-gemma.sh` just outside the `trl` and `gemma-venv` directories:

```bash title="fine-tune-gemma.sh"
#!/bin/bash

source ./gemma-venv/bin/activate

set -x

export HF_HOME=$SCRATCH/huggingface
export TRANSFORMERS_VERBOSITY=info

ACCEL_PROCS=$(( $SLURM_NNODES * $SLURM_GPUS_PER_NODE ))

MAIN_ADDR=$(echo "${SLURM_NODELIST}" | sed 's/[],].*//g; s/\[//g')
MAIN_PORT=12802

accelerate launch --config_file trl/examples/accelerate_configs/multi_gpu.yaml \
    --num_machines=$SLURM_NNODES --num_processes=$ACCEL_PROCS \
    --machine_rank $SLURM_PROCID \
    --main_process_ip $MAIN_ADDR --main_process_port $MAIN_PORT \
    trl/examples/scripts/sft.py \
    --model_name google/gemma-7b \
    --dataset_name OpenAssistant/oasst_top1_2023-08-25 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --save_steps 200 \
    --max_steps 400 \
    --use_peft \
    --lora_r 16 --lora_alpha 32 \
    --lora_target_modules q_proj k_proj v_proj o_proj \
    --output_dir gemma-finetuned-openassistant
```

This script has quite a bit more content to unpack. We use HuggingFace accelerate to launch the fine-tuning process, so we need to make sure that accelerate understands which hardware is available and where. Setting this up will be useful in the long run because it means we can tell SLURM how much hardware to reserve, and this script will set up all the details for us.

The cluster has four GH200 chips per compute node. We can make them accessible to scripts run through srun/sbatch via the option `--gpus-per-node=4`. Then, we calculate how many processes accelerate should launch. We want to map each GPU to a separate process, so this should be four processes per node. We multiply this by the number of nodes to obtain the total number of processes. Next, we use some bash magic to extract the name of the head node from SLURM environment variables. Accelerate expects one main node and launches tasks on the other nodes from this main node. Having sourced our Python environment at the top of the script, we can then launch Gemma fine-tuning. The first four lines of the launch line are used to configure accelerate. Everything after that configures the `trl/examples/scripts/sft.py` Python script, which we use to train Gemma.

Next, we also need to create a short SLURM batch script to launch our fine-tuning script:

```bash title="fine-tune-sft.sbatch"
#!/bin/bash
#SBATCH --job-name=gemma-finetune
#SBATCH --time=00:30:00
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=288
#SBATCH --account=

set -x

srun -ul --environment=gemma-pytorch --container-workdir=$PWD bash fine-tune-gemma.sh
```

We set a few Slurm parameters like we already did in the previous tutorial. Note that we leave the number of nodes unspecified. This way, we can decide the number of nodes we want to use when we launch the batch job using Slurm.
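
If you are curious what the `--use_peft` and `--lora_*` flags do: they make `sft.py` attach a LoRA adapter to the model via the `peft` library we installed above, so that only a small set of additional low-rank weights is trained instead of all 7B parameters. The snippet below is a minimal sketch of roughly what this configuration corresponds to; it is shown only for illustration, is not part of the tutorial, and is not the exact code used by `sft.py`:

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model and attach low-rank adapters to the attention projections,
# mirroring the --lora_r/--lora_alpha/--lora_target_modules flags above.
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights is trainable
```
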
Now that we've set up a fine-tuning script and a Slurm batch script, we can launch our fine-tuning job. We'll start out by launching it on two nodes. It should take about 10-15 minutes to fine-tune Gemma:

```
[cluster][user@cluster-ln001 ~]$ sbatch --nodes=2 fine-tune-sft.sbatch
```

### Compare finetuned Gemma against default Gemma

We can reuse our Python script from the first tutorial to do inference on the Gemma model that we just fine-tuned. Let's try out a different prompt in `gemma-inference.py`:

```
input_text = "What are the 5 tallest mountains in the Swiss Alps?"
```

We can run inference using our batch script from the previous tutorial:

```
[cluster][user@cluster-ln001 ~]$ sbatch ./gemma-inference.sbatch
```

Inspecting the output should yield something like this:

```
What are the 5 tallest mountains in the Swiss Alps?

The Swiss Alps are home to some of the tallest mountains in the world. Here are the 5 tallest mountains in the Swiss Alps:

1. Mont Blanc (4,808 meters)
2. Matterhorn (4,411 meters)
3. Dom (4,161 meters)
4. Jungfrau (4,158 meters)
5. Mont Rose (4,117 meters)
```

Next, we can update the model line in our Python inference script to use the model that we just fine-tuned:

```
model = AutoModelForCausalLM.from_pretrained("gemma-finetuned-openassistant/checkpoint-400", device_map="auto")
```

If we re-run inference, the output will be a bit more detailed and explanatory, similar to output we might expect from a helpful chatbot. One example looks like this:

```
What are the 5 tallest mountains in the Swiss Alps?

The Swiss Alps are home to some of the tallest mountains in Europe, and they are a popular destination for mountaineers and hikers. Here are the five tallest mountains in the Swiss Alps:

1. Mont Blanc (4,808 m/15,774 ft): Mont Blanc is the highest mountain in the Alps and the highest mountain in Europe outside of Russia. It is located on the border between France and Italy, and it is a popular destination for mountaineers and hikers.

2. Dufourspitze (4,634 m/15,203 ft): Dufourspitze is the highest mountain in Switzerland and the second-highest mountain in the Alps. It is located in the Valais canton of Switzerland, and it is a popular destination for mountaineers and hikers.

3. Liskamm (4,527 m/14,855 ft): Liskamm is a mountain in the Bernese Alps of Switzerland. It is located in the Bern canton of Switzerland, and it is a popular destination for mountaineers and hikers.

4. Weisshorn (4,506 m/14,783 ft): Weisshorn is a mountain in the Pennine Alps of Switzerland. It is located in the Valais canton of Switzerland, and it is a popular destination for mountaineers and hikers.

5. Matterhorn (4,478 m/14,690 ft): Matterhorn is a mountain in the Pennine Alps of Switzerland. It is located in the Valais canton of Switzerland, and it is a popular destination for mountaineers and hikers.

These mountains are all located in the Swiss Alps, and they are a popular destination for mountaineers and hikers. If you are planning a trip to the Swiss Alps, be sure to check out these mountains and plan your itinerary accordingly.
```

Your output may look different after fine-tuning, but in general you will see that the fine-tuned model generates more verbose output. Double-checking the output reveals that the list of mountains produced by Gemma is not actually correct. The following list gives the 5 tallest Swiss peaks, according to Wikipedia.

1. Dufourspitze 4,634m
2.
Nordend 4,609m +3. Zumsteinspitz 4,563m +4. Signalkuppe 4,554m +5. Dom 4,545m + +This is an important reminder that machine-learning models like Gemma need extra checks to confirm any generated outputs. \ No newline at end of file diff --git a/docs/guides/mlp_tutorials/llm-inference.md b/docs/guides/mlp_tutorials/llm-inference.md new file mode 100644 index 00000000..55ae92fc --- /dev/null +++ b/docs/guides/mlp_tutorials/llm-inference.md @@ -0,0 +1,254 @@ +[](){#ref-mlp-llm-inference-tutorial} + +# LLM Inference Tutorial + +This tutorial will guide you through the steps required to set up a PyTorch container and do ML inference. This means that we load an existing machine learning model, prompt it with some custom data, and run the model to see what output it will generate with our data. + +To complete the tutorial, we get a PyTorch container from Nvidia, customize it to suit our needs, and tell the Container Engine how to run it. Finally, we set up and run a python script to run the machine learning model and generate some output. + +The model we will be running is Google's [Gemma-7B](https://huggingface.co/google/gemma-7b#description), an LLM similar in style to the popular ChatGPT, which can generate text responses to text prompts that we feed into it. + +## Gemma-7B Inference using NGC PyTorch + +### Prequisites + +This tutorial assumes you are able to access the cluster via SSH. To set up access to CSCS systems, follow the guide [here][ref-ssh], and read through the documentation about the [ML Platform][ref-platform-mlp]. + +### Set up Permissions for the Nvidia NGC Catalog + +Some [Nvidia NGC](https://www.nvidia.com/en-us/gpu-cloud) containers can only be downloaded with a valid API token, so we need to set one up. Create an account and setup your API token in the [Nvidia NGC container catalog](https://catalog.ngc.nvidia.com). Then, use your favorite text editor to create a credentials file `~/.config/enroot/.credentials` for enroot. Enroot will be responsible for fetching the container image from NGC behind the scenes. The credentials file should look like this: + +``` +machine nvcr.io login $oauthtoken password +``` + +Make sure to replace `` with your actual token. + +### Modify the NGC Container + +In theory, we could now just go ahead and use the container to run some PyTorch code. However, chances are that we will need some additional libraries or software. For this reason, we need to use some docker commands to build a container on top of what is provided by Nvidia. To do this, we create a new directory for building containers in our home directory and set up a [Dockerfile](https://docs.docker.com/reference/dockerfile/): + +``` +[cluster][user@cluster-ln001 ~]$ cd $SCRATCH +[cluster][user@cluster-ln001 user]$ mkdir pytorch-24.01-py3-venv && cd pytorch-24.01-py3-venv +``` + +Use your favorite text editor to create a file `Dockerfile` here. The Dockerfile should look like this: + +``` +FROM nvcr.io/nvidia/pytorch:24.01-py3 + +ENV DEBIAN_FRONTEND=noninteractive + +RUN apt-get update && apt-get install -y python3.10-venv && apt-get clean && rm -rf /var/lib/apt/lists/* +``` + +The first line specifies that we are working on top of an existing container. In this case we start `FROM` an NGC PyTorch container. Next, we set an `ENV`ironment variable that helps us run `apt-get` in the container. Finally, we `RUN` the package installer `apt-get` to install python virtual environments. 
This will let us install python packages later on without having to rebuild the container again and again. There's a bunch of extra commands in this line to tidy things up. If you want to understand what is happening, take a look at the [Docker documentation](https://docs.docker.com/develop/develop-images/instructions/#apt-get). + +Now that we've setup the Dockerfile, we can go ahead and pass it to [Podman](https://podman.io/) to build a container. Podman is a tool that enables us to fetch, manipulate, and interact with containers on the cluster. For more information, please see the [Container Engine][ref-container-engine] page. To use Podman, we first need to configure some storage locations for it. This step is straightforward, just make the file `$HOME/.config/containers/storage.conf` (or `$XDG_CONFIG_HOME/containers/storage.conf` if `XDG_CONFIG_HOME` is set): + +``` +[storage] + driver = "overlay" + runroot = "/dev/shm/$USER/runroot" + graphroot = "/dev/shm/$USER/root" + +[storage.options.overlay] + mount_program = "/usr/bin/fuse-overlayfs-1.13" +``` + + +To build a container with Podman, we need to request a shell on a compute node from [SLURM][ref-slurm], pass the Dockerfile to Podman, and finally import the freshly built container using enroot. SLURM is a workload manager which distributes workloads on the cluster. Through SLURM, many people can use the supercomputer at the same time without interfering with one another in any way: + +``` +[cluster][user@cluster-ln001 pytorch-24.01-py3-venv]$ srun -A --pty bash +[cluster][user@nid001234 pytorch-24.01-py3-venv]$ podman build -t pytorch:24.01-py3-venv . +# ... lots of output here ... +[cluster][user@nid001234 pytorch-24.01-py3-venv]$ enroot import -x mount -o pytorch-24.01-py3-venv.sqsh podman://pytorch:24.01-py3-venv +# ... more output here ... +``` + +where you should replace `` with your project account ID. At this point, you can exit the SLURM allocation by typing `exit`. You should be able to see a new squashfile next to your Dockerfile: + +``` +[cluster][user@cluster-ln001 pytorch-24.01-py3-venv]$ ls +Dockerfile pytorch-24.01-py3-ven.sqsh +``` + +This squashfile is essentially a compressed container image, which can be run directly by the container engine. We will use our freshly-built container `pytorch-24.01-py3-venv.sqsh` in the following steps to run a PyTorch script that loads the Google Gemma-7B model and performs some inference with it. + +### Set up an EDF + +We need to set up an EDF (Environment Definition File) which tells the Container Engine what container to load, where to mount it, and what plugins to load. Use your favorite text editor to create a file `~/.edf/gemma-pytorch.toml` for the container engine. The EDF should look like this: + +``` +image = "/capstor/scratch/cscs//pytorch-24.01-py3-venv/pytorch-24.01-py3-venv.sqsh" + +mounts = ["/capstor", "/users"] + +writable = true + +[annotations] +com.hooks.aws_ofi_nccl.enabled = "true" +com.hooks.aws_ofi_nccl.variant = "cuda12" + +[env] +FI_CXI_DISABLE_HOST_REGISTER = "1" +FI_MR_CACHE_MONITOR = "userfaultfd" +NCCL_DEBUG = "INFO" +``` + +Make sure to replace `` with your actual CSCS username. If you've decided to build the container somewhere else, make sure to supply the correct path to the `image` variable. + +The `image` variable defines which container we want to load. This could either be a container from an online docker repository, like `nvcr.io/nvidia/pytorch:24.01-py3`, or in our case, a local squashfile which we built ourselves. 
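
A quick way to catch typos in this path is to list the file you are pointing at from a login node (assuming the build location used above); if `ls` cannot find it, the container engine will not find it either:

```
[cluster][user@cluster-ln001 ~]$ ls -lh $SCRATCH/pytorch-24.01-py3-venv/pytorch-24.01-py3-venv.sqsh
```
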
The `mounts` variable defines which directories we want to mount where in our container. In general, it's a good idea to use the scratch directory to store outputs from any scientific software. In our case, we will not generate a lot of output, but it's good practice to stick to this anyway.

An EDF can also set a `workdir` variable, which tells the container engine which directory to start in: if we request a shell, this is where we find ourselves after the container starts. We don't set it here; instead, we will pass the working directory explicitly with `--container-workdir` when we launch jobs with Slurm.

### Set up the Python Virtual Environment

This will be the first time we run our modified container. To run the container, we need to allocate some compute resources using Slurm and launch a shell, just like we already did to build the container. This time, we also use the `--environment` option to specify that we want to launch the shell inside the container specified by our gemma-pytorch EDF file:

```
[cluster][user@cluster-ln001 ~]$ cd $SCRATCH && mkdir -p gemma-inference && cd gemma-inference
[cluster][user@cluster-ln001 gemma-inference]$ srun -A --environment=gemma-pytorch --container-workdir=$PWD --pty bash
```

PyTorch is already set up in the container for us. We can verify this by asking pip for a list of installed packages:

```
user@nid001234:/capstor/scratch/cscs/user/gemma-inference$ python -m pip list | grep torch
pytorch-quantization 2.1.2
torch 2.2.0a0+81ea7a4
torch-tensorrt 2.2.0a0
torchdata 0.7.0a0
torchtext 0.17.0a0
torchvision 0.17.0a0
```

However, we will need to install a few more Python packages to make it easier to do inference with Gemma-7B. We create a virtual environment using python-venv. The `--system-site-packages` option ensures that we install packages in addition to the existing packages and don't accidentally install a new version of PyTorch over the one that has been put in place by Nvidia. Next, we activate the environment and use pip to install the two packages we need, `accelerate` and `transformers`:

```
user@nid001234:gemma-inference$ python -m venv --system-site-packages ./gemma-venv
user@nid001234:gemma-inference$ source ./gemma-venv/bin/activate
(gemma-venv) user@nid001234:/capstor/scratch/cscs/user/gemma-inference$ python -m pip install accelerate==0.30.1 transformers==4.38.1
# ... pip output ...
```

Before we move on to running the Gemma-7B model, we additionally need to make an account at [HuggingFace](https://huggingface.co), get an API token, and accept the [license agreement](https://huggingface.co/google/gemma-7b-it) for the [Gemma-7B](https://huggingface.co/google/gemma-7b) model. You can save the token to `$SCRATCH` using the huggingface-cli:

```
user@nid001234:gemma-inference$ pip install -U "huggingface_hub[cli]"
user@nid001234:gemma-inference$ HF_HOME=$SCRATCH/huggingface huggingface-cli login
```

At this point, you can exit the SLURM allocation again by typing `exit`. If you `ls` the contents of the `gemma-inference` folder, you will see that the `gemma-venv` virtual environment folder persists outside of the SLURM job. Keep in mind that this virtual environment won't actually work unless you're running something from inside the PyTorch container. This is because the virtual environment ultimately relies on the resources packaged inside the container.

### Run Inference on Gemma-7B

Cool, now you have a working container with PyTorch and all the necessary Python packages installed! Let's move on to Gemma-7B.
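
Before writing any code, you can optionally run a quick sanity check from an interactive shell in the container (with `gemma-venv` activated as above) to confirm that PyTorch sees the GPUs and that the freshly installed packages are importable:

```
(gemma-venv) user@nid001234:/capstor/scratch/cscs/user/gemma-inference$ python -c "import torch, transformers, accelerate; print(torch.cuda.device_count(), transformers.__version__, accelerate.__version__)"
```

On a full node this should report four GPUs, matching the four GH200 devices per compute node, along with the package versions installed above.
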
We write a Python script `$SCRATCH/gemma-inference/gemma-inference.py` to load the model and prompt it with some custom text. The Python script should look like this: + +``` +from transformers import AutoTokenizer, AutoModelForCausalLM +import torch + +tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it") +model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto") + +input_text = "Write me a poem about the Swiss Alps." +input_ids = tokenizer(input_text, return_tensors="pt").to("cuda") + +outputs = model.generate(**input_ids, max_new_tokens=1024) +print(tokenizer.decode(outputs[0])) +``` + +Feel free to change the `input_text` variable to whatever prompt you like. + +All that remains is to run the python script inside the PyTorch container. There are several ways of doing this. As before, you could just use Slurm to get an interactive shell in the container. Then you would source the virtual environment and run the python script we just wrote. There's nothing wrong with this approach per se, but consider that you might be running much more complex and lengthy Slurm jobs in the future. You'll want to document how you're calling Slurm, what commands you're running on the shell, and you might not want to (or might not be able to) keep a terminal open for the length of time the job might take. For this reason, it often makes sense to write a batch file, which enables you to document all these processes and run the Slurm job regardless of whether you're still connected to the cluster. + +Create a SLURM batch file `gemma-inference.sbatch` anywhere you like, for example in your home directory. The SLURM batch file should look like this: + +```bash title="gemma-inference.sbatch" +#!/bin/bash +#SBATCH --job-name=gemma-inference +#SBATCH --time=00:15:00 +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=1 +#SBATCH --cpus-per-task=288 +#SBATCH --environment=gemma-pytorch +#SBATCH --account= + +export HF_HOME=$SCRATCH/huggingface +export TRANSFORMERS_VERBOSITY=info + +cd $SCRATCH/gemma-inference/ +source ./gemma-venv/bin/activate + +set -x + +python ./gemma-inference.py +``` + +The first few lines of the batch script declare the shell we want to use to run this batch file and pass several options to the SLURM scheduler. You can see that one of these options is one we used previously to load our EDF file. After this, we `cd` to our working directory, `source` our virtual environment and finally run our inference script. + +As an alternative to using the `#SBATCH --environment=gemma-pytorch` option you can also run the code in the above script wrapped into an `srun -A -ul --environment=gemma-pytorch bash -c "..."` statement. The tutorial on nanotron e.g. uses this pattern in `run_tiny_llama.sh`. + +Once you've finished editing the batch file, you can save it and run it with SLURM: + +``` +[cluster][user@cluster-ln001 ~]$ sbatch ./gemma-inference.sbatch +``` + +This command should just finish without any output and return you to your terminal. At this point, you can follow the output in your shell using `tail -f slurm-.out`. Besides you're free to do whatever you like; you can close the terminal, keep working, or just wait for the Slurm job to finish. You can always check on the state of your job by logging back into the cluster and running `squeue -l --me`. Once your job finishes, you will find a file in the same directory you ran it from, named something like `slurm-.out`, and containing the output generated by your Slurm job. 
For this tutorial, you should see something like the following: + + +```bash +[cluster][user@cluster-ln001 gemma-inference]$ cat ./slurm-543210.out +/capstor/scratch/cscs/user/gemma-inference/gemma-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. + warnings.warn( +Gemma's activation function should be approximate GeLU and not exact GeLU. +Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu` instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details. +Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.13it/s] +/capstor/scratch/cscs/user/gemma-inference/gemma-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. + warnings.warn( +Write me a poem about the Swiss Alps. + +In the heart of Switzerland, where towering peaks touch sky, +Lies a playground of beauty, beneath the watchful eye. +The Swiss Alps, a majestic force, +A symphony of granite, snow, and force. + +Snow-laden peaks pierce the heavens above, +Their glaciers whisper secrets of ancient love. +Emerald valleys bloom with flowers, +A tapestry of colors, a breathtaking sight. + +Hiking trails wind through meadows and woods, +Where waterfalls cascade, a silent song unfolds. +The crystal clear lakes reflect the sky above, +A mirror of dreams, a place of peace and love. + +The Swiss Alps, a treasure to behold, +A land of wonder, a story untold. +From towering peaks to shimmering shores, +They inspire awe, forevermore. +``` + +Congrats! You've run Google Gemma-7B inference on four GH200 chips simultaneously. Move on to the next tutorial or try the challenge. + +### Challenge + +Using the same approach as in the latter half of step 4, use pip to install the package `nvitop`. This is a tool that shows you a concise real-time summary of GPU activity. Then, run Gemma and launch nvitop at the same time: + +``` +(gemma-venv) user@nid001234:/capstor/scratch/cscs/user/gemma-inference$ python ./gemma-inference.py > ./gemma-output.log 2>&1 & nvitop +``` + +Note the use of bash `> ./gemma-output.log 2>&1` to hide any output from Python. Note also the use of the single ampersand `'&'` which backgrounds the first command and runs `nvitop` on top. + +After a moment, you will see your Python script spawn on all four GPUs, after which the GPU activity will increase a bit and then go back to idle. At this point, you can hit `q` to quite nvitop and you will find the output of your Python script in `./gemma-output.log`. + +### Collaborating in Git + +In order to track and exchange your progress with colleagues, it is recommended to store the EDF, Dockerfile and your application code alongside in a Git repository in a directory on `$SCRATCH` and share it with colleagues. 
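
As a minimal sketch of what this could look like, assuming the file names and locations used in this tutorial (and that you created `gemma-inference.sbatch` in this directory, or copy it here first), you might collect the relevant files and commit them like this:

```
[cluster][user@cluster-ln001 ~]$ cd $SCRATCH/gemma-inference
[cluster][user@cluster-ln001 gemma-inference]$ git init .
[cluster][user@cluster-ln001 gemma-inference]$ cp ~/.edf/gemma-pytorch.toml .
[cluster][user@cluster-ln001 gemma-inference]$ cp $SCRATCH/pytorch-24.01-py3-venv/Dockerfile .
[cluster][user@cluster-ln001 gemma-inference]$ git add Dockerfile gemma-pytorch.toml gemma-inference.py gemma-inference.sbatch
[cluster][user@cluster-ln001 gemma-inference]$ git commit -m "Gemma-7B inference tutorial: EDF, Dockerfile and scripts"
```

Large build artifacts such as the `.sqsh` image and the `gemma-venv` directory are best left out of the repository (for example via a `.gitignore`), since they can be recreated from the Dockerfile and the pip commands above.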
diff --git a/docs/guides/mlp_tutorials/llm-nanotron-training.md b/docs/guides/mlp_tutorials/llm-nanotron-training.md new file mode 100644 index 00000000..505e42e3 --- /dev/null +++ b/docs/guides/mlp_tutorials/llm-nanotron-training.md @@ -0,0 +1,254 @@ +[](){#ref-mlp-llm-nanotron-tutorial} + +# LLM Nanotron Training Tutorial + +In this tutorial, we will build a container image to run nanotron training jobs. We will train a 109M parameter model with ~100M wikitext tokens as a proof of concept. + +### Prequisites + +It is also recommended to follow the previous tutorials: [LLM Inference][ref-mlp-llm-inference-tutorial] and [LLM Finetuning][ref-mlp-llm-finetuning-tutorial], as this will build up from it. + +### Set up Podman + +Edit your `$HOME/.config/containers/storage.conf` according to the following minimal template: + +```title="$HOME/.config/containers/storage.conf" +[storage] + driver = "overlay" + runroot = "/dev/shm/$USER/runroot" + graphroot = "/dev/shm/$USER/root" + +[storage.options.overlay] + mount_program = "/usr/bin/fuse-overlayfs-1.13" +``` + +## Modify the NGC Container + +See previous tutorial for context. Here, we assume we are already in a compute node (run `srun -A --pty bash` to get an interactive session). In this case, we will be creating the dockerfile in `$SCRATCH/container-image/nanotron/Dockerfile`. These are the contents of the dockerfile: + +```title="$SCRATCH/container-image/nanotron/Dockerfile" +FROM nvcr.io/nvidia/pytorch:24.04-py3 + +# Update flash-attn. +RUN pip install --upgrade --no-build-isolation flash-attn==2.5.8 + +# Install the rest of dependencies. +RUN pip install \ + datasets \ + transformers \ + wandb \ + dacite \ + pyyaml \ + numpy \ + packaging \ + safetensors \ + tqdm +``` + +Then build and import the container. + +```bash +cd $SCRATCH/container-image/nanotron +podman build -t nanotron:v1.0 . +enroot import -x mount -o nanotron-v1.0.sqsh podman://nanotron:v1.0 +``` + +Now exit the interactive session by running `exit`. + +### Set up an EDF + +See the previous tutorial for context. In this case, the edf will be at `$HOME/.edf/nanotron.toml` and will have the following contents: + +```title="$HOME/.edf/nanotron.toml" +image = "/capstor/scratch/cscs//container-image/nanotron/nanotron-v1.0.sqsh" +mounts = ["/capstor", "/users"] +workdir = "/users//" +writable = true + +[annotations] +com.hooks.aws_ofi_nccl.enabled = "true" +com.hooks.aws_ofi_nccl.variant = "cuda12" + +[env] +FI_CXI_DISABLE_HOST_REGISTER = "1" +FI_MR_CACHE_MONITOR = "userfaultfd" +NCCL_DEBUG = "INFO" +``` + +Note that, if you built your own container image, you will need to modify the image path. + +### Preparing a Training Job + +Now let's download nanotron. 
In the login node run: + +```bash +git clone https://github.com/huggingface/nanotron.git +cd nanotron +``` + +And with your favorite text editor, create the following nanotron configuration file in `$HOME/nanotron/examples/config_tiny_llama_wikitext.yaml`: + +```title="$HOME/nanotron/examples/config_tiny_llama_wikitext.yaml" +general: + benchmark_csv_path: null + consumed_train_samples: null + ignore_sanity_checks: true + project: debug + run: tiny_llama_%date_%jobid + seed: 42 + step: null +model: + ddp_bucket_cap_mb: 25 + dtype: bfloat16 + init_method: + std: 0.025 + make_vocab_size_divisible_by: 1 + model_config: + bos_token_id: 1 + eos_token_id: 2 + hidden_act: silu + hidden_size: 768 + initializer_range: 0.02 + intermediate_size: 1536 + is_llama_config: true + max_position_embeddings: 512 + num_attention_heads: 12 + num_hidden_layers: 12 + num_key_value_heads: 12 + pad_token_id: null + pretraining_tp: 1 + rms_norm_eps: 1.0e-05 + rope_scaling: null + tie_word_embeddings: true + use_cache: true + vocab_size: 50257 +optimizer: + accumulate_grad_in_fp32: true + clip_grad: 1.0 + learning_rate_scheduler: + learning_rate: 0.001 + lr_decay_starting_step: null + lr_decay_steps: null + lr_decay_style: cosine + lr_warmup_steps: 150 # 10% of the total steps + lr_warmup_style: linear + min_decay_lr: 0.00001 + optimizer_factory: + adam_beta1: 0.9 + adam_beta2: 0.95 + adam_eps: 1.0e-08 + name: adamW + torch_adam_is_fused: true + weight_decay: 0.01 + zero_stage: 1 +parallelism: + dp: 2 + expert_parallel_size: 1 + pp: 1 + pp_engine: 1f1b + tp: 4 + tp_linear_async_communication: true + tp_mode: reduce_scatter +data_stages: + - name: stable training stage + start_training_step: 1 + data: + dataset: + dataset_overwrite_cache: false + dataset_processing_num_proc_per_process: 32 + hf_dataset_config_name: null + hf_dataset_or_datasets: wikitext + hf_dataset_splits: train + text_column_name: text + hf_dataset_config_name: wikitext-103-v1 + num_loading_workers: 1 + seed: 42 +lighteval: null +tokenizer: + tokenizer_max_length: null + tokenizer_name_or_path: gpt2 + tokenizer_revision: null +tokens: + batch_accumulation_per_replica: 1 + limit_test_batches: 0 + limit_val_batches: 0 + micro_batch_size: 64 + sequence_length: 512 + train_steps: 1500 + val_check_interval: -1 +checkpoints: + checkpoint_interval: 1500 + checkpoints_path: checkpoints + checkpoints_path_is_shared_file_system: false + resume_checkpoint_path: checkpoints + save_initial_state: false +profiler: null +logging: + iteration_step_info_interval: 1 + log_level: info + log_level_replica: info +``` + +This configuration file will train, as a proof of concept, a gpt-2-like (109M parameters) llama model with approximately 100M tokens of wikitext with settings `tp=4, dp=2, pp=1` (which means that it requires two nodes to train). This training job will require approximately 10 minutes to run. Now, create a batchfile in `$HOME/nanotron/run_tiny_llama.sh` with the contents: + +```bash title="$HOME/nanotron/run_tiny_llama.sh" +#!/bin/bash +#SBATCH --job-name=nanotron # create a short name for your job +#SBATCH --nodes=2 # total number of nodes +#SBATCH --ntasks-per-node=1 # total number of tasks per node +#SBATCH --gpus-per-task=4 +#SBATCH --time=1:00:00 +#SBATCH --account= +#SBATCH --output=logs/%x_%j.log # control where the stdout will be +#SBATCH --error=logs/%x_%j.err # control where the error messages will be# + +mkdir -p logs + +# Initialization. 
+set -x +cat $0 +export MASTER_PORT=25678 +export MASTER_ADDR=$(hostname) +export HF_HOME=$SCRATCH/huggingface_home +export CUDA_DEVICE_MAX_CONNECTIONS=1 # required by nanotron +# export either WANDB_API_KEY= or WANDB_MODE=offline + +# Run main script. +srun -ul --environment=nanotron bash -c " + # Change cwd and run the main training script. + cd nanotron/ + pip install -e . # Only required the first time. + + TORCHRUN_ARGS=\" + --node-rank=\${SLURM_PROCID} \ + --master-addr=\${MASTER_ADDR} \ + --master-port=\${MASTER_PORT} \ + --nnodes=\${SLURM_NNODES} \ + --nproc-per-node=\${SLURM_GPUS_PER_TASK} \ + \" + + torchrun \${TORCHRUN_ARGS} run_train.py --config-file examples/config_tiny_llama_wikitext.yaml +" +``` + +A few comments: +- The parts outside the srun command will be run on the first node of the Slurm allocation for this job. srun commands without further specifiers execute with the settings of the sbatch script (i.e. using all nodes allocated to the job). +- If you have a [wandb](https://wandb.ai/) API key and want to synchronize the training run, be sure to set the `WANDB_API_KEY` variable. Otherwise, set `WANDB_MODE=of​f​line` instead. +- Note that we are setting `HF_HOME` in a directory in scratch. This is done to place the downloaded dataset in scratch, instead of your home directory. +- The pip install command is only run once in every container (compute node). Note that this will only link the nanotron python package to be able to import it in any script irrespective of the current working directory. Because all dependencies of nanotron are already installed in the Dockerfile, no extra libraries will be installed at this point. If the installation of the package under development creates artefacts on the shared filesystem (such as binaries from compiled C++/CUDA source code), this results in a race condition when run from multiple nodes. Therefore, in this case and also when additional external libraries are to be installed, you should either use venv as shown in previous tutorials, or directly build everything in the Dockerfile. + +### Launch a Training Job with the new Image + +Run: + +```bash +sbatch run_tiny_llama.sh +``` + +You can inspect if your job has been submitted successfully by running `squeue --me` and looking for your username. Once the run starts, there will be a new file under `logs/`. You can inspect the status of your run using: + +``` +tail -f logs/ +``` + +In the end, the checkpoints of the model will be saved in `checkpoints/`. \ No newline at end of file diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index c657e65d..c43fc7b0 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -71,5 +71,4 @@ Project is per project - each project gets a project folder with project-specifi ## Guides and tutorials -!!! todo - links to tutorials and guides for ML workflows +Tutorials for finetuning and running inference of LLMs as well as training an LLM with Nanotron can be found in the [MLP Tutorials][ref-guides-mlp-tutorials] page. diff --git a/mkdocs.yml b/mkdocs.yml index 5529d83a..db577901 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -97,6 +97,7 @@ nav: - 'Storage': guides/storage.md - 'Using the terminal': guides/terminal.md - 'Gordon Bell 2025': guides/gb2025.md + - 'MLP Tutorials': guides/mlp_tutorials.md - 'Policies': - policies/index.md - 'User Regulations': policies/regulations.md From 2c9e5c5ae38eb4887c7568fe376d965d65d5aba7 Mon Sep 17 00:00:00 2001 From: "Nick J. 
Browning" Date: Thu, 10 Apr 2025 15:28:36 +0200 Subject: [PATCH 2/2] typo. --- docs/guides/mlp_tutorials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/mlp_tutorials.md b/docs/guides/mlp_tutorials.md index df143499..2c1df914 100644 --- a/docs/guides/mlp_tutorials.md +++ b/docs/guides/mlp_tutorials.md @@ -1,7 +1,7 @@ [](){#ref-guides-mlp-tutorials} # MLP Tutorials -These tutorials solve simple MLP tasks using the [Container Engine][ref-container-engine] on the ML-Platform. +These tutorials solve simple MLP tasks using the [Container Engine][ref-container-engine] on the ML Platform. 1. [LLM Inference][ref-mlp-llm-inference-tutorial] 2. [LLM Finetuning][ref-mlp-llm-finetuning-tutorial]