
Commit d6847fb

[Docs] Fix syntax highlighting of shell commands

Signed-off-by: Lukas Geiger <[email protected]>

1 parent b82e0f8 · commit d6847fb


53 files changed: +220 -220 lines
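The change itself is mechanical: in each affected file, the language hint on a fenced shell block is switched from `console` to `bash`, presumably because the `console` lexer treats prompt-less lines as output and leaves bare commands unhighlighted. A sweep of this kind could be scripted roughly as sketched below; the commit does not say how the edits were made, and the paths and the GNU `sed -i` flag are assumptions.

````bash
# Illustrative only: flip every ```console fence to ```bash under the docs tree.
# Paths and the in-place GNU sed flag are assumptions, not taken from this commit.
grep -rl '```console' docs/ .buildkite/ | xargs sed -i 's/```console/```bash/g'
````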

.buildkite/nightly-benchmarks/nightly-annotation.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Please download the visualization scripts in the post
 - Download `nightly-benchmarks.zip`.
 - In the same folder, run the following code:

-```console
+```bash
 export HF_TOKEN=<your HF token>
 apt update
 apt install -y git

docs/deployment/docker.md

Lines changed: 6 additions & 6 deletions
@@ -10,7 +10,7 @@ title: Using Docker
 vLLM offers an official Docker image for deployment.
 The image can be used to run OpenAI compatible server and is available on Docker Hub as [vllm/vllm-openai](https://hub.docker.com/r/vllm/vllm-openai/tags).

-```console
+```bash
 docker run --runtime nvidia --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 --env "HUGGING_FACE_HUB_TOKEN=<secret>" \

@@ -22,7 +22,7 @@ docker run --runtime nvidia --gpus all \

 This image can also be used with other container engines such as [Podman](https://podman.io/).

-```console
+```bash
 podman run --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \

@@ -71,7 +71,7 @@ You can add any other [engine-args][engine-args] you need after the image tag (`

 You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:

-```console
+```bash
 # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 DOCKER_BUILDKIT=1 docker build . \
 --target vllm-openai \

@@ -99,7 +99,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

 ??? Command

-```console
+```bash
 # Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
 python3 use_existing_torch.py
 DOCKER_BUILDKIT=1 docker build . \

@@ -118,7 +118,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

 Run the following command on your host machine to register QEMU user static handlers:

-```console
+```bash
 docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
 ```

@@ -128,7 +128,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

 To run vLLM with the custom-built Docker image:

-```console
+```bash
 docker run --runtime nvidia --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 -p 8000:8000 \

docs/deployment/frameworks/anything-llm.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
 ```

docs/deployment/frameworks/autogen.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ title: AutoGen

 - Setup [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment

-```console
+```bash
 pip install vllm

 # Install AgentChat and OpenAI client from Extensions

@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
 --model mistralai/Mistral-7B-Instruct-v0.2
 ```

docs/deployment/frameworks/cerebrium.md

Lines changed: 3 additions & 3 deletions
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr

 To install the Cerebrium client, run:

-```console
+```bash
 pip install cerebrium
 cerebrium login
 ```

 Next, create your Cerebrium project, run:

-```console
+```bash
 cerebrium init vllm-project
 ```

@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr

 Then, run the following code to deploy it to the cloud:

-```console
+```bash
 cerebrium deploy
 ```

docs/deployment/frameworks/chatbox.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

docs/deployment/frameworks/dify.md

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```

 - Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):

-```console
+```bash
 git clone https://github.com/langgenius/dify.git
 cd dify
 cd docker

docs/deployment/frameworks/dstack.md

Lines changed: 2 additions & 2 deletions
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),

 To install dstack client, run:

-```console
+```bash
 pip install "dstack[all]
 dstack server
 ```

 Next, to configure your dstack project, run:

-```console
+```bash
 mkdir -p vllm-dstack
 cd vllm-dstack
 dstack init

docs/deployment/frameworks/haystack.md

Lines changed: 2 additions & 2 deletions
@@ -13,15 +13,15 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Setup vLLM and Haystack environment

-```console
+```bash
 pip install vllm haystack-ai
 ```

 ## Deploy

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve mistralai/Mistral-7B-Instruct-v0.1
 ```

docs/deployment/frameworks/helm.md

Lines changed: 2 additions & 2 deletions
@@ -22,15 +22,15 @@ Before you begin, ensure that you have the following:

 To install the chart with the release name `test-vllm`:

-```console
+```bash
 helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
 ```

 ## Uninstalling the Chart

 To uninstall the `test-vllm` deployment:

-```console
+```bash
 helm uninstall test-vllm --namespace=ns-vllm
 ```

docs/deployment/frameworks/litellm.md

Lines changed: 3 additions & 3 deletions
@@ -18,7 +18,7 @@ And LiteLLM supports all models on VLLM.

 - Setup vLLM and litellm environment

-```console
+```bash
 pip install vllm litellm
 ```

@@ -28,7 +28,7 @@ pip install vllm litellm

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

@@ -56,7 +56,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 vllm serve BAAI/bge-base-en-v1.5
 ```

docs/deployment/frameworks/open-webui.md

Lines changed: 2 additions & 2 deletions
@@ -7,13 +7,13 @@ title: Open WebUI

 2. Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 1. Start the [Open WebUI](https://github.com/open-webui/open-webui) docker container (replace the vllm serve host and vllm serve port):

-```console
+```bash
 docker run -d -p 3000:8080 \
 --name open-webui \
 -v open-webui:/app/backend/data \

docs/deployment/frameworks/retrieval_augmented_generation.md

Lines changed: 6 additions & 6 deletions
@@ -15,7 +15,7 @@ Here are the integrations:

 - Setup vLLM and langchain environment

-```console
+```bash
 pip install -U vllm \
 langchain_milvus langchain_openai \
 langchain_community beautifulsoup4 \

@@ -26,14 +26,14 @@ pip install -U vllm \

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```

@@ -52,7 +52,7 @@ python retrieval_augmented_generation_with_langchain.py

 - Setup vLLM and llamaindex environment

-```console
+```bash
 pip install vllm \
 llama-index llama-index-readers-web \
 llama-index-llms-openai-like \

@@ -64,14 +64,14 @@ pip install vllm \

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```

docs/deployment/frameworks/skypilot.md

Lines changed: 8 additions & 8 deletions
@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
 - Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
 - Check that `sky check` shows clouds or Kubernetes are enabled.

-```console
+```bash
 pip install skypilot-nightly
 sky check
 ```

@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil

 Start the serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
 ```

@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the

 **Optional**: Serve the 70B model instead of the default 8B and use more GPU:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky launch serving.yaml \
 --gpus A100:8 \

@@ -159,15 +159,15 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut

 Start the serving the Llama-3 8B model on multiple replicas:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
 sky serve up -n vllm serving.yaml \
 --env HF_TOKEN
 ```

 Wait until the service is ready:

-```console
+```bash
 watch -n10 sky serve status vllm
 ```

@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.

 To update the service with the new config:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
 ```

 To stop the service:

-```console
+```bash
 sky serve down vllm
 ```

@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,

 1. Start the chat web UI:

-```console
+```bash
 sky launch \
 -c gui ./gui.yaml \
 --env ENDPOINT=$(sky serve status --endpoint vllm)

docs/deployment/frameworks/streamlit.md

Lines changed: 3 additions & 3 deletions
@@ -15,21 +15,21 @@ It can be quickly integrated with vLLM as a backend API server, enabling powerfu

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 - Install streamlit and openai:

-```console
+```bash
 pip install streamlit openai
 ```

 - Use the script: <gh-file:examples/online_serving/streamlit_openai_chatbot_webserver.py>

 - Start the streamlit web UI and start to chat:

-```console
+```bash
 streamlit run streamlit_openai_chatbot_webserver.py

 # or specify the VLLM_API_BASE or VLLM_API_KEY

docs/deployment/integrations/llamastack.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ vLLM is also available via [Llama Stack](https://github.com/meta-llama/llama-sta

 To install Llama Stack, run

-```console
+```bash
 pip install llama-stack -q
 ```
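Every hunk above is the same one-line substitution. A quick check that no `console` fences remain could look like the sketch below; whether the tree is expected to be entirely free of them after this commit is an assumption.

````bash
# Illustrative check: list any remaining ```console fences under docs/.
grep -rn '```console' docs/ || echo "no console fences remaining under docs/"
````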
