
Commit f17aec0

Authored and co-authored by reidliu41
[doc] Fold long code blocks to improve readability (#19926)
Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
1 parent 493c275 · commit f17aec0

50 files changed (+3687, -3412 lines)
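
Before the per-file diffs, a note on what the commit actually changes: long fenced code blocks are nested inside collapsible blocks so they render folded by default, using either the `???` collapsible-admonition syntax (`??? Code`, `??? Examples`, `??? note "Commands"`) or plain `<details>`/`<summary>` HTML in `docs/cli/README.md`. The sketch below shows the admonition form; it assumes the docs are built with MkDocs Material and the `pymdownx.details` extension (the build configuration is not part of this commit), and the folded content must be indented four spaces:

```markdown
??? Code

    ```python
    # Anything indented four spaces under the "??? Code" line is rendered
    # collapsed by default and expands when the summary is clicked.
    from vllm import LLM
    ```
```

The `<details>` variant needs no indentation; everything between the opening and closing tags is folded behind the `<summary>` text.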

docs/ci/update_pytorch_version.md

Lines changed: 3 additions & 3 deletions
@@ -91,7 +91,7 @@ source to unblock the update process.
### FlashInfer
Here is how to build and install it from source with torch2.7.0+cu128 in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):

-```
+```bash
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
export FLASHINFER_ENABLE_SM90=1
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/[email protected]"

@@ -105,14 +105,14 @@ team if you want to get the package published there.
### xFormers
Similar to FlashInfer, here is how to build and install xFormers from source:

-```
+```bash
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
MAX_JOBS=16 uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/[email protected]"
```

### Mamba

-```
+```bash
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/[email protected]"
```

docs/cli/README.md

Lines changed: 22 additions & 27 deletions
@@ -16,35 +16,33 @@ vllm {chat,complete,serve,bench,collect-env,run-batch}

Start the vLLM OpenAI Compatible API server.

-Examples:
+??? Examples

-```bash
-# Start with a model
-vllm serve meta-llama/Llama-2-7b-hf
+    ```bash
+    # Start with a model
+    vllm serve meta-llama/Llama-2-7b-hf

-# Specify the port
-vllm serve meta-llama/Llama-2-7b-hf --port 8100
+    # Specify the port
+    vllm serve meta-llama/Llama-2-7b-hf --port 8100

-# Check with --help for more options
-# To list all groups
-vllm serve --help=listgroup
+    # Check with --help for more options
+    # To list all groups
+    vllm serve --help=listgroup

-# To view a argument group
-vllm serve --help=ModelConfig
+    # To view a argument group
+    vllm serve --help=ModelConfig

-# To view a single argument
-vllm serve --help=max-num-seqs
+    # To view a single argument
+    vllm serve --help=max-num-seqs

-# To search by keyword
-vllm serve --help=max
-```
+    # To search by keyword
+    vllm serve --help=max
+    ```

## chat

Generate chat completions via the running API server.

-Examples:
-
```bash
# Directly connect to localhost API without arguments
vllm chat

@@ -60,8 +58,6 @@ vllm chat --quick "hi"

Generate text completions based on the given prompt via the running API server.

-Examples:
-
```bash
# Directly connect to localhost API without arguments
vllm complete

@@ -73,6 +69,8 @@ vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1
vllm complete --quick "The future of AI is"
```

+</details>
+
## bench

Run benchmark tests for latency online serving throughput and offline inference throughput.

@@ -89,8 +87,6 @@ vllm bench {latency, serve, throughput}

Benchmark the latency of a single batch of requests.

-Example:
-
```bash
vllm bench latency \
--model meta-llama/Llama-3.2-1B-Instruct \

@@ -104,8 +100,6 @@ vllm bench latency \

Benchmark the online serving throughput.

-Example:
-
```bash
vllm bench serve \
--model meta-llama/Llama-3.2-1B-Instruct \

@@ -120,8 +114,6 @@ vllm bench serve \

Benchmark offline inference throughput.

-Example:
-
```bash
vllm bench throughput \
--model meta-llama/Llama-3.2-1B-Instruct \

@@ -143,7 +135,8 @@ vllm collect-env

Run batch prompts and write results to file.

-Examples:
+<details>
+<summary>Examples</summary>

```bash
# Running with a local file

@@ -159,6 +152,8 @@ vllm run-batch \
--model meta-llama/Meta-Llama-3-8B-Instruct
```

+</details>
+
## More Help

For detailed options of any subcommand, use:

docs/configuration/conserving_memory.md

Lines changed: 31 additions & 27 deletions
@@ -57,19 +57,21 @@ By default, we optimize model inference using CUDA graphs which take up extra me

You can adjust `compilation_config` to achieve a better balance between inference speed and memory usage:

-```python
-from vllm import LLM
-from vllm.config import CompilationConfig, CompilationLevel
-
-llm = LLM(
-    model="meta-llama/Llama-3.1-8B-Instruct",
-    compilation_config=CompilationConfig(
-        level=CompilationLevel.PIECEWISE,
-        # By default, it goes up to max_num_seqs
-        cudagraph_capture_sizes=[1, 2, 4, 8, 16],
-    ),
-)
-```
+??? Code
+
+    ```python
+    from vllm import LLM
+    from vllm.config import CompilationConfig, CompilationLevel
+
+    llm = LLM(
+        model="meta-llama/Llama-3.1-8B-Instruct",
+        compilation_config=CompilationConfig(
+            level=CompilationLevel.PIECEWISE,
+            # By default, it goes up to max_num_seqs
+            cudagraph_capture_sizes=[1, 2, 4, 8, 16],
+        ),
+    )
+    ```

You can disable graph capturing completely via the `enforce_eager` flag:

@@ -127,18 +129,20 @@ reduce the size of the processed multi-modal inputs, which in turn saves memory.

Here are some examples:

-```python
-from vllm import LLM
+??? Code

-# Available for Qwen2-VL series models
-llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
-          mm_processor_kwargs={
-              "max_pixels": 768 * 768, # Default is 1280 * 28 * 28
-          })
-
-# Available for InternVL series models
-llm = LLM(model="OpenGVLab/InternVL2-2B",
-          mm_processor_kwargs={
-              "max_dynamic_patch": 4, # Default is 12
-          })
-```
+    ```python
+    from vllm import LLM
+
+    # Available for Qwen2-VL series models
+    llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct",
+              mm_processor_kwargs={
+                  "max_pixels": 768 * 768, # Default is 1280 * 28 * 28
+              })
+
+    # Available for InternVL series models
+    llm = LLM(model="OpenGVLab/InternVL2-2B",
+              mm_processor_kwargs={
+                  "max_dynamic_patch": 4, # Default is 12
+              })
+    ```
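
The trailing context of the first hunk above mentions the `enforce_eager` flag, but the block that documents it lies outside this diff. For reference, a minimal sketch of that option in the same folded style (the model name is reused from the hunk; the wording of the untouched section is an assumption, not part of this commit):

```markdown
??? Code

    ```python
    from vllm import LLM

    # enforce_eager=True skips CUDA graph capture entirely,
    # trading some inference speed for lower memory usage.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enforce_eager=True)
    ```
```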

docs/configuration/env_vars.md

Lines changed: 5 additions & 3 deletions
@@ -7,6 +7,8 @@ vLLM uses the following environment variables to configure the system:

All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service as `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).

-```python
---8<-- "vllm/envs.py:env-vars-definition"
-```
+??? Code
+
+    ```python
+    --8<-- "vllm/envs.py:env-vars-definition"
+    ```

docs/contributing/README.md

Lines changed: 16 additions & 14 deletions
@@ -93,25 +93,27 @@ For additional features and advanced configurations, refer to the official [MkDo

## Testing

-```bash
-pip install -r requirements/dev.txt
+??? note "Commands"

-# Linting, formatting and static type checking
-pre-commit install --hook-type pre-commit --hook-type commit-msg
+    ```bash
+    pip install -r requirements/dev.txt

-# You can manually run pre-commit with
-pre-commit run --all-files
+    # Linting, formatting and static type checking
+    pre-commit install --hook-type pre-commit --hook-type commit-msg

-# To manually run something from CI that does not run
-# locally by default, you can run:
-pre-commit run mypy-3.9 --hook-stage manual --all-files
+    # You can manually run pre-commit with
+    pre-commit run --all-files

-# Unit tests
-pytest tests/
+    # To manually run something from CI that does not run
+    # locally by default, you can run:
+    pre-commit run mypy-3.9 --hook-stage manual --all-files

-# Run tests for a single test file with detailed output
-pytest -s -v tests/test_logger.py
-```
+    # Unit tests
+    pytest tests/
+
+    # Run tests for a single test file with detailed output
+    pytest -s -v tests/test_logger.py
+    ```

!!! tip
    Since the <gh-file:docker/Dockerfile> ships with Python 3.12, all tests in CI (except `mypy`) are run with Python 3.12.

docs/contributing/model/basic.md

Lines changed: 29 additions & 27 deletions
@@ -27,33 +27,35 @@ All vLLM modules within the model must include a `prefix` argument in their cons

The initialization code should look like this:

-```python
-from torch import nn
-from vllm.config import VllmConfig
-from vllm.attention import Attention
-
-class MyAttention(nn.Module):
-    def __init__(self, vllm_config: VllmConfig, prefix: str):
-        super().__init__()
-        self.attn = Attention(prefix=f"{prefix}.attn")
-
-class MyDecoderLayer(nn.Module):
-    def __init__(self, vllm_config: VllmConfig, prefix: str):
-        super().__init__()
-        self.self_attn = MyAttention(prefix=f"{prefix}.self_attn")
-
-class MyModel(nn.Module):
-    def __init__(self, vllm_config: VllmConfig, prefix: str):
-        super().__init__()
-        self.layers = nn.ModuleList(
-            [MyDecoderLayer(vllm_config, prefix=f"{prefix}.layers.{i}") for i in range(vllm_config.model_config.hf_config.num_hidden_layers)]
-        )
-
-class MyModelForCausalLM(nn.Module):
-    def __init__(self, vllm_config: VllmConfig, prefix: str = ""):
-        super().__init__()
-        self.model = MyModel(vllm_config, prefix=f"{prefix}.model")
-```
+??? Code
+
+    ```python
+    from torch import nn
+    from vllm.config import VllmConfig
+    from vllm.attention import Attention
+
+    class MyAttention(nn.Module):
+        def __init__(self, vllm_config: VllmConfig, prefix: str):
+            super().__init__()
+            self.attn = Attention(prefix=f"{prefix}.attn")
+
+    class MyDecoderLayer(nn.Module):
+        def __init__(self, vllm_config: VllmConfig, prefix: str):
+            super().__init__()
+            self.self_attn = MyAttention(prefix=f"{prefix}.self_attn")
+
+    class MyModel(nn.Module):
+        def __init__(self, vllm_config: VllmConfig, prefix: str):
+            super().__init__()
+            self.layers = nn.ModuleList(
+                [MyDecoderLayer(vllm_config, prefix=f"{prefix}.layers.{i}") for i in range(vllm_config.model_config.hf_config.num_hidden_layers)]
+            )
+
+    class MyModelForCausalLM(nn.Module):
+        def __init__(self, vllm_config: VllmConfig, prefix: str = ""):
+            super().__init__()
+            self.model = MyModel(vllm_config, prefix=f"{prefix}.model")
+    ```

### Computation Code

0 commit comments
