
Commit 434106c

add kuberay integration
Signed-off-by: Ricardo Decal <[email protected]>
1 parent 006ff20 commit 434106c

1 file changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
---
title: KubeRay
---
[](){ #deployment-kuberay }

[KubeRay](https://github.com/ray-project/kuberay) provides a Kubernetes-native way to run vLLM workloads on Ray clusters. A Ray cluster can be declared in YAML, and the operator then handles pod scheduling, networking configuration, restarts, and rolling upgrades, all while preserving the familiar Kubernetes experience.
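
As a taste of the declarative workflow, a cluster can be created by applying a single manifest. The sketch below is a minimal, hypothetical `RayCluster` (the name, image tags, and replica counts are placeholders, not a tested configuration):

```bash
# Apply a minimal, illustrative RayCluster manifest inline.
# All names, image tags, and replica counts below are placeholders.
kubectl apply -f - <<EOF
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: vllm-demo
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: workers
      replicas: 2
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
EOF
```

Tearing the cluster down is equally declarative: `kubectl delete raycluster vllm-demo`.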

---

## Why KubeRay instead of manual scripts?

| Feature | Manual scripts | KubeRay |
|---------|----------------|---------|
| Cluster bootstrap | SSH into every node by hand and run a script | One command to create or update the whole cluster: `kubectl apply -f cluster.yaml` |
| Fault tolerance | Failed nodes must be restarted by hand | Pods are automatically rescheduled; head-node fail-over is supported |
| Autoscaling | Unsupported | Native horizontal **and** vertical autoscaling via the Ray Autoscaler and the Kubernetes HPA |
| Upgrades | Tear down and re-create manually | Rolling updates handled by the operator |
| Monitoring | Ad hoc | Distributed observability with the Ray Dashboard |
| Declarative config | Bash flags and environment variables | GitOps-friendly YAML CRDs (`RayCluster`/`RayService`) |

Using KubeRay reduces the operational burden and simplifies integrating Ray and vLLM with existing Kubernetes workflows (CI/CD, secrets, storage classes, etc.).

---

## Quick start

1. Install the KubeRay operator (via Helm or `kubectl apply`); a Helm sketch follows this list.
2. Create a `RayService` that runs vLLM.
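
For step 1, one possible Helm invocation looks like this (the pinned chart version is illustrative; check the KubeRay releases page for the current one):

```bash
# Install the KubeRay operator from its Helm chart repository.
# The chart version is illustrative; use the latest stable release.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1

# The Deployment name follows the Helm release name above.
kubectl get deployment kuberay-operator
```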

For step 2, apply the sample `RayService` manifest:

```bash
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/master/ray-operator/config/samples/vllm/ray-service.vllm.yaml
```

The YAML above spins up a Ray cluster and a Ray Serve application that serves the `meta-llama/Meta-Llama-3-8B-Instruct` model using vLLM. Wait until the `RayService` reports **RUNNING**.
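
One way to poll for that status, assuming the `RayService` from the sample manifest is named `llama-3-8b` (adjust the name if yours differs):

```bash
# Poll the RayService resource until its status reports RUNNING.
# The resource name is an assumption based on the sample manifest.
kubectl get rayservice llama-3-8b -o 'jsonpath={.status.serviceStatus}'
```

Once the service is **RUNNING**, port-forward and query the model: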

```bash
kubectl port-forward svc/llama-3-8b-serve-svc 8000 &

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Provide a brief sentence describing the Ray open-source project."}
    ],
    "temperature": 0.7
  }'
```

---

## Learn more

* ["Serve a Large Language Model with vLLM on Kubernetes"](https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html): End-to-end walkthrough for deploying Llama-3 8B with `RayService`.
* [KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html)
