# Quickstart
Substratus is a cross-cloud substrate for training and serving AI models.
Substratus extends the Kubernetes control plane to orchestrate ML operations
through the addition of new API endpoints: Model, ModelServer, Dataset,
and more, all reconciled by the Substratus controller.
### What you'll need
- [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- [Docker](https://docs.docker.com/engine/install/)
- A [Google Cloud Platform](https://console.cloud.google.com/) project with billing enabled.
### Cloning the substratus repo
```bash
git clone https://github.com/substratusai/substratus
cd substratus
```
### Creating the infra and deploying the controller
Use our infrastructure build image to create a cluster and dependent cloud
components:
```bash
docker build ./install -t substratus-installer && \
docker run -it \
  -v $HOME/.kube:/root/.kube \
  -e PROJECT=$(gcloud config get project) \
  -e TOKEN=$(gcloud auth print-access-token) \
  -e GPU_TYPE=nvidia-l4 \
  substratus-installer gcp-up.sh
```
This will create the following infrastructure:
* A GKE cluster with node pools capable of running L4 GPUs
* An Artifact Registry repository to store the model container images
* A GCS bucket to store fine-tuned models
* A GCS bucket to store Terraform state

The Substratus Operator is automatically installed on the GKE cluster.
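
If you want to sanity-check the setup before moving on, plain kubectl is
enough. This is a sketch rather than part of the original guide; the
controller's namespace and pod names are not spelled out here, so the grep
is an assumption:

```bash
# Confirm the kubeconfig written by the installer points at the new cluster.
kubectl get nodes

# Look for the Substratus controller; its namespace isn't documented here,
# so filter by name across all namespaces.
kubectl get pods --all-namespaces | grep -i substratus
```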
### Deploying the falcon-7b-instruct model

Let's build the container image by creating a Model:

```bash
kubectl apply -f examples/falcon-7b-instruct/model.yaml
```
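To see the resource you just created, you can query it like any other
Kubernetes object. A sketch only: the lowercase plural `models` and the
resource name `falcon-7b-instruct` are assumptions based on the kind and the
manifest path:

```bash
# Assumes the Model CRD registers the conventional plural "models"
# and that the manifest names the resource "falcon-7b-instruct".
kubectl get models
kubectl describe model falcon-7b-instruct
```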
You can inspect the logs of the container image build by running:

```bash
kubectl logs -f jobs/falcon-7b-instruct-model-builder
```

Press Ctrl + C to stop following the logs.

The job should complete after about 11 minutes.
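
If you would rather block until the build finishes instead of tailing logs,
`kubectl wait` can do it. This is a sketch rather than part of the original
guide; the 15 minute timeout is just a cushion over the expected 11 minutes:

```bash
# Wait for the builder Job referenced above to report completion.
kubectl wait --for=condition=complete --timeout=15m \
  job/falcon-7b-instruct-model-builder
```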
You can now deploy an inference server by creating a ModelServer:

```bash
kubectl apply -f examples/falcon-7b-instruct/server.yaml
```
It takes about 3-4 minutes to load the model into memory.

Check the logs and wait until you see a line that says `listening on 0.0.0.0:8080`:

```bash
kubectl logs -f deployment/falcon-7b-instruct-server
```
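As an alternative to tailing logs, you can wait on the Deployment itself.
Note this is only a sketch and not part of the original guide: readiness
reflects model load time only if the server image defines a readiness probe,
which this guide does not specify:

```bash
# Block until Kubernetes marks the Deployment Available (or 10m elapses).
kubectl wait --for=condition=available --timeout=10m \
  deployment/falcon-7b-instruct-server
```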
Now you can use port forwarding to access the Web UI:

```bash
kubectl port-forward deployment/falcon-7b-instruct-server 8080:8080
```
Try some prompts by visiting [http://localhost:8080](http://localhost:8080).

As a bonus, the inference server also exposes an OpenAI-compatible API
endpoint, provided by a component called Basaran. Read more about
[Basaran here](https://github.com/hyperonym/basaran).
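
While the port-forward is running, you can exercise that API from the command
line. A minimal sketch, assuming Basaran serves the standard OpenAI
`/v1/completions` route on the forwarded port:

```bash
# Send a completion request to the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Kubernetes is", "max_tokens": 32}'
```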
## Conclusion and next steps

You deployed a large language model on GKE and can now use it to build
private LLM applications.

Next steps:
* Fine-tuning a Model with the Dataset API (TODO: write doc)
* Using a notebook to create a new model (TODO: write doc)