# Quickstart
Substratus is a cross-cloud substrate for training and serving AI models.
Substratus extends the Kubernetes control plane to orchestrate ML operations
through the addition of new API endpoints: Model, ModelServer, Dataset,
and more, all reconciled by the Substratus controller.
### What you'll need
- [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- [Docker](https://docs.docker.com/engine/install/)
- A [Google Cloud Platform](https://console.cloud.google.com/) project with billing enabled.
### Cloning the substratus repo
```bash
git clone https://github.com/substratusai/substratus
cd substratus
```
### Creating the infra and deploying the controller
Use our infrastructure build image to create a cluster and dependent cloud
components:
```bash
docker build ./install -t substratus-installer && \
docker run -it \
  -v $HOME/.kube:/root/.kube \
  -e PROJECT=$(gcloud config get project) \
  -e TOKEN=$(gcloud auth print-access-token) \
  -e GPU_TYPE=nvidia-l4 \
  substratus-installer gcp-up.sh
```
This will create the following infrastructure:
* A GKE cluster with node pools capable of running L4 GPUs
* An Artifact Registry repository to store the model container images
* A GCS bucket to store fine-tuned models
* A GCS bucket to store Terraform state

The Substratus Operator is automatically installed on the GKE cluster.
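
If you want to sanity-check the setup before moving on, plain kubectl is
enough. This is a sketch rather than part of the original guide; the
controller's namespace and pod names are not spelled out here, so the grep
is an assumption:

```bash
# Confirm the kubeconfig written by the installer points at the new cluster.
kubectl get nodes

# Look for the Substratus controller; its namespace isn't documented here,
# so filter by name across all namespaces.
kubectl get pods --all-namespaces | grep -i substratus
```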
### Deploying the falcon-7b-instruct model

Let's build the container image by creating a Model:

```bash
kubectl apply -f examples/falcon-7b-instruct/model.yaml
```
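To see the resource you just created, you can query it like any other
Kubernetes object. A sketch only: the lowercase plural `models` and the
resource name `falcon-7b-instruct` are assumptions based on the kind and the
manifest path:

```bash
# Assumes the Model CRD registers the conventional plural "models"
# and that the manifest names the resource "falcon-7b-instruct".
kubectl get models
kubectl describe model falcon-7b-instruct
```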
You can inspect the logs of the container image build by running:

```bash
kubectl logs -f jobs/falcon-7b-instruct-model-builder
```

Press Ctrl + C to stop following the logs.

The job should complete after about 11 minutes.
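
If you would rather block until the build finishes instead of tailing logs,
`kubectl wait` can do it. This is a sketch rather than part of the original
guide; the 15 minute timeout is just a cushion over the expected 11 minutes:

```bash
# Wait for the builder Job referenced above to report completion.
kubectl wait --for=condition=complete --timeout=15m \
  job/falcon-7b-instruct-model-builder
```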
You can now deploy an inference server by creating a ModelServer:

```bash
kubectl apply -f examples/falcon-7b-instruct/server.yaml
```
It takes about 3-4 minutes to load the model into memory.

Check the logs and wait until you see a line that says `listening on 0.0.0.0:8080`:

```bash
kubectl logs -f deployment/falcon-7b-instruct-server
```
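As an alternative to tailing logs, you can wait on the Deployment itself.
Note this is only a sketch and not part of the original guide: readiness
reflects model load time only if the server image defines a readiness probe,
which this guide does not specify:

```bash
# Block until Kubernetes marks the Deployment Available (or 10m elapses).
kubectl wait --for=condition=available --timeout=10m \
  deployment/falcon-7b-instruct-server
```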
Now you can use port forwarding to access the Web UI:

```bash
kubectl port-forward deployment/falcon-7b-instruct-server 8080:8080
```
Try some prompts by visiting [http://localhost:8080](http://localhost:8080).

As a bonus, the inference server also exposes an OpenAI-compatible API
endpoint, provided by a component called Basaran. Read more about
[Basaran here](https://github.com/hyperonym/basaran).
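
While the port-forward is running, you can exercise that API from the command
line. A minimal sketch, assuming Basaran serves the standard OpenAI
`/v1/completions` route on the forwarded port:

```bash
# Send a completion request to the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Kubernetes is", "max_tokens": 32}'
```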
## Conclusion and next steps

You deployed a large language model on GKE and can now use it to build
private LLM applications.

Next steps:
* Fine-tuning a Model with the Dataset API (TODO: write doc)
* Using a notebook to create a new model (TODO: write doc)