updates

romilbhardwaj · romilbhardwaj · commit e82be5e0ea98 · 2022-10-18T23:18:07.000-07:00
diff --git a/.dockerignore b/.dockerignore
@@ -3,3 +3,5 @@
 build_docker.sh
 Dockerfile
 local/
+README.md
+LICENSE
diff --git a/01_hello_sky/01_hello_sky.ipynb b/01_hello_sky/01_hello_sky.ipynb
@@ -136,7 +136,7 @@
     "\n",
     "setup: |\n",
     "  echo \"Run any setup commands here\"\n",
-    "  sudo apt install cowsay\n",
+    "  pip install cowsay\n",
     "\n",
     "run: |\n",
     "  echo \"Hello Stranger!\"\n",
diff --git a/02_using_accelerators/02_using_accelerators.ipynb b/02_using_accelerators/02_using_accelerators.ipynb
@@ -106,7 +106,7 @@
     "\n",
     "```yaml\n",
     "resources:\n",
-    "  accelerators: V100:1\n",
+    "  accelerators: K80:1\n",
     "\n",
     "setup: ....\n",
     "\n",
@@ -120,7 +120,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## <span style=\"color:green\">[DIY]</span> 📝 Edit `bert.yaml` to use a V100 GPU! \n",
+    "## <span style=\"color:green\">[DIY]</span> 📝 Edit `bert.yaml` to use a K80 GPU! \n",
     "\n",
     "We have provided an example YAML (`bert.yaml`) which fine-tunes a BERT model on the SQuAD dataset. However, it does not specify any GPU resources for training.\n",
     "\n",
@@ -132,7 +132,7 @@
     "```yaml\n",
     "...\n",
     "resources:\n",
-    "  accelerators: V100:1\n",
+    "  accelerators: K80:1\n",
     "...\n",
     "```\n",
     "---------------------"
@@ -202,6 +202,8 @@
     "```\n",
     "-------------------------\n",
     "\n",
+    "**After you see the task training output, hit `ctrl+c` to exit.**\n",
+    "\n",
     "> **💡 Hint** - For long running tasks, you can safely Ctrl+C to exit once the task has started. It will continue running in the background. For more on how to access logs after detaching, queue more tasks and cancel tasks, please refer to [SkyPilot docs](https://skypilot.readthedocs.io/en/latest/reference/job-queue.html)."
    ]
   },
@@ -241,7 +243,7 @@
     "\n",
     "(In the interest of time, we don't run this command in this notebook but feel free to try it later!)\n",
     "\n",
-    "SkyPilot will find instance types on GCP that support the required GPU (V100), and it will also mount the object store when the task runs."
+    "SkyPilot will find instance types on GCP that support the required GPU, and it will also mount the object store when the task runs."
    ]
   },
   {
diff --git a/02_using_accelerators/bert.yaml b/02_using_accelerators/bert.yaml
@@ -1,9 +1,9 @@
 name: bert
 
 resources:
-  accelerators: # Add V100:1 here!
+  accelerators: # [DIY] - Add K80:1 here!
 
-  # For this task, we specify cloud and region for quota reasons.
+  # For this task, we specify cloud and region because our tutorial account has quota only in the us-west-2 region.
   # If these are not specified, SkyPilot will try the cheapest region first, and failover if quota is exceeded.
   cloud: aws
   region: us-west-2
diff --git a/03_spot_instances/03_spot_instances.ipynb b/03_spot_instances/03_spot_instances.ipynb
@@ -62,13 +62,13 @@
    "source": [
     "## <span style=\"color:green\">[DIY]</span> 💻 Train BERT on spot instances with `sky spot launch`!\n",
     "\n",
-    "**Training BERT on spot instances with SkyPilot requires no changes to the previous YAML!**\n",
+    "**Training BERT on spot instances with SkyPilot requires no changes to the YAML!**\n",
     "\n",
     "**Simply replace `sky launch` with `sky spot launch` to run the task on spot instances.**\n",
     "\n",
     "------------------\n",
     "```console\n",
-    "$ sky spot launch 02_using_accelerators/bert.yaml\n",
+    "$ sky spot launch 03_spot_instances/bert.yaml\n",
     "```\n",
     "------------------\n",
     "\n",
@@ -241,16 +241,11 @@
     "\n",
     "### Liked SkyPilot?\n",
     "* **Give us a star on [github](github.com/skypilot-org/skypilot)!**\n",
-    "* **Reach out to us on the SkyCamp slack**\n",
-    "* **Check out the [docs](https://skypilot.readthedocs.io/) to learn about more exciting SkyPilot features, such as automatic benchmarking, automatic instance stopping, TPUs, on-premise support and much more!**\n"
+    "* **Join us on the [SkyPilot slack](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q).**\n",
+    "* **Check out the [docs](https://skypilot.readthedocs.io/) to learn about more exciting SkyPilot features, such as automatic benchmarking, automatic instance stopping, TPUs, on-premise support and much more!**\n",
+    "\n",
+    "[![SkyPilotSlack](https://i.imgur.com/HLUSHyr.png)](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q)"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
diff --git a/03_spot_instances/bert.yaml b/03_spot_instances/bert.yaml
@@ -0,0 +1,39 @@
+name: bert
+
+resources:
+  accelerators: T4:1  # Use T4 GPUs for quota reasons.
+  cloud: aws
+  region: us-west-2
+
+# file_mounts specifies any data that must be made available to the task
+file_mounts:
+  /dataset/: # This specifies the destination where the object bucket will be mounted
+    source: s3://sky-bert-dataset/ # The bucket URL to be mounted
+
+
+# Setup repository.
+setup: |
+  git clone https://github.com/huggingface/transformers.git
+  cd transformers && git checkout v4.18.0
+  pip install -e .
+  cd examples/pytorch/question-answering/
+  pip install -r requirements.txt
+
+# Run command. Note that the --train_file argument reads from the object store mounted at /dataset
+run: |
+  cd transformers/examples/pytorch/question-answering/
+  python run_qa.py \
+  --train_file /dataset/train-v2.0.json \
+  --model_name_or_path bert-base-uncased \
+  --dataset_name squad \
+  --do_train \
+  --do_eval \
+  --per_device_train_batch_size 12 \
+  --learning_rate 3e-5 \
+  --num_train_epochs 50 \
+  --max_seq_length 384 \
+  --doc_stride 128 \
+  --report_to none \
+  --output_dir /tmp/checkpoints/. \
+  --save_total_limit 10 \
+  --save_steps 1000
diff --git a/build_docker.sh b/build_docker.sh
@@ -1,4 +1,5 @@
 #!/usr/bin/env bash
+set -e
 # aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/a9w6z7w5
 docker build -t skypilot-tutorial:latest .
 docker tag skypilot-tutorial:latest public.ecr.aws/a9w6z7w5/skypilot-tutorial:latest
