
Commit e82be5e

updates
1 parent 1ed0764 commit e82be5e


7 files changed (+57, -18 lines)


.dockerignore

+2
@@ -3,3 +3,5 @@
 build_docker.sh
 Dockerfile
 local/
+README.md
+LICENSE
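The two added entries keep README.md and LICENSE out of the Docker build context, next to the build_docker.sh, Dockerfile and local/ entries already listed. Below is a simplified Python approximation of that exclusion for literal patterns like these. It is not Docker's matcher (Docker follows Go's filepath.Match rules and also supports `**` and `!` exceptions), `is_ignored` is a hypothetical helper, and only the lines visible in this hunk are used because lines 1-2 of the file are not shown.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the .dockerignore lines visible in the hunk above (lines 1-2 are not shown).
PATTERNS = ["build_docker.sh", "Dockerfile", "local/", "README.md", "LICENSE"]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Approximate whether `path` (relative to the build context) is excluded.

    A path is excluded if it, or any parent directory, matches a pattern.
    Trailing slashes on patterns are dropped. Docker specifics such as `**`,
    `!` exceptions and `*` not crossing `/` are deliberately ignored here.
    """
    parts = PurePosixPath(path).parts
    # Build "a", "a/b", "a/b/c", ... so a matched directory excludes its contents.
    candidates = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    cleaned = [p.rstrip("/") for p in patterns]
    return any(fnmatchcase(c, p) for c in candidates for p in cleaned)


if __name__ == "__main__":
    for f in ["README.md", "LICENSE", "local/notes.txt", "01_hello_sky/01_hello_sky.ipynb"]:
        print(f, "->", "excluded" if is_ignored(f) else "sent to build context")
```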

01_hello_sky/01_hello_sky.ipynb

+1 -1

@@ -136,7 +136,7 @@
 "\n",
 "setup: |\n",
 " echo \"Run any setup commands here\"\n",
-" sudo apt install cowsay\n",
+" pip install cowsay\n",
 "\n",
 "run: |\n",
 " echo \"Hello Stranger!\"\n",

02_using_accelerators/02_using_accelerators.ipynb

+6 -4

@@ -106,7 +106,7 @@
 "\n",
 "```yaml\n",
 "resources:\n",
-" accelerators: V100:1\n",
+" accelerators: K80:1\n",
 "\n",
 "setup: ....\n",
 "\n",
@@ -120,7 +120,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## <span style=\"color:green\">[DIY]</span> 📝 Edit `bert.yaml` to use a V100 GPU! \n",
+"## <span style=\"color:green\">[DIY]</span> 📝 Edit `bert.yaml` to use a K80 GPU! \n",
 "\n",
 "We have provided an example YAML (`bert.yaml`) which fine-tunes a BERT model on the SQuAD dataset. However, it does not specify any GPU resources for training.\n",
 "\n",
@@ -132,7 +132,7 @@
 "```yaml\n",
 "...\n",
 "resources:\n",
-" accelerators: V100:1\n",
+" accelerators: K80:1\n",
 "...\n",
 "```\n",
 "---------------------"
@@ -202,6 +202,8 @@
 "```\n",
 "-------------------------\n",
 "\n",
+"**After you see the task training output, hit `ctrl+c` to exit.**\n",
+"\n",
 "> **💡 Hint** - For long running tasks, you can safely Ctrl+C to exit once the task has started. It will continue running in the background. For more on how to access logs after detaching, queue more tasks and cancel tasks, please refer to [SkyPilot docs](https://skypilot.readthedocs.io/en/latest/reference/job-queue.html)."
 ]
 },
@@ -241,7 +243,7 @@
 "\n",
 "(In the interest of time, we don't run this command in this notebook but feel free to try it later!)\n",
 "\n",
-"SkyPilot will find instance types on GCP that support the required GPU (V100), and it will also mount the object store when the task runs."
+"SkyPilot will find instance types on GCP that support the required GPU, and it will also mount the object store when the task runs."
 ]
 },
 {
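This commit switches the tutorial's GPU request from `V100:1` to `K80:1`; the value follows a `<name>:<count>` pattern, so `K80:1` asks for one K80 GPU. A minimal Python sketch of reading that pattern, using a hypothetical `parse_accelerators` helper (not SkyPilot's own parser). Treating a bare name with no `:<count>` as a count of 1 is an assumption of this sketch.

```python
from typing import NamedTuple


class AcceleratorRequest(NamedTuple):
    name: str
    count: int


def parse_accelerators(spec: str) -> AcceleratorRequest:
    """Split a "<name>:<count>" string such as "K80:1" (hypothetical helper).

    Assumption of this sketch: a bare name like "T4" means a count of 1.
    """
    name, sep, count = spec.strip().partition(":")
    if not name:
        raise ValueError(f"missing accelerator name in {spec!r}")
    n = int(count) if sep else 1  # int("") raises ValueError for "K80:"
    if n < 1:
        raise ValueError(f"accelerator count must be positive, got {n}")
    return AcceleratorRequest(name, n)


if __name__ == "__main__":
    print(parse_accelerators("K80:1"))  # AcceleratorRequest(name='K80', count=1)
```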

02_using_accelerators/bert.yaml

+2 -2

@@ -1,9 +1,9 @@
 name: bert
 
 resources:
-  accelerators: # Add V100:1 here!
+  accelerators: # [DIY] - Add K80:1 here!
 
-  # For this task, we specify cloud and region for quota reasons.
+  # For this task, we specify cloud and region because our tutorial account has quota only in the us-west-2 region.
   # If these are not specified, SkyPilot will try the cheapest region first, and failover if quota is exceeded.
   cloud: aws
   region: us-west-2
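The comment in this file says that when `cloud` and `region` are left out, SkyPilot tries the cheapest region first and fails over when quota is exceeded. A minimal Python sketch of that try-cheapest-then-fail-over loop; `Region`, `QuotaExceeded`, `try_launch` and the prices are all hypothetical stand-ins, and this is an illustration of the idea, not SkyPilot's provisioner.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class QuotaExceeded(Exception):
    """Raised by a (hypothetical) launch attempt when a region lacks quota."""


@dataclass
class Region:
    name: str
    hourly_price: float  # made-up prices, for illustration only


def launch_with_failover(regions: Iterable[Region], try_launch: Callable[[Region], str]) -> str:
    """Try regions from cheapest to most expensive, moving on after quota errors."""
    errors = []
    for region in sorted(regions, key=lambda r: r.hourly_price):
        try:
            return try_launch(region)
        except QuotaExceeded as exc:
            errors.append(f"{region.name}: {exc}")
    raise RuntimeError("no region had quota:\n" + "\n".join(errors))


if __name__ == "__main__":
    candidates = [
        Region("us-east-1", 0.90),
        Region("us-west-2", 0.95),
        Region("eu-west-1", 1.00),
    ]
    regions_with_quota = {"us-west-2"}  # like the tutorial account described above

    def try_launch(region: Region) -> str:
        # Stand-in for provisioning a VM; fails unless the region has quota.
        if region.name not in regions_with_quota:
            raise QuotaExceeded("GPU quota exceeded")
        return region.name

    print(launch_with_failover(candidates, try_launch))  # us-west-2
```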

03_spot_instances/03_spot_instances.ipynb

+6 -11

@@ -62,13 +62,13 @@
 "source": [
 "## <span style=\"color:green\">[DIY]</span> 💻 Train BERT on spot instances with `sky spot launch`!\n",
 "\n",
-"**Training BERT on spot instances with SkyPilot requires no changes to the previous YAML!**\n",
+"**Training BERT on spot instances with SkyPilot requires no changes to the YAML!**\n",
 "\n",
 "**Simply replace `sky launch` with `sky spot launch` to run the task on spot instances.**\n",
 "\n",
 "------------------\n",
 "```console\n",
-"$ sky spot launch 02_using_accelerators/bert.yaml\n",
+"$ sky spot launch 03_spot_instances/bert.yaml\n",
 "```\n",
 "------------------\n",
 "\n",
@@ -241,16 +241,11 @@
 "\n",
 "### Liked SkyPilot?\n",
 "* **Give us a star on [github](github.com/skypilot-org/skypilot)!**\n",
-"* **Reach out to us on the SkyCamp slack**\n",
-"* **Check out the [docs](https://skypilot.readthedocs.io/) to learn about more exciting SkyPilot features, such as automatic benchmarking, automatic instance stopping, TPUs, on-premise support and much more!**\n"
+"* **Join us on the [SkyPilot slack](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q).**\n",
+"* **Check out the [docs](https://skypilot.readthedocs.io/) to learn about more exciting SkyPilot features, such as automatic benchmarking, automatic instance stopping, TPUs, on-premise support and much more!**\n",
+"\n",
+"[![SkyPilotSlack](https://i.imgur.com/HLUSHyr.png)](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {

03_spot_instances/bert.yaml

+39
@@ -0,0 +1,39 @@
+name: bert
+
+resources:
+  accelerators: T4:1 # Use T4 GPUs for quota reasons.
+  cloud: aws
+  region: us-west-2
+
+# file_mounts specifies the any data that must be made available to the task
+file_mounts:
+  /dataset/: # This specifies the destination where the object bucket will be mounted
+    source: s3://sky-bert-dataset/ # The bucket URL to be mounted
+
+
+# Setup repository.
+setup: |
+  git clone https://github.com/huggingface/transformers.git
+  cd transformers && git checkout v4.18.0
+  pip install -e .
+  cd examples/pytorch/question-answering/
+  pip install -r requirements.txt
+
+# Run command. Note that the --train_file argument reads from the object store mounted at /dataset
+run: |
+  cd transformers/examples/pytorch/question-answering/
+  python run_qa.py \
+    --train_file /dataset/train-v2.0.json \
+    --model_name_or_path bert-base-uncased \
+    --dataset_name squad \
+    --do_train \
+    --do_eval \
+    --per_device_train_batch_size 12 \
+    --learning_rate 3e-5 \
+    --num_train_epochs 50 \
+    --max_seq_length 384 \
+    --doc_stride 128 \
+    --report_to none \
+    --output_dir /tmp/checkpoints/. \
+    --save_total_limit 10 \
+    --save_steps 1000
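The comments in this new file say that `file_mounts` makes the `s3://sky-bert-dataset/` bucket available at `/dataset/`, and that `--train_file` reads `/dataset/train-v2.0.json` from it. A minimal Python sketch of that path correspondence, assuming object keys appear as files under the mount destination; `backing_object` is a hypothetical helper that only maps strings and never touches S3 or SkyPilot.

```python
from pathlib import PurePosixPath

# Mirrors the file_mounts entry above: mount destination -> bucket URL.
FILE_MOUNTS = {"/dataset/": "s3://sky-bert-dataset/"}


def backing_object(path: str, mounts: dict[str, str] = FILE_MOUNTS) -> str:
    """Return the bucket URL assumed to back `path` on the task's VM."""
    p = PurePosixPath(path)
    for dest, source in mounts.items():
        root = PurePosixPath(dest)  # "/dataset/" normalizes to "/dataset"
        if p == root or root in p.parents:
            rel = p.relative_to(root).as_posix()  # "." when p is the mount root
            return source if rel == "." else source.rstrip("/") + "/" + rel
    raise KeyError(f"{path} is not under any file mount")


if __name__ == "__main__":
    print(backing_object("/dataset/train-v2.0.json"))  # s3://sky-bert-dataset/train-v2.0.json
```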

build_docker.sh

+1
@@ -1,4 +1,5 @@
 #!/usr/bin/env bash
+set -e
 # aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/a9w6z7w5
 docker build -t skypilot-tutorial:latest .
 docker tag skypilot-tutorial:latest public.ecr.aws/a9w6z7w5/skypilot-tutorial:latest
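The added `set -e` makes bash exit as soon as a command fails, so a failed `docker build` now stops the script instead of falling through to `docker tag`. A small Python check of that behavior, assuming `bash` is available on PATH; `run_bash` is a hypothetical helper and `false` stands in for a failing command.

```python
import subprocess


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run `script` with bash and capture its output (requires bash on PATH)."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)


# Without `set -e`, the failing `false` is ignored and the script keeps going.
without = run_bash("false\necho reached")
# With `set -e`, bash exits at `false`, so `echo reached` never runs.
with_e = run_bash("set -e\nfalse\necho reached")

print(repr(without.stdout), without.returncode)  # 'reached\n' 0
print(repr(with_e.stdout), with_e.returncode)    # '' 1
```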
