01_hello_sky/01_hello_sky.ipynb
+49-36
Original file line number
Diff line number
Diff line change
@@ -28,16 +28,16 @@
28
28
}
29
29
},
30
30
"source": [
31
-
"SkyPilot is a framework for running machine learning workloads on any cloud.\n",
31
+
"SkyPilot is a framework for easily running machine learning workloads on any cloud.\n",
32
32
"\n",
33
-
"SkyPilot makes it easy to use multiple clouds and reduce your cloud costs.\n",
33
+
"Use the clouds **easily** and **cost-effectively**, without needing cloud infrastructure expertise.\n",
34
34
"\n",
35
-
"_Ease of use & productivity_\n",
35
+
"_Ease of use_\n",
36
36
"* **Run existing projects on the cloud** with zero code changes\n",
37
-
"* **Easily manage jobs** across multiple clusters\n",
38
-
"* **Automatic fail-over** to find scarce resources (GPUs) across regions and clouds\n",
39
-
"* **Store datasets on the cloud** and access them like you would on a local file system \n",
40
-
"* **No cloud lock-in** – seamlessly run your code across different cloud providers (AWS, Azure or GCP)\n",
37
+
"* Use a **unified interface** to run on any cloud, without vendor lock-in (currently AWS, Azure, GCP)\n",
38
+
"* **Queue jobs** on one or multiple clusters\n",
39
+
"* **Automatic failover** to find scarce resources (GPUs) across regions and clouds\n",
40
+
"* **Use datasets on the cloud** like you would on a local file system \n",
41
41
"\n",
42
42
"_Cost saving_\n",
43
43
"* Run jobs on **spot instances** with **automatic recovery** from preemptions\n",
@@ -57,7 +57,8 @@
57
57
"1. Understand the basic SkyPilot YAML interface (`setup`, `run`).\n",
58
58
"2. Run a hello world task on a cloud of your choice.\n",
59
59
"3. SSH into your cluster for debugging and development.\n",
60
-
"4. Terminate the cluster and understand the cluster lifecycle."
60
+
"4. Terminate the cluster and understand the cluster lifecycle.\n",
61
+
"5. Run your task seamlessly across different clouds."
61
62
]
62
63
},
63
64
{
@@ -75,8 +76,8 @@
75
76
"\n",
76
77
"There are points in these notebooks where you may need to edit files outside the notebook and open a terminal to run some commands. These points will be highlighted with **two icons**:\n",
77
78
"\n",
78
-
"### 📝 - Edit an external file\n",
79
-
"### 💻 - Run commands in an interactive terminal window\n",
79
+
"### <span style=\"color:green\">[DIY]</span> 📝 - Edit an external file\n",
80
+
"### <span style=\"color:green\">[DIY]</span> 💻 - Run commands in an interactive terminal window\n",
80
81
"\n",
81
82
"Use these icons as a hint to know when to switch away from the current notebook and edit a file or open a terminal.\n",
82
83
"\n",
@@ -154,8 +155,8 @@
154
155
"cell_type": "markdown",
155
156
"metadata": {},
156
157
"source": [
157
-
"## 📝 Edit `example.yaml` to echo \"Hello SkyPilot\"\n",
158
-
"Go ahead and open example.yaml and edit the run field to echo \"Hello SkyPilot\"."
158
+
"## <span style=\"color:green\">[DIY]</span> 📝 Edit `example.yaml` to echo \"Hello SkyPilot\"\n",
159
+
"**Go ahead and open example.yaml and edit the run field to echo \"Hello SkyPilot\".**"
159
160
]
160
161
},
161
162
{
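For reference, the edit described in this hunk could look roughly like the sketch below. The contents of `example.yaml` are not shown in this diff, so the `setup` command and the overall layout are assumptions; only the `setup` and `run` field names and the "Hello SkyPilot" message come from the notebook text.

```yaml
# Hypothetical contents of 01_hello_sky/example.yaml after the edit.
# Only the `setup`/`run` field names and the echoed message come from the notebook;
# the setup command is an assumption for illustration.
setup: |
  echo "Running setup"

run: |
  echo "Hello SkyPilot"
```

With a file like this, the `sky launch 01_hello_sky/example.yaml` command from the next cell would run `setup` once when the cluster is provisioned and then run the `run` commands as the task.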
@@ -176,16 +177,18 @@
176
177
"cell_type": "markdown",
177
178
"metadata": {},
178
179
"source": [
179
-
"## 💻 Launch your Sky Task!\n",
180
+
"## <span style=\"color:green\">[DIY]</span> 💻 Launch your Sky Task!\n",
180
181
"\n",
181
-
"In a terminal window, run:\n",
182
+
"**In a terminal window, run:**\n",
182
183
"\n",
183
184
"-------------------------\n",
184
185
"```console\n",
185
186
"sky launch 01_hello_sky/example.yaml\n",
186
187
"```\n",
187
188
"-------------------------\n",
188
189
"\n",
190
+
"This will take about a minute to run.\n",
191
+
"\n",
189
192
"> **💡 Hint** - If you're using jupyter lab, you can create a terminal in your browser by going to `File -> New -> Terminal`\n",
190
193
"\n",
191
194
"You'll notice that SkyPilot will perform multiple actions for you:\n",
@@ -265,8 +268,9 @@
265
268
"cell_type": "markdown",
266
269
"metadata": {},
267
270
"source": [
268
-
"## 💻 Checking your cluster status with `sky status`\n",
269
-
"In a terminal window, run:\n",
271
+
"## <span style=\"color:green\">[DIY]</span> 💻 Checking your cluster status with `sky status`\n",
272
+
"\n",
273
+
"**In a terminal window, run:**\n",
270
274
"\n",
271
275
"\n",
272
276
"-------------------------\n",
@@ -302,14 +306,16 @@
302
306
"cell_type": "markdown",
303
307
"metadata": {},
304
308
"source": [
305
-
"## 💻 SSH into the cluster!"
309
+
"## <span style=\"color:green\">[DIY]</span> 💻 SSH into the cluster!"
306
310
]
307
311
},
308
312
{
309
313
"cell_type": "markdown",
310
314
"metadata": {},
311
315
"source": [
312
-
"For debugging and development, you can easily SSH into a SkyPilot cluster with the `ssh` utility. In a terminal window, run:\n",
316
+
"For debugging and development, you can easily SSH into a SkyPilot cluster with the `ssh` utility. \n",
317
+
"\n",
318
+
"**In a terminal window, run:**\n",
313
319
"\n",
314
320
"-------------------------\n",
315
321
"```console\n",
@@ -344,6 +350,8 @@
344
350
"```\n",
345
351
"-------------------------\n",
346
352
"\n",
353
+
"Press `Ctrl+D` to exit the SSH session.\n",
354
+
"\n",
347
355
"> **💡 Hint** - To enable the SSH functionality, SkyPilot adds the remote cluster to your `~/.ssh/config`. This means you can use the cluster name alias with other ssh tools, such as `scp`, `rsync`, VSCode and more!"
348
356
]
349
357
},
@@ -358,17 +366,18 @@
358
366
"cell_type": "markdown",
359
367
"metadata": {},
360
368
"source": [
361
-
"SkyPilot clusters can exist in three states, each of which has different billing and storage implications:\n",
369
+
"SkyPilot clusters can exist in four states, each of which has different billing and storage implications:\n",
362
370
"\n",
363
-
"* **`RUNNING`** - Cluster is up and running, you will be billed for the instance and the attached storages.\n",
371
+
"* **`INIT`** - Cluster is initializing.\n",
372
+
"* **`UP`** - Cluster is up and running. You will be billed for the instances and their attached storage.\n",
364
373
"* **`STOPPED`** - Cluster nodes are shut down and their disks are suspended. Your data and node state is safe and the cluster can be restored to running state when required. You will be billed only for the storage.\n",
365
374
"* **`TERMINATED`** - Cluster is terminated and all nodes and their attached disks are deleted. These clusters cannot be restarted and will not be shown in `sky status`.\n",
366
375
"\n",
367
376
"To manage these states, SkyPilot offers three useful commands:\n",
368
377
"\n",
369
-
"1. **`sky stop`** - stops a `RUNNING` cluster.\n",
378
+
"1. **`sky stop`** - stops an `UP` cluster.\n",
370
379
"2. **`sky start`** - starts a `STOPPED` cluster.\n",
371
-
"2. **`sky down`** - terminates a `RUNNING` or `STOPPED` cluster.\n",
380
+
"3. **`sky down`** - terminates an `UP` or `STOPPED` cluster.\n",
372
381
"\n",
373
382
"> **💡 Hint** - `sky stop` and `sky start` are useful when you want to suspend your experiments for a while but want to quickly resume later. `sky down` is useful to delete a cluster and restart a job from scratch."
374
383
]
@@ -377,22 +386,22 @@
377
386
"cell_type": "markdown",
378
387
"metadata": {},
379
388
"source": [
380
-
"## 💻 Terminate your cluster!\n",
389
+
"## <span style=\"color:green\">[DIY]</span> 💻 Terminate your cluster!\n",
381
390
"Now that we are done using the cluster, let's terminate it to stop being billed for it. You can use `sky down` to terminate a cluster.\n",
382
391
"\n",
383
-
"First, let's get the cluster name with `sky status`.\n",
392
+
"**First, get the cluster name with `sky status`.**\n",
384
393
"\n",
385
394
"-------------------------\n",
386
395
"```console\n",
387
-
"sky status\n",
396
+
"$ sky status\n",
388
397
"```\n",
389
398
"-------------------------\n",
390
399
"\n",
391
-
"and then run `sky down` to terminate the cluster\n",
400
+
"**and then run `sky down` to terminate the cluster**\n",
392
401
"\n",
393
402
"-------------------------\n",
394
403
"```console\n",
395
-
"sky down <cluster-name>\n",
404
+
"$ sky down <cluster-name>\n",
396
405
"```\n",
397
406
"-------------------------"
398
407
]
@@ -435,20 +444,24 @@
435
444
"cell_type": "markdown",
436
445
"metadata": {},
437
446
"source": [
438
-
"## 💻 Launch example.yaml on google cloud with with the `--cloud` flag"
447
+
"## <span style=\"color:green\">[DIY]</span> 💻 Launch example.yaml on Google Cloud with the `--cloud` flag"
439
448
]
440
449
},
441
450
{
442
451
"cell_type": "markdown",
443
452
"metadata": {},
444
453
"source": [
445
-
"To override the SkyPilot optimizer and manually pick a cloud, use the `--cloud [aws,gcp,azure]` flag for `sky launch` like so:\n",
454
+
"To override the SkyPilot optimizer and manually pick a cloud, use the `--cloud <cloud>` flag for `sky launch`.\n",
455
+
"\n",
456
+
"**Go ahead and run the task on GCP using the `--cloud gcp` flag.**\n",
"We're at the end of this notebook and we don't want to let your cluster keep running and rack up a big bill! Let's terminate the cluster with `sky down`.\n",
499
+
"## <span style=\"color:green\">[DIY]</span> 💻 Terminate your GCP cluster!\n",
500
+
"We're at the end of this notebook and we don't want to let your GCP cluster keep running and rack up a big bill! Let's terminate the cluster with `sky down`.\n",
488
501
"\n",
489
-
"First, let's get the cluster name with `sky status`.\n",
502
+
"**First, get the cluster name with `sky status`.**\n",
490
503
"\n",
491
504
"-------------------------\n",
492
505
"```console\n",
493
506
"sky status\n",
494
507
"```\n",
495
508
"-------------------------\n",
496
509
"\n",
497
-
"and then run `sky down` to terminate the cluster\n",
510
+
"**and then run `sky down` to terminate the cluster**\n",
498
511
"\n",
499
512
"-------------------------\n",
500
513
"```console\n",
@@ -507,7 +520,7 @@
507
520
"cell_type": "markdown",
508
521
"metadata": {},
509
522
"source": [
510
-
"#### 🎉 Congratulations! You have learnt the basics of SkyPilot! Please proceed to the next notebook to learn how to use accelerators and object stores in SkyPilot.\n"
523
+
"#### 🎉 Congratulations! You have used SkyPilot to seamlessly run tasks on two clouds! Please proceed to the next notebook to learn how to use accelerators and object stores in SkyPilot.\n"
02_using_accelerators/02_using_accelerators.ipynb
+35-7
@@ -37,9 +37,11 @@
37
37
"cell_type": "markdown",
38
38
"metadata": {},
39
39
"source": [
40
-
"# Listing supported accelerators with `sky show-gpus`\n",
40
+
"# <span style=\"color:green\">[DIY]</span> Listing supported accelerators with `sky show-gpus`\n",
41
41
"\n",
42
-
"To see the list of accelerators supported by SkyPilot , you can use the `sky show-gpus` command. You can run `sky show-gpus` by running the cell below."
42
+
"To see the list of accelerators supported by SkyPilot, you can use the `sky show-gpus` command. \n",
43
+
"\n",
44
+
"**Run `sky show-gpus` by running the cell below:**"
43
45
]
44
46
},
45
47
{
@@ -118,7 +120,7 @@
118
120
"cell_type": "markdown",
119
121
"metadata": {},
120
122
"source": [
121
-
"## 📝 Edit `bert.yaml` to use a T4 GPU! \n",
123
+
"## <span style=\"color:green\">[DIY]</span> 📝 Edit `bert.yaml` to use a V100 GPU! \n",
122
124
"\n",
123
125
"We have provided an example YAML (`bert.yaml`) which fine-tunes a BERT model on the SQuAD dataset. However, it does not specify any GPU resources for training.\n",
124
126
"\n",
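The GPU request this cell asks for might look like the fragment below. The diff does not show `bert.yaml`, so the rest of the file is omitted here and the count of one GPU is an assumption; `resources.accelerators` is the SkyPilot YAML field for requesting accelerators.

```yaml
# Sketch of the addition to 02_using_accelerators/bert.yaml
# (the rest of the file is not shown in this diff).
resources:
  accelerators: V100:1  # one V100 GPU; the count is an assumption
```

Accelerator names such as `V100` should match an entry listed by `sky show-gpus`.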
@@ -168,19 +170,21 @@
168
170
"cell_type": "markdown",
169
171
"metadata": {},
170
172
"source": [
171
-
"## 💻 Launch your BERT training task!\n",
173
+
"## <span style=\"color:green\">[DIY]</span> 💻 Launch your BERT training task!\n",
172
174
"\n",
173
-
"**After you have edited `bert.yaml` to use T4 GPUs**, open a terminal and use `sky launch` to create a GPU cluster:\n",
175
+
"**After you have edited `bert.yaml` to use V100 GPUs, open a terminal and use `sky launch` to create a GPU cluster:**\n",
174
176
"\n",
175
177
"-------------------------\n",
176
178
"```console\n",
177
179
"sky launch 02_using_accelerators/bert.yaml\n",
178
180
"```\n",
179
181
"-------------------------\n",
180
182
"\n",
183
+
"This will take about two minutes.\n",
184
+
"\n",
181
185
"### Expected output\n",
182
186
"\n",
183
-
"After the usual SkyPilot output, you should your task run:\n",
187
+
"After the usual SkyPilot output, you should see your task run:\n",
184
188
"\n",
185
189
"-------------------------\n",
186
190
"```console\n",
@@ -205,7 +209,9 @@
205
209
"cell_type": "markdown",
206
210
"metadata": {},
207
211
"source": [
208
-
"## 💻 Remember to terminate your cluster once you're done!\n",
212
+
"## <span style=\"color:green\">[DIY]</span> 💻 Remember to terminate your cluster once you're done!\n",
213
+
"\n",
214
+
"**Run `sky status` to get the cluster name and then use `sky down` to terminate it.**\n",
209
215
"\n",
210
216
"-------------------------\n",
211
217
"```console\n",
@@ -216,6 +222,28 @@
216
222
"-------------------------"
217
223
]
218
224
},
225
+
{
226
+
"cell_type": "markdown",
227
+
"metadata": {},
228
+
"source": [
229
+
"# Transparently training BERT on a different cloud\n",
230
+
"Moving this complex BERT training job to a different cloud is easy with SkyPilot. \n",
231
+
"\n",
232
+
"**Even though this task requires access to accelerators and object stores, SkyPilot can seamlessly run this job on a different cloud with a one-line change: adding the `--cloud` flag to `sky launch`.**\n",
233
+
"\n",
234
+
"Just like in the previous notebook, you can simply use the same YAML:\n",