"To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n",
199
+
"To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n",
163
200
"\n",
164
201
"In this next block, you'll define the fully connected layers of this simple network."
" # '''TODO: Define the second Dense layer to output the classification probabilities'''\n",
184
-
"'''TODO: Dense layer to output classification probabilities'''\n",
185
-
"\n",
221
+
"[TODO Dense layer to output classification probabilities]\n",
222
+
"\n",
186
223
" ])\n",
187
224
" return fc_model\n",
188
225
"\n",
@@ -208,7 +245,7 @@
208
245
"\n",
209
246
"After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are fully-connected neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n",
210
247
"\n",
211
-
"That defines our fully connected model!"
248
+
"That defines our fully connected model!"
212
249
]
213
250
},
214
251
{
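The final `Dense` layer described in this hunk is expected to return scores that sum to 1, which is what a softmax activation produces. A minimal pure-Python sketch of that normalization follows; the `softmax` helper and the example scores are illustrative assumptions, not code from the notebook.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without changing the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Ten hypothetical raw scores, one per digit class.
raw_scores = [0.5, -1.2, 3.0, 0.0, 1.1, -0.3, 2.2, 0.7, -2.0, 0.4]
probs = softmax(raw_scores)
```

The largest raw score keeps the largest probability, so the ranking of classes is unchanged by the normalization.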
@@ -229,7 +266,7 @@
229
266
"\n",
230
267
"We'll start out by using a stochastic gradient descent (SGD) optimizer initialized with a learning rate of 0.1. Since we are performing a categorical classification task, we'll want to use the [cross entropy loss](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy).\n",
231
268
"\n",
232
-
"You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model."
269
+
"You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model."
233
270
]
234
271
},
235
272
{
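The loss referenced in this hunk, sparse categorical cross entropy, takes integer class labels and, for a single example, reduces to the negative log of the probability assigned to the true class. A small sketch under that assumption; the function below is a hand-written stand-in with made-up probabilities, not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

# Hypothetical 3-class predictions for an example whose true class is 2.
confident = [0.05, 0.05, 0.90]
uncertain = [1 / 3, 1 / 3, 1 / 3]

loss_confident = sparse_categorical_crossentropy(2, confident)
loss_uncertain = sparse_categorical_crossentropy(2, uncertain)
```

A confident correct prediction yields a smaller loss than a uniform guess, which is the signal the optimizer follows.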
@@ -243,7 +280,7 @@
243
280
"'''TODO: Experiment with different optimizers and learning rates. How do these affect\n",
244
281
" the accuracy of the trained model? Which optimizers and/or learning rates yield\n",
"We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training.\n",
296
+
"We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training.\n",
260
297
"\n",
261
298
"In Lab 1, we saw how we can use `GradientTape` to optimize losses and train models with stochastic gradient descent. After defining the model settings in the `compile` step, we can also accomplish training by calling the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) method on an instance of the `Model` class. We will use this to train our fully connected model.\n"
262
299
]
@@ -294,7 +331,7 @@
294
331
"source": [
295
332
"### Evaluate accuracy on the test dataset\n",
296
333
"\n",
297
-
"Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array.\n",
334
+
"Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array.\n",
298
335
"\n",
299
336
"Use the [`evaluate`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate) method to evaluate the model on the test dataset!"
300
337
]
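Checking whether predictions match the labels, as this hunk describes, amounts to taking the most probable class for each image and counting agreements. A rough sketch with hypothetical data; the `accuracy` helper is illustrative and is not the `evaluate` method itself.

```python
def accuracy(predictions, labels):
    """Fraction of examples whose most probable class equals the label."""
    correct = 0
    for probs, label in zip(predictions, labels):
        # Index of the highest probability is the predicted class.
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Four made-up 3-class predictions and their true labels.
predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]
test_accuracy = accuracy(predictions, labels)
```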
@@ -319,7 +356,7 @@
319
356
"id": "yWfgsmVXCaXG"
320
357
},
321
358
"source": [
322
-
"You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data.\n",
359
+
"You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data.\n",
323
360
"\n",
324
361
"What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...\n",
325
362
"\n",
@@ -369,28 +406,28 @@
369
406
" cnn_model = tf.keras.Sequential([\n",
370
407
"\n",
371
408
" # TODO: Define the first convolutional layer\n",
372
-
" tf.keras.layers.Conv2D('''TODO'''), \n",
409
+
" tf.keras.layers.Conv2D('''TODO'''),\n",
373
410
"\n",
374
411
" # TODO: Define the first max pooling layer\n",
375
-
" tf.keras.layers.MaxPool2D('''TODO'''),\n",
412
+
" tf.keras.layers.MaxPool2D('''TODO'''),\n",
376
413
"\n",
377
414
" # TODO: Define the second convolutional layer\n",
" # TODO: Define the last Dense layer to output the classification\n",
423
+
" # TODO: Define the last Dense layer to output the classification\n",
387
424
" # probabilities. Pay attention to the activation needed for a probability\n",
388
425
" # output\n",
389
-
"'''TODO: Dense layer to output classification probabilities'''\n",
426
+
"[TODO Dense layer to output classification probabilities]\n",
390
427
" ])\n",
391
-
"\n",
428
+
"\n",
392
429
" return cnn_model\n",
393
-
"\n",
430
+
"\n",
394
431
"cnn_model = build_cnn_model()\n",
395
432
"# Initialize the model by passing some data through\n",
396
433
"cnn_model.predict(train_images[[0]])\n",
@@ -443,7 +480,7 @@
443
480
"source": [
444
481
"'''TODO: Use model.fit to train the CNN model, with the same batch_size and number of epochs previously used.'''\n",
445
482
"cnn_model.fit('''TODO''')\n",
446
-
"comet_model_2.end() "
483
+
"# comet_model_2.end() ## uncomment this line to end the comet experiment"
447
484
]
448
485
},
449
486
{
@@ -475,7 +512,9 @@
475
512
"id": "2rvEgK82Glv9"
476
513
},
477
514
"source": [
478
-
"What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model? "
515
+
"What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model?\n",
516
+
"\n",
517
+
"Feel free to click the Comet links to investigate the training/accuracy curves for your model."
479
518
]
480
519
},
481
520
{
@@ -526,7 +565,7 @@
526
565
"id": "-hw1hgeSCaXN"
527
566
},
528
567
"source": [
529
-
"As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits.\n",
568
+
"As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits.\n",
530
569
"\n",
531
570
"Let's look at the digit that has the highest confidence for the first image in the test dataset:"
"Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, this function abstracts away many details in the training call, and we have less control over training model, which could be useful in other contexts.\n",
677
+
"Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, it abstracts away many details of training, so we have less control over the training loop; that control can be useful in other contexts.\n",
639
678
"\n",
640
679
"As an alternative to this, we can use the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) class to record differentiation operations during training, and then call the [`tf.GradientTape.gradient`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient) function to actually compute the gradients. You may recall seeing this in Lab 1 Part 1, but let's take another look at this here.\n",
641
680
"\n",
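`tf.GradientTape` obtains gradients by automatic differentiation. For intuition about what a gradient-based update does with such a gradient, here is a pure-Python stand-in that uses no TensorFlow: it approximates the gradient of a toy loss with finite differences and takes one SGD step with learning rate 0.1. The names `loss_fn` and `numerical_gradient` and all values are hypothetical, and finite differences are a substitute for illustration, not how the tape works internally.

```python
def loss_fn(w):
    # Toy quadratic loss with its minimum at w = 3.0.
    return (w - 3.0) ** 2

def numerical_gradient(f, w, eps=1e-6):
    # Central finite-difference approximation of df/dw.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

learning_rate = 0.1
w = 0.0
grad = numerical_gradient(loss_fn, w)  # analytic value: 2 * (w - 3) = -6
w_next = w - learning_rate * grad      # one SGD step toward the minimum
```

The step moves `w` toward the minimum and lowers the loss; a training loop repeats this update over batches of data.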
@@ -675,14 +714,16 @@
675
714
"\n",
676
715
" #'''TODO: compute the categorical cross entropy loss\n",
"In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias."
741
+
"In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias."