MITDeepLearning
diff --git a/lab2/Part1_MNIST.ipynb b/lab2/Part1_MNIST.ipynb
Lines changed: 95 additions & 42 deletions
@@ -10,9 +10,9 @@
         "  <td align=\"center\"><a target=\"_blank\" href=\"http://introtodeeplearning.com\">\n",
         "        <img src=\"https://i.ibb.co/Jr88sn2/mit.png\" style=\"padding-bottom:5px;\" />\n",
         "      Visit MIT Deep Learning</a></td>\n",
-        "  <td align=\"center\"><a target=\"_blank\" href=\"https://colab.research.google.com/github/aamini/introtodeeplearning/blob/2023/lab2/Part1_MNIST.ipynb\">\n",
+        "  <td align=\"center\"><a target=\"_blank\" href=\"https://colab.research.google.com/github/aamini/introtodeeplearning/blob/master/lab2/Part1_MNIST.ipynb\">\n",
         "        <img src=\"https://i.ibb.co/2P3SLwK/colab.png\"  style=\"padding-bottom:5px;\" />Run in Google Colab</a></td>\n",
-        "  <td align=\"center\"><a target=\"_blank\" href=\"https://github.com/aamini/introtodeeplearning/blob/2023/lab2/Part1_MNIST.ipynb\">\n",
+        "  <td align=\"center\"><a target=\"_blank\" href=\"https://github.com/aamini/introtodeeplearning/blob/master/lab2/Part1_MNIST.ipynb\">\n",
         "        <img src=\"https://i.ibb.co/xfJbPmL/github.png\"  height=\"70px\" style=\"padding-bottom:5px;\"  />View Source on GitHub</a></td>\n",
         "</table>\n",
         "\n",
@@ -27,8 +27,8 @@
       },
       "outputs": [],
       "source": [
-        "# Copyright 2023 MIT Introduction to Deep Learning. All Rights Reserved.\n",
-        "# \n",
+        "# Copyright 2024 MIT Introduction to Deep Learning. All Rights Reserved.\n",
+        "#\n",
         "# Licensed under the MIT License. You may not use this file except in compliance\n",
         "# with the License. Use and/or modification of this code outside of MIT Introduction\n",
         "# to Deep Learning must reference:\n",
@@ -62,31 +62,67 @@
       "outputs": [],
       "source": [
         "# Import Tensorflow 2.0\n",
-        "%tensorflow_version 2.x\n",
-        "import tensorflow as tf \n",
+        "import tensorflow as tf\n",
         "\n",
-        "!pip install mitdeeplearning\n",
+        "# MIT introduction to deep learning package\n",
+        "!pip install mitdeeplearning --quiet\n",
         "import mitdeeplearning as mdl\n",
         "\n",
-        "#Import Comet\n",
-        "!pip install comet_ml\n",
-        "import comet_ml\n",
-        "comet_ml.init(project_name=\"6.s191lab2_part1_NN\")\n",
-        "comet_model_1 = comet_ml.Experiment()\n",
-        "\n",
+        "# other packages\n",
         "import matplotlib.pyplot as plt\n",
         "import numpy as np\n",
         "import random\n",
         "from tqdm import tqdm"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "nCpHDxX1bzyZ"
+      },
+      "source": [
+        "We'll also install Comet. If you followed the instructions from Lab 1, you should have your Comet account set up. Enter your API key below."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "GSR_PAqjbzyZ"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install comet_ml > /dev/null 2>&1\n",
+        "import comet_ml\n",
+        "# TODO: ENTER YOUR API KEY HERE!!\n",
+        "COMET_API_KEY = \"\"\n",
+        "\n",
+        "# Check that we are using a GPU, if not switch runtimes\n",
+        "#   using Runtime > Change Runtime Type > GPU\n",
+        "assert len(tf.config.list_physical_devices('GPU')) > 0, \"No GPU found. Switch runtimes using Runtime > Change Runtime Type > GPU\"\n",
+        "assert COMET_API_KEY != \"\", \"Please insert your Comet API Key\""
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# start a first comet experiment for the first part of the lab\n",
+        "comet_ml.init(project_name=\"6S191lab2_part1_NN\")\n",
+        "comet_model_1 = comet_ml.Experiment()"
+      ],
+      "metadata": {
+        "id": "wGPDtVxvTtPk"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
     {
       "cell_type": "markdown",
       "metadata": {
         "id": "HKjrdUtX_N8J"
       },
       "source": [
-        "## 1.1 MNIST dataset \n",
+        "## 1.1 MNIST dataset\n",
         "\n",
         "Let's download and load the dataset and display a few random samples from it:"
       ]
@@ -113,7 +149,7 @@
         "id": "5ZtUqOqePsRD"
       },
       "source": [
-        "Our training set is made up of 28x28 grayscale images of handwritten digits. \n",
+        "Our training set is made up of 28x28 grayscale images of handwritten digits.\n",
         "\n",
         "Let's visualize what some of these images and their corresponding training labels look like."
       ]
@@ -136,7 +172,8 @@
         "    plt.grid(False)\n",
         "    image_ind = random_inds[i]\n",
         "    plt.imshow(np.squeeze(train_images[image_ind]), cmap=plt.cm.binary)\n",
-        "    plt.xlabel(train_labels[image_ind])"
+        "    plt.xlabel(train_labels[image_ind])\n",
+        "comet_model_1.log_figure(figure=plt)"
       ]
     },
     {
@@ -159,7 +196,7 @@
       },
       "source": [
         "### Fully connected neural network architecture\n",
-        "To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model. \n",
+        "To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n",
         "\n",
         "In this next block, you'll define the fully connected layers of this simple work."
       ]
@@ -181,8 +218,8 @@
         "      tf.keras.layers.Dense(128, activation= '''TODO'''),\n",
         "\n",
         "      # '''TODO: Define the second Dense layer to output the classification probabilities'''\n",
-        "      '''TODO: Dense layer to output classification probabilities'''\n",
-        "      \n",
+        "      [TODO Dense layer to output classification probabilities]\n",
+        "\n",
         "  ])\n",
         "  return fc_model\n",
         "\n",
@@ -208,7 +245,7 @@
         "\n",
         "After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are fully-connected neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n",
         "\n",
-        "That defines our fully connected model! "
+        "That defines our fully connected model!"
       ]
     },
     {
@@ -229,7 +266,7 @@
         "\n",
         "We'll start out by using a stochastic gradient descent (SGD) optimizer initialized with a learning rate of 0.1. Since we are performing a categorical classification task, we'll want to use the [cross entropy loss](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy).\n",
         "\n",
-        "You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model. "
+        "You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model."
       ]
     },
     {
@@ -243,7 +280,7 @@
         "'''TODO: Experiment with different optimizers and learning rates. How do these affect\n",
         "    the accuracy of the trained model? Which optimizers and/or learning rates yield\n",
         "    the best performance?'''\n",
-        "model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-1), \n",
+        "model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-1),\n",
         "              loss='sparse_categorical_crossentropy',\n",
         "              metrics=['accuracy'])"
       ]
@@ -256,7 +293,7 @@
       "source": [
         "### Train the model\n",
         "\n",
-        "We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training. \n",
+        "We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training.\n",
         "\n",
         "In Lab 1, we saw how we can use `GradientTape` to optimize losses and train models with stochastic gradient descent. After defining the model settings in the `compile` step, we can also accomplish training by calling the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) method on an instance of the `Model` class. We will use this to train our fully connected model\n"
       ]
@@ -294,7 +331,7 @@
       "source": [
         "### Evaluate accuracy on the test dataset\n",
         "\n",
-        "Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array. \n",
+        "Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array.\n",
         "\n",
         "Use the [`evaluate`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate) method to evaluate the model on the test dataset!"
       ]
@@ -319,7 +356,7 @@
         "id": "yWfgsmVXCaXG"
       },
       "source": [
-        "You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data. \n",
+        "You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data.\n",
         "\n",
         "What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...\n",
         "\n",
@@ -369,28 +406,28 @@
         "    cnn_model = tf.keras.Sequential([\n",
         "\n",
         "        # TODO: Define the first convolutional layer\n",
-        "        tf.keras.layers.Conv2D('''TODO'''), \n",
+        "        tf.keras.layers.Conv2D('''TODO'''),\n",
         "\n",
         "        # TODO: Define the first max pooling layer\n",
-        "        tf.keras.layers.MaxPool2D('''TODO'''),\n",
+        "        tf.keras.layers.MaxPool2D('''TODO'''),\n",
         "\n",
         "        # TODO: Define the second convolutional layer\n",
-        "        tf.keras.layers.Conv2D('''TODO'''),\n",
+        "        tf.keras.layers.Conv2D('''TODO'''),\n",
         "\n",
         "        # TODO: Define the second max pooling layer\n",
-        "        tf.keras.layers.MaxPool2D('''TODO'''),\n",
+        "        tf.keras.layers.MaxPool2D('''TODO'''),\n",
         "\n",
         "        tf.keras.layers.Flatten(),\n",
         "        tf.keras.layers.Dense(128, activation=tf.nn.relu),\n",
         "\n",
-        "        # TODO: Define the last Dense layer to output the classification \n",
+        "        # TODO: Define the last Dense layer to output the classification\n",
         "        # probabilities. Pay attention to the activation needed a probability\n",
         "        # output\n",
-        "        '''TODO: Dense layer to output classification probabilities'''\n",
+        "        [TODO Dense layer to output classification probabilities]\n",
         "    ])\n",
-        "    \n",
+        "\n",
         "    return cnn_model\n",
-        "  \n",
+        "\n",
         "cnn_model = build_cnn_model()\n",
         "# Initialize the model by passing some data through\n",
         "cnn_model.predict(train_images[[0]])\n",
@@ -443,7 +480,7 @@
       "source": [
         "'''TODO: Use model.fit to train the CNN model, with the same batch_size and number of epochs previously used.'''\n",
         "cnn_model.fit('''TODO''')\n",
-        "comet_model_2.end() "
+        "# comet_model_2.end() ## uncomment this line to end the comet experiment"
       ]
     },
     {
@@ -475,7 +512,9 @@
         "id": "2rvEgK82Glv9"
       },
       "source": [
-        "What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model? "
+        "What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model?\n",
+        "\n",
+        "Feel free to click the Comet links to investigate the training/accuracy curves for your model."
       ]
     },
     {
@@ -526,7 +565,7 @@
         "id": "-hw1hgeSCaXN"
       },
       "source": [
-        "As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits. \n",
+        "As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits.\n",
         "\n",
         "Let's look at the digit that has the highest confidence for the first image in the test dataset:"
       ]
@@ -624,7 +663,7 @@
         "  plt.subplot(num_rows, 2*num_cols, 2*i+2)\n",
         "  mdl.lab2.plot_value_prediction(i, predictions, test_labels)\n",
         "comet_model_2.log_figure(figure=plt)\n",
-        "comet_model_2.end()"
+        "comet_model_2.end()\n"
       ]
     },
     {
@@ -635,7 +674,7 @@
       "source": [
         "## 1.4 Training the model 2.0\n",
         "\n",
-        "Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, this function abstracts away many details in the training call, and we have less control over training model, which could be useful in other contexts. \n",
+        "Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, this function abstracts away many details in the training call, and we have less control over training model, which could be useful in other contexts.\n",
         "\n",
         "As an alternative to this, we can use the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) class to record differentiation operations during training, and then call the [`tf.GradientTape.gradient`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient) function to actually compute the gradients. You may recall seeing this in Lab 1 Part 1, but let's take another look at this here.\n",
         "\n",
@@ -675,14 +714,16 @@
         "\n",
         "    #'''TODO: compute the categorical cross entropy loss\n",
         "    loss_value = tf.keras.backend.sparse_categorical_crossentropy('''TODO''', '''TODO''') # TODO\n",
+        "\n",
+        "    # log the loss to comet\n",
         "    comet_model_3.log_metric(\"loss\", loss_value.numpy().mean(), step=idx)\n",
         "\n",
         "  loss_history.append(loss_value.numpy().mean()) # append the loss to the loss_history record\n",
         "  plotter.plot(loss_history.get())\n",
         "\n",
         "  # Backpropagation\n",
         "  '''TODO: Use the tape to compute the gradient against all parameters in the CNN model.\n",
-        "      Use cnn_model.trainable_variables to access these parameters.''' \n",
+        "      Use cnn_model.trainable_variables to access these parameters.'''\n",
         "  grads = # TODO\n",
         "  optimizer.apply_gradients(zip(grads, cnn_model.trainable_variables))\n",
         "\n",
@@ -697,7 +738,7 @@
       },
       "source": [
         "## 1.5 Conclusion\n",
-        "In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias. "
+        "In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias."
       ]
     }
   ],
@@ -707,14 +748,26 @@
       "collapsed_sections": [
         "Xmf_JRJa_N8C"
       ],
-      "name": "Part1_MNIST.ipynb",
+      "name": "Part1_MNIST_Solution.ipynb",
       "provenance": []
     },
     "kernelspec": {
       "display_name": "Python 3",
       "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.9.6"
     }
   },
   "nbformat": 4,
   "nbformat_minor": 0
-}
+}