|
20 | 20 | "$$h_\\theta(x) = \\theta_0 + \\theta_1x$$\n",
|
21 | 21 | "Will give us an equation of line that will predict the price. The above equation is nothing but the equation of line. __When we say the machine learns, we are actually adjusting the parameters $\\theta_0$ and $\\theta_1$__. So for a new x (size of house) we will insert the value of x in the above equation and produce a value $\\hat y$ (our prediction)\n",
|
22 | 22 | "\n",
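As a concrete illustration, here is a minimal Python sketch of this hypothesis; the parameter values and the example house size are made up for illustration, not taken from the notebook:

```python
def predict(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative (assumed) parameter values and house size.
theta0, theta1 = 50_000.0, 120.0
y_hat = predict(1500.0, theta0, theta1)   # predicted price for a 1500 sq ft house
print(y_hat)                              # 230000.0
```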
|
23 |
| - "Our prediction $\\hat y$ will not always be accurate, and will have a certain error which we will define by an equation. We will also need this equation to minimise the error, this equation is called as __loss function__. One of the most used one is called __Mean Squared Error (MSE) which is nothing but the means of all the errors squared. \n", |
| 23 | + "Our prediction $\\hat y$ will not always be accurate, and will have a certain error which we will define by an equation. We will also need this equation to minimise the error, this equation is called as __loss function__. One of the most used one is called __Mean Squared Error (MSE)__ which is nothing but the means of all the errors squared. \n", |
24 | 24 | "\n",
|
25 | 25 | "### $$J(\\theta) = \\frac{1}{2m}\\sum_{i=0}^m{(h_\\theta(x_i) - y_i)^2}$$\n",
|
26 | 26 | "\n",
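A minimal sketch of this cost, assuming NumPy arrays `X` (house sizes) and `y` (prices); the array and function names are assumptions for illustration, not the notebook's:

```python
import numpy as np

def cost(X, y, theta0, theta1):
    """J(theta) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(y)
    errors = (theta0 + theta1 * X) - y
    return np.sum(errors ** 2) / (2 * m)

# Tiny illustrative example: one perfect prediction, one off by 10,000.
cost(np.array([1000.0, 1500.0]), np.array([160_000.0, 230_000.0]), 50_000.0, 120.0)  # -> 25000000.0
```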
|
|
43 | 43 | "Our objective function is\n",
|
44 | 44 | "### $$\\displaystyle \\operatorname*{argmin}_\\theta J(\\theta)$$\n",
|
45 | 45 | "\n",
|
46 |
| - "Which simply means, find the value of $\\theta$ that minimises the error function $J(\\theta)$. In order to do that, we will differentiate our cost function. When we differentiate it, it will give us gradient, which is the direction in which the error will be reduced. Upon having the gradient, we will simply update our {\\theta} values to reflect that step (a step in the direction of lower error) \n", |
| 46 | + "Which simply means, find the value of $\\theta$ that minimises the error function $J(\\theta)$. In order to do that, we will differentiate our cost function. When we differentiate it, it will give us gradient, which is the direction in which the error will be reduced. Upon having the gradient, we will simply update our $\\theta$ values to reflect that step (a step in the direction of lower error) \n", |
47 | 47 | "\n",
|
48 | 48 | "So, the update rule is the following equation\n",
|
49 | 49 | "### $$\\theta = \\theta - \\alpha \\frac{\\partial}{\\partial \\theta} J(\\theta)$$\n",
|
|
52 | 52 | "\n",
|
53 | 53 | " $\\alpha$ = learning rate, which controls how big a step we take in the direction of lower error.\n",
|
54 | 54 | " \n",
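A minimal sketch of one (batch) update under this rule, with the two partial derivatives of the MSE cost written out explicitly; the function name, array names, and default learning rate are assumptions for illustration:

```python
import numpy as np

def gradient_step(X, y, theta0, theta1, alpha=0.01):
    """One update: theta <- theta - alpha * dJ/dtheta, for both parameters."""
    m = len(y)
    errors = (theta0 + theta1 * X) - y          # h_theta(x_i) - y_i
    grad0 = np.sum(errors) / m                  # dJ/dtheta_0
    grad1 = np.sum(errors * X) / m              # dJ/dtheta_1
    return theta0 - alpha * grad0, theta1 - alpha * grad1
```

Note that both parameters are updated from gradients computed at the old values, i.e. the update is simultaneous.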
|
55 |
| - "This process is nothing but _Gradient Descent__. There are few version of gradient descent, few of them are:\n", |
| 55 | + "This process is nothing but __Gradient Descent__. There are few version of gradient descent, few of them are:\n", |
56 | 56 | "1. __Batch Gradient Descent__: Go through __all__ your input samples, compute the gradient once, and then update $\\theta$s.\n",
|
57 | 57 | "2. __Stochastic Gradient Descent__: Go through a __single__ sample, compute gradient, update $\\theta$s, repeat $m$ times\n",
|
58 |
| - "3. __Mini Batch Gradient Descent__: Go through a __batch__ of $k$ samples, compute gradient, update $\\theta$s, repear $\\frac{m}{k}$ times. \n", |
| 58 | + "3. __Mini Batch Gradient Descent__: Go through a __batch__ of $k$ samples, compute gradient, update $\\theta$s, repeat $\\frac{m}{k}$ times. \n", |
59 | 59 | "\n",
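To make the three variants concrete, here is a rough sketch of a training loop that differs only in how many samples feed each update; it reuses the illustrative `gradient_step` helper sketched earlier, and the batch-size handling and per-epoch shuffling are assumptions, not the notebook's code:

```python
import numpy as np

def train(X, y, theta0=0.0, theta1=0.0, alpha=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD, 1 -> stochastic GD, k -> mini-batch GD."""
    m = len(y)
    k = m if batch_size is None else batch_size
    for _ in range(epochs):
        idx = np.random.permutation(m)              # shuffle once per epoch
        for start in range(0, m, k):
            batch = idx[start:start + k]
            theta0, theta1 = gradient_step(X[batch], y[batch], theta0, theta1, alpha)
    return theta0, theta1
```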
|
60 | 60 | "\n",
|
61 | 61 | "### Differentiating the loss function:\n",
|
62 | 62 | "In the update rule:\n",
|
63 | 63 | "### $$\\theta = \\theta - \\alpha \\frac{\\partial}{\\partial \\theta} J(\\theta)$$\n",
|
64 | 64 | "\n",
|
65 |
| - "The important part is calculating the derivative. There will be two derivatives, one for $\\theta_0$ and another for $\\theta_1$\n", |
| 65 | + "The important part is calculating the derivative. Since we have two variables, we will have two derivatives, one for $\\theta_0$ and another for $\\theta_1$. \n", |
| 66 | + "\n", |
66 | 67 | "So the first equation is: \n",
|
67 | 68 | "\n",
|
68 | 69 | "$\n",
|
|
138 | 139 | "metadata": {
|
139 | 140 | "anaconda-cloud": {},
|
140 | 141 | "kernelspec": {
|
141 |
| - "display_name": "Python [default]", |
| 142 | + "display_name": "Python 3", |
142 | 143 | "language": "python",
|
143 |
| - "name": "python2" |
| 144 | + "name": "python3" |
144 | 145 | },
|
145 | 146 | "language_info": {
|
146 | 147 | "codemirror_mode": {
|
147 | 148 | "name": "ipython",
|
148 |
| - "version": 2 |
| 149 | + "version": 3 |
149 | 150 | },
|
150 | 151 | "file_extension": ".py",
|
151 | 152 | "mimetype": "text/x-python",
|
152 | 153 | "name": "python",
|
153 | 154 | "nbconvert_exporter": "python",
|
154 |
| - "pygments_lexer": "ipython2", |
155 |
| - "version": "2.7.12" |
| 155 | + "pygments_lexer": "ipython3", |
| 156 | + "version": "3.6.5" |
156 | 157 | }
|
157 | 158 | },
|
158 | 159 | "nbformat": 4,
|
|