Commit 9700647

Mark Linderman authored and committed
ex6 complete
1 parent 8674411 commit 9700647

File tree

5 files changed: +47 -16 lines
ML Notes.md

+3 -1
@@ -430,10 +430,12 @@ Large value of C or small value of $\sigma^2$ = lower bias, high variance, overf
 Small value of C or large value of $\sigma^2$ = higher bias, lower variance, underfitting tendency

 ### SVMs in practice
-Need to choose C and kernel to use
+Need to choose C, sigma, and the kernel to use
 - linear kernel is no kernel at all - use if you have a large number of features and a small training set, where you might risk overfitting
 - Gaussian kernel - if you choose this, you need to also choose a $\sigma^2$ - use for complex, non-linear hypotheses. You'll have to provide a function to compute the kernel, and the SVM software will automatically compute the features from this function.

+Choosing the best C and sigma values can be done (as in the homework) by training against a range of values for each - all combinations. During that iterative training, you keep track of the C, the sigma, and the percentage of missed estimates each combination's model produces against the *validation set*. After all combinations' results are recorded, find the minimum missed-estimate percentage in the vector or matrix you kept for that purpose and use that combination's C and sigma to train the final model. If this iterative training took a long time, you'd obviously want to save the model each combination produced so that you wouldn't have to train them again for the final version.
+
 It's important to do feature scaling because the Gaussian kernel computes differences between x and l (the landmarks), and unscaled features could have very different ranges. (Consider house square feet vs. number of bedrooms.) Not all similarity functions make valid kernels. One valid alternative is the polynomial kernel, k(x, l) = $(x^T l)^2$ - that's just one version; usually you also provide the degree of the polynomial and a constant added inside it, i.e. $(x^T l + c)^d$.

 But all valid kernels are so-called similarity kernels: string kernel, chi-square, etc.
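
For reference, a minimal Octave sketch of the Gaussian kernel described above (the form the exercise's gaussianKernel.m computes), assuming x1 and x2 are feature vectors and sigma is the bandwidth:

function sim = gaussianKernel(x1, x2, sigma)
  % k(x, l) = exp(-||x - l||^2 / (2 * sigma^2)): 1 for identical vectors,
  % decaying toward 0 as the squared distance between them grows
  x1 = x1(:); x2 = x2(:);                            % force column vectors
  sim = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));
end

% e.g. gaussianKernel([1 2 1]', [0 4 -1]', 2) is exp(-9/8), about 0.3247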

machine-learning-ex6/ex6/dataset3Params.m

+33 -5
@@ -8,8 +8,14 @@
 %

 % You need to return the following variables correctly.
+% C = 1;
+% sigma = 0.3;
+% After running the code below, the optimal values of C and sigma
+% (the ones that produced the lowest % of misses against the validation set)
+% were C = 1, sigma = 0.1. Commented out the starter values above in favor of these.
 C = 1;
-sigma = 0.3;
+sigma = 0.1;
+

 % ====================== YOUR CODE HERE ======================
 % Instructions: Fill in this function to return the optimal C and sigma
@@ -21,12 +27,34 @@
 %
 % Note: You can compute the prediction error using
 %        mean(double(predictions ~= yval))
-%
-
-
-

+% That ~= operator is "not equals", so you're taking the mean of all the
+% prediction comparisons after converting each one to 0 or 1, which gives
+% the fraction of predictions that is incorrect: with 10 examples of which
+% you correctly predict 8, the ~= comparisons sum to 2, and 2/10 = .2, or 20%.
+%

+%{
+testValues = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30];
+predictResults = zeros(length(testValues) ^ 2, 3);  % one row per (C, sigma) pair
+predictCount = 0;
+
+for C = testValues
+  for sigma = testValues
+    model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
+    predictions = svmPredict(model, Xval);
+    predictCount = predictCount + 1;  % Octave's ++ only works as a statement, not inside an index
+    predictResults(predictCount, 1:3) = [C, sigma, mean(double(predictions ~= yval))];
+  end
+end
+
+predictResults
+[minErr, idx] = min(predictResults(:, 3))  % named minErr to avoid shadowing the builtin min
+C = predictResults(idx, 1)
+sigma = predictResults(idx, 2)
+%}


 % =========================================================================
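
The note in ML Notes.md about saving models from a slow sweep can be handled with Octave's built-in save/load; a minimal sketch, assuming the predictResults matrix from the commented-out search above (the .mat filename is made up):

% persist the sweep results once the loops finish (hypothetical filename)
save('paramSearch.mat', 'predictResults');

% a later session reloads them and re-picks the best pair without retraining
load('paramSearch.mat');                 % restores predictResults
[~, idx] = min(predictResults(:, 3));    % column 3 holds the validation miss rate
C = predictResults(idx, 1);
sigma = predictResults(idx, 2);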

machine-learning-ex6/ex6/emailFeatures.m

+3 -2
@@ -49,9 +49,10 @@
 %


+% So we have a word_indices vector like [23, 34, 584, 887, 983] (not necessarily in order),
+% and we want a 1 in each of those positions in x (already initialized to zeros, with one
+% slot for every possible vocab word).

-
-
+x(word_indices) = 1;


machine-learning-ex6/ex6/processEmail.m

+7 -7
@@ -97,13 +97,13 @@
 % str2). It will return 1 only if the two strings are equivalent.
 %

-
-
-
-
-
-
-
+% Brute-force search, because cell arrays aren't associative arrays,
+% so vocabList can't be searched or accessed by anything but its index.
+for idx = 1:length(vocabList)
+  if strcmp(vocabList{idx}, str) == 1
+    word_indices = [word_indices; idx];
+  end
+end


 % =============================================================
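
As an aside, the same lookup could be written without the explicit loop, since strcmp accepts a cell array and returns a logical mask over every element; a sketch under the same vocabList/str assumptions:

% strcmp(vocabList, str) compares str against every cell element at once
idx = find(strcmp(vocabList, str));    % empty if str isn't in the vocabulary
if ~isempty(idx)
  word_indices = [word_indices; idx];
end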

machine-learning-ex6/ex6/token.mat

+1 -1
@@ -1,4 +1,4 @@
-# Created by Octave 4.4.1, Thu Jan 17 21:18:37 2019 EST <[email protected]>
+# Created by Octave 4.4.1, Mon Jan 21 16:17:38 2019 EST <[email protected]>
 # name: email
 # type: sq_string
 # elements: 1
