Commit 9700647

Mark Linderman authored and committed
ex6 complete
1 parent 8674411 commit 9700647

File tree

5 files changed: +47 -16 lines
ML Notes.md

+3 -1
@@ -430,10 +430,12 @@ Large value of C or small value of $\sigma^2$ = lower bias, high variance, overf
 Small value of C or large value of $\sigma^2$ = higher bias, lower variance, underfitting tendency

 ### SVMs in practice
-Need to choose C and kernel to use
+Need to choose C, sigma, and the kernel to use
 - linear kernel is no kernel at all - use if you have a large number of features and a small training set, where you might risk overfitting
 - Gaussian kernel - if you choose this, you need to also choose a $\sigma^2$ - use for complex, non-linear hypotheses. You'll have to provide a function to compute the kernel, and the SVM software will automatically compute the features from this function.

+Choosing the best C and sigma values can be done (as in the homework) by training against a range of values for each - all combinations. During that iterative training, you keep track of the C, the sigma, and the percentage of missed estimates each combination's model produces against the *validation set*. After all combinations' results are recorded, find the minimum missed-estimate percentage in the vector or matrix you kept for that purpose and use that combination's C and sigma to train the final model. If this iterative training took a long time, you'd obviously want to save the model each combination produced so that you wouldn't have to train them again for the final version.
+
 It's important to do feature scaling because the Gaussian kernel computes differences between x and l (the landmarks), and unscaled features could have very different ranges. (Consider house square feet vs. number of bedrooms.) Not all similarity functions make valid kernels. One valid alternative is the polynomial kernel, k(x, l) = $(x^T l)^2$ - that's just one version; usually you also provide the degree of the polynomial and a constant added inside it, i.e. $(x^T l + c)^d$.

 But all valid kernels are so-called similarity kernels: string kernel, chi-square, etc.
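
For reference, a minimal Octave sketch of the Gaussian kernel described above (the form the exercise's gaussianKernel.m computes), assuming x1 and x2 are feature vectors and sigma is the bandwidth:

function sim = gaussianKernel(x1, x2, sigma)
  % k(x, l) = exp(-||x - l||^2 / (2 * sigma^2)): 1 for identical vectors,
  % decaying toward 0 as the squared distance between them grows
  x1 = x1(:); x2 = x2(:);                            % force column vectors
  sim = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));
end

% e.g. gaussianKernel([1 2 1]', [0 4 -1]', 2) is exp(-9/8), about 0.3247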

machine-learning-ex6/ex6/dataset3Params.m

+33 -5
@@ -8,8 +8,14 @@
 %

 % You need to return the following variables correctly.
+% C = 1;
+% sigma = 0.3;
+% After running the code below, the optimal values of C and sigma
+% (the ones that produced the lowest % of misses against the validation set)
+% were C = 1, sigma = 0.1. Commented out the starter values above in favor of these.
 C = 1;
-sigma = 0.3;
+sigma = 0.1;
+

 % ====================== YOUR CODE HERE ======================
 % Instructions: Fill in this function to return the optimal C and sigma
@@ -21,12 +27,34 @@
 %
 % Note: You can compute the prediction error using
 %        mean(double(predictions ~= yval))
-%
-
-
-

+% That ~= operator is "not equals", so you're taking the mean of all the
+% prediction comparisons after converting each one to 0 or 1, which gives
+% the fraction of predictions that is incorrect: with 10 examples of which
+% you correctly predict 8, the ~= comparisons sum to 2, and 2/10 = .2, or 20%.
+%

+%{
+testValues = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30];
+predictResults = zeros(length(testValues) ^ 2, 3);  % one row per (C, sigma) pair
+predictCount = 0;
+
+for C = testValues
+  for sigma = testValues
+    model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
+    predictions = svmPredict(model, Xval);
+    predictCount = predictCount + 1;  % Octave's ++ only works as a statement, not inside an index
+    predictResults(predictCount, 1:3) = [C, sigma, mean(double(predictions ~= yval))];
+  end
+end
+
+predictResults
+[minErr, idx] = min(predictResults(:, 3))  % named minErr to avoid shadowing the builtin min
+C = predictResults(idx, 1)
+sigma = predictResults(idx, 2)
+%}


 % =========================================================================
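
The note in ML Notes.md about saving models from a slow sweep can be handled with Octave's built-in save/load; a minimal sketch, assuming the predictResults matrix from the commented-out search above (the .mat filename is made up):

% persist the sweep results once the loops finish (hypothetical filename)
save('paramSearch.mat', 'predictResults');

% a later session reloads them and re-picks the best pair without retraining
load('paramSearch.mat');                 % restores predictResults
[~, idx] = min(predictResults(:, 3));    % column 3 holds the validation miss rate
C = predictResults(idx, 1);
sigma = predictResults(idx, 2);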

machine-learning-ex6/ex6/emailFeatures.m

+3 -2
@@ -49,9 +49,10 @@
 %


+% So we have a word_indices vector like [23, 34, 584, 887, 983] (not necessarily in order),
+% and we want a 1 in each of those positions in x (already initialized to zeros, with one
+% slot for every possible vocab word).

-
-
+x(word_indices) = 1;


machine-learning-ex6/ex6/processEmail.m

+7 -7
@@ -97,13 +97,13 @@
 % str2). It will return 1 only if the two strings are equivalent.
 %

-
-
-
-
-
-
-
+% Brute-force search, because cell arrays aren't associative arrays,
+% so vocabList can't be searched or accessed by anything but its index.
+for idx = 1:length(vocabList)
+  if strcmp(vocabList{idx}, str) == 1
+    word_indices = [word_indices; idx];
+  end
+end


 % =============================================================
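
As an aside, the same lookup could be written without the explicit loop, since strcmp accepts a cell array and returns a logical mask over every element; a sketch under the same vocabList/str assumptions:

% strcmp(vocabList, str) compares str against every cell element at once
idx = find(strcmp(vocabList, str));    % empty if str isn't in the vocabulary
if ~isempty(idx)
  word_indices = [word_indices; idx];
end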

machine-learning-ex6/ex6/token.mat

+1 -1
@@ -1,4 +1,4 @@
-# Created by Octave 4.4.1, Thu Jan 17 21:18:37 2019 EST <[email protected]>
+# Created by Octave 4.4.1, Mon Jan 21 16:17:38 2019 EST <[email protected]>
 # name: email
 # type: sq_string
 # elements: 1
