Coursera: Machine Learning (Week 4) [Assignment Solution] - Andrew NG

by Akshay Daga (APDaga) - June 08, 2018



▸ One-vs-all logistic regression and neural networks to recognize hand-written digits.

I have recently completed the Machine Learning course from Coursera by Andrew NG.

While doing the course, we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.

These solutions are for reference only.

> It is recommended that you solve the assignments yourself, honestly; only then does it make sense to complete the course.

> But in case you get stuck, feel free to refer to the solutions I have provided.

NOTE:

Don't just copy-paste the code for the sake of completion.

Even if you copy the code, make sure you understand the code first.

Click here to check out the week-3 assignment solutions. Scroll down for the week-4 assignment solutions.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


It consists of the following files:

ex3.m - Octave/MATLAB script that steps you through part 1
ex3_nn.m - Octave/MATLAB script that steps you through part 2
ex3data1.mat - Training set of hand-written digits
ex3weights.mat - Initial weights for the neural network exercise
submit.m - Submission script that sends your solutions to our servers
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
[*] lrCostFunction.m - Logistic regression cost function
[*] oneVsAll.m - Train a one-vs-all multi-class classifier
[*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
[*] predict.m - Neural network prediction function

* indicates files you will need to complete

lrCostFunction.m :

function [J, grad] = lrCostFunction(theta, X, y, lambda)
  %LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
  %regularization
  %   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
  %   theta as the parameter for regularized logistic regression and the
  %   gradient of the cost w.r.t. to the parameters. 
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  grad = zeros(size(theta));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta.
  %               You should set J to the cost.
  %               Compute the partial derivatives and set grad to the partial
  %               derivatives of the cost w.r.t. each parameter in theta
  %
  % Hint: The computation of the cost function and gradients can be
  %       efficiently vectorized. For example, consider the computation
  %
  %           sigmoid(X * theta)
  %
  %       Each row of the resulting matrix will contain the value of the
  %       prediction for that example. You can make use of this to vectorize
  %       the cost function and gradient computations. 
  %
  % Hint: When computing the gradient of the regularized cost function, 
  %       there're many possible vectorized solutions, but one solution
  %       looks like:
  %           grad = (unregularized gradient for logistic regression)
  %           temp = theta; 
  %           temp(1) = 0;   % because we don't add anything for j = 0  
  %           grad = grad + YOUR_CODE_HERE (using the temp variable)
  %
  
  %DIMENSIONS: 
  %   theta = (n+1) x 1
  %   X     = m x (n+1)
  %   y     = m x 1
  %   grad  = (n+1) x 1
  %   J     = Scalar
  
  z   = X * theta;   % m x 1
  h_x = sigmoid(z);  % m x 1 
  
  reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
  
  J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
  
  grad(1) = (1/m) * (X(:,1)'*(h_x-y));                                    % 1 x 1
  grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);  % n x 1
  
  % =============================================================
  
  grad = grad(:);
end
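A quick way to check lrCostFunction.m on its own is the small test case from ex3.m (theta_t = [-2; -1; 1; 2], lambda_t = 3), for which ex3.m lists an expected cost of 2.534819 and expected gradients 0.146561, -0.548558, 0.724722, 1.398003. The Octave sketch below inlines the same vectorized formulas rather than calling the file, so it runs on its own; the variable names are illustrative. It also computes the one-line gradient (1/m) * X' * (h - y) + (lambda/m) * theta raised in the comments: that shortcut penalizes the bias term too, so its first entry is off by (lambda/m) * theta(1), here 3/5 * (-2) = -1.2, while the other entries are unchanged.

```octave
% Sketch: check the vectorized regularized cost and gradient on the
% ex3.m test case. sigmoid is defined inline so the block runs on its own.
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5, 1) reshape(1:15, 5, 3)/10];    % 5 x 4, column 1 = bias
y_t      = ([1; 0; 1; 0; 1] >= 0.5);               % 5 x 1 logical labels
lambda_t = 3;
m        = length(y_t);

h = sigmoid(X_t * theta_t);                         % 5 x 1 predictions

% Regularized cost; theta_t(1) (the bias) is not penalized
J = (1/m) * sum(-y_t .* log(h) - (1 - y_t) .* log(1 - h)) ...
    + (lambda_t / (2*m)) * sum(theta_t(2:end) .^ 2);

% Correct gradient: penalty added to entries 2:end only
grad        = (1/m) * (X_t' * (h - y_t));
grad(2:end) = grad(2:end) + (lambda_t / m) * theta_t(2:end);

% One-line shortcut that also penalizes theta_t(1)
grad_all = (1/m) * (X_t' * (h - y_t)) + (lambda_t / m) * theta_t;

fprintf('Cost: %f\n', J);
fprintf('Gradients:'); fprintf(' %f', grad); fprintf('\n');
fprintf('grad_all(1) - grad(1): %f\n', grad_all(1) - grad(1));
```

If your own lrCostFunction.m prints a very different cost on this test case (for example around 45.73 instead of 2.53), comparing its intermediate values against this sketch line by line is one way to locate the difference.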

oneVsAll.m :

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
  %ONEVSALL trains multiple logistic regression classifiers and returns all
  %the classifiers in a matrix all_theta, where the i-th row of all_theta 
  %corresponds to the classifier for label i
  %   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
  %   logistic regression classifiers and returns each of these classifiers
  %   in a matrix all_theta, where the i-th row of all_theta corresponds 
  %   to the classifier for label i
  
  % num_labels = No. of output classifiers (here, 10)
  
  % Some useful variables
  m = size(X, 1);        % No. of Training Samples == No. of Images : (Here, 5000) 
  n = size(X, 2);        % No. of features == No. of pixels in each Image : (Here, 400)
  
  % You need to return the following variables correctly 
  all_theta = zeros(num_labels, n + 1);  
  %DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401
  
  %DIMENSIONS: X = m x input_layer_size
  %Here, 1 row in X represents 1 training Image of pixel 20x20
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];   %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: You should complete the following code to train num_labels
  %               logistic regression classifiers with regularization
  %               parameter lambda. 
  %
  % Hint: theta(:) will return a column vector.
  %
  % Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
  %       whether the ground truth is true/false for this class.
  %
  % Note: For this assignment, we recommend using fmincg to optimize the cost
  %       function. It is okay to use a for-loop (for c = 1:num_labels) to
  %       loop over the different classes.
  %
  %       fmincg works similarly to fminunc, but is more efficient when we
  %       are dealing with large number of parameters.
  %
  % Example Code for fmincg:
  %
  %     % Set Initial theta
  %     initial_theta = zeros(n + 1, 1);
  %     
  %     % Set options for fminunc
  %     options = optimset('GradObj', 'on', 'MaxIter', 50);
  % 
  %     % Run fmincg to obtain the optimal theta
  %     % This function will return theta and the cost 
  %     [theta] = ...
  %         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
  %                 initial_theta, options);
  %
  
  initial_theta = zeros(n+1, 1);
  options = optimset('GradObj', 'on', 'MaxIter', 50);
  
  for c = 1:num_labels
    % fmincg returns an (n+1) x 1 column; transpose it into row c of all_theta
    all_theta(c,:) = ...
             fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                     initial_theta, options)';
  end
  
  % =========================================================================
end
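For readers asking what t is in @(t)(lrCostFunction(t, X, (y == c), lambda)), or how oneVsAll.m would look without fmincg, here is a small sketch that swaps fmincg for plain batch gradient descent. That swap is my substitution for illustration, not what the assignment uses. t is simply the parameter vector the optimizer varies; the anonymous function freezes X, the 0/1 label vector y == c, and lambda at the moment it is created. The toy data set, the learning rate alpha, and the iteration count are made-up values.

```octave
% Sketch: one-vs-all training with plain gradient descent instead of fmincg.
% Toy data (hypothetical): three well-separated clusters with labels 1..3.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [-2 -2; -2 -1.5; 2 -2; 2 -1.5; 0 2; 0 2.5];   % m x n raw features
y = [1; 1; 2; 2; 3; 3];                            % m x 1 labels
num_labels = 3;
lambda = 0.1;
alpha  = 0.5;      % hypothetical learning rate
iters  = 500;      % hypothetical number of iterations

[m, n] = size(X);
X = [ones(m, 1) X];                                % add bias column
all_theta = zeros(num_labels, n + 1);

for c = 1:num_labels
  yc = double(y == c);                             % 1 where label is c, else 0
  % Gradient of the regularized cost as a function of t only;
  % X, yc, m and lambda are captured when this line runs.
  gradFn = @(t) (1/m) * (X' * (sigmoid(X * t) - yc)) ...
                + (lambda/m) * [0; t(2:end)];
  theta = zeros(n + 1, 1);
  for k = 1:iters
    theta = theta - alpha * gradFn(theta);         % one gradient-descent step
  end
  all_theta(c, :) = theta';                        % store as row c
end

% Predict as in predictOneVsAll.m: column index of the largest score per row
[~, p] = max(X * all_theta', [], 2);
fprintf('Training accuracy: %.1f%%\n', mean(p == y) * 100);
```

fmincg usually reaches a good solution in far fewer iterations than a fixed-step loop like this one, which is why the assignment recommends it for the 401-parameter problem.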

predictOneVsAll.m :

function p = predictOneVsAll(all_theta, X)
  %PREDICT Predict the label for a trained one-vs-all classifier. The labels
  %are in the range 1..K, where K = size(all_theta, 1).
  %  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
  %  for each example in the matrix X. Note that X contains the examples in
  %  rows. all_theta is a matrix where the i-th row is a trained logistic
  %  regression theta vector for the i-th class. You should set p to a vector
  %  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
  %  for 4 examples)
  
  m = size(X, 1);     % No. of Input Examples to Predict (Each row = 1 Example)
  num_labels = size(all_theta, 1); %No. of output classifiers
  
  % You need to return the following variables correctly
  p = zeros(size(X, 1), 1);    % No_of_Input_Examples x 1 == m x 1
  
  % Add ones to the X data matrix
  X = [ones(m, 1) X];
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned logistic regression parameters (one-vs-all).
  %               You should set p to a vector of predictions (from 1 to
  %               num_labels).
  %
  % Hint: This code can be done all vectorized using the max function.
  %       In particular, the max function can also return the index of the
  %       max element, for more information see 'help max'. If your examples
  %       are in rows, then, you can use max(A, [], 2) to obtain the max
  %       for each row.
  %
  % num_labels = No. of output classifiers (here, 10)
  % DIMENSIONS:
  % all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)
  
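  prob_mat = X * all_theta';      % 5000 x 10 == no_of_input_image x num_labels
  % These are raw scores (X * theta), not probabilities. Applying sigmoid first
  % would not change the result, because sigmoid is strictly increasing.
  [prob, p] = max(prob_mat,[],2); % m x 1
  %returns the maximum element in each row and its column index, for each input image
  %p: predicted output (column index == predicted class label)
  %prob: highest score (not a probability, since sigmoid was not applied)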
  
  %%%%%%%% WORKING: Computation per input image %%%%%%%%%
  % for i = 1:m                               % To iterate through each input sample
  %     one_image = X(i,:);                   % 1 x 401 == 1 x no_of_features
  %     prob_mat = one_image * all_theta';    % 1 x 10  == 1 x num_labels
  %     [prob, out] = max(prob_mat);
  %     %out: predicted output
  %     %prob: probability of predicted output
  %     p(i) = out;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  %%%%%%%% WORKING %%%%%%%%%
  % for i = 1:m
  %     RX = repmat(X(i,:),num_labels,1);
  %     RX = RX .* all_theta;
  %     SX = sum(RX,2);
  %     [val, index] = max(SX);
  %     p(i) = index;
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%
  % =========================================================================
end
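Several comments ask what [prob, p] = max(prob_mat, [], 2) does, whether predictOneVsAll.m is missing a sigmoid, and why the maximum (rather than the minimum) gives the prediction. With the dimension argument 2, max works row by row and returns two outputs: the largest value in each row and its column index. Column c holds the score of classifier c, which is larger when that classifier considers class c more likely, so the index of the largest score is the predicted label. The sigmoid can be skipped because it is strictly increasing and never changes which column is largest. The score matrix below is made up for illustration.

```octave
% Sketch: max along rows returns both the values and their column indices.
sigmoid = @(z) 1 ./ (1 + exp(-z));

scores = [ 0.2  -1.0   3.1;     % hypothetical X * all_theta' for
          -0.5   0.4   0.1;     % 3 examples and 3 classes
          -2.0  -3.0  -1.5];

[best_score, p_raw] = max(scores, [], 2);           % p_raw = [3; 2; 3]
[best_prob,  p_sig] = max(sigmoid(scores), [], 2);  % same indices as p_raw

disp([p_raw p_sig]);
```

The third row shows a case where every classifier gives a negative score; max still picks the least negative one, so each example always receives a label.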


predict.m :

function p = predict(Theta1, Theta2, X)
  %PREDICT Predict the label of an input given a trained neural network
  %   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
  %   trained weights of a neural network (Theta1, Theta2)
  
  % Useful values
  m = size(X, 1);
  num_labels = size(Theta2, 1);
  
  % You need to return the following variables correctly 
  p = zeros(size(X, 1), 1);  % m x 1
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned neural network. You should set p to a 
  %               vector containing labels between 1 to num_labels.
  %
  % Hint: The max function might come in useful. In particular, the max
  %       function can also return the index of the max element, for more
  %       information see 'help max'. If your examples are in rows, then, you
  %       can use max(A, [], 2) to obtain the max for each row.
  %
  %DIMENSIONS:
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % layer1 (input)  = 400 nodes + 1bias
  % layer2 (hidden) = 25 nodes + 1bias 
  % layer3 (output) = 10 nodes
  % 
  % theta dimensions = S_(j+1) x ((S_j)+1)
  % theta1 = 25 x 401
  % theta2 = 10 x 26
  
  % theta1:
  %     1st row indicates: thetas from all nodes of layer1 connecting to the 1st node of layer2
  %     2nd row indicates: thetas from all nodes of layer1 connecting to the 2nd node of layer2
  %     and
  %     1st column indicates: thetas from node1 of layer1 (the bias unit) to all nodes in layer2
  %     2nd column indicates: thetas from node2 of layer1 to all nodes in layer2
  %     
  % theta2:
  %     1st row indicates: thetas from all nodes of layer2 connecting to the 1st node of layer3
  %     2nd row indicates: thetas from all nodes of layer2 connecting to the 2nd node of layer3
  %     and
  %     1st column indicates: thetas from node1 of layer2 (the bias unit) to all nodes in layer3
  %     2nd column indicates: thetas from node2 of layer2 to all nodes in layer3
      
  a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X 
  %No. of rows = no. of input images
  %No. of Column = No. of features in each image
  
  z2 = a1 * Theta1';  % 5000 x 25
  a2 = sigmoid(z2);   % 5000 x 25
 
  a2 =  [ones(size(a2,1),1) a2];  % 5000 x 26
  
  z3 = a2 * Theta2';  % 5000 x 10
  a3 = sigmoid(z3);  % 5000 x 10
  
  [prob, p] = max(a3,[],2); 
  %returns maximum element in each row  == max. probability and its index for each input image
  %p: predicted output (index)
  %prob: probability of predicted output
  
  % =========================================================================
end
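On why predict.m computes a1 * Theta1' rather than Theta1 * a1': with examples stored in rows, a1 * Theta1' gives an m x 25 matrix with one row per example, while Theta1 * a1' gives its 25 x m transpose. Both contain the same numbers; the first keeps the one-row-per-example layout that max(a3, [], 2) expects. The sketch below uses random weights and small, made-up layer sizes (4 inputs, 3 hidden units, 2 outputs) so the shapes, following S_(j+1) x (S_j + 1), are easy to follow.

```octave
% Sketch: forward propagation shapes with hypothetical small layer sizes.
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 5; input_size = 4; hidden_size = 3; num_labels = 2;
X      = rand(m, input_size);                 % m x 4, one example per row
Theta1 = rand(hidden_size, input_size + 1);   % 3 x 5 = S_2 x (S_1 + 1)
Theta2 = rand(num_labels, hidden_size + 1);   % 2 x 4 = S_3 x (S_2 + 1)

a1     = [ones(m, 1) X];                      % m x 5, bias column added
z2     = a1 * Theta1';                        % m x 3, one row per example
z2_alt = (Theta1 * a1')';                     % same values via the transpose
a2     = [ones(m, 1) sigmoid(z2)];            % m x 4, bias column added
a3     = sigmoid(a2 * Theta2');               % m x 2 output activations
[~, p] = max(a3, [], 2);                      % m x 1 predicted labels

disp(size(a3));
```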

I tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think more optimization can be done, please suggest corrections/improvements.
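As an illustration of what vectorization means here, the sketch below computes the unregularized logistic-regression cost once with an explicit loop over examples and once with the matrix form used in lrCostFunction.m. On random (hypothetical) data the two agree up to rounding error; the vectorized form hands the whole sum to built-in matrix operations instead of interpreting m loop iterations, which typically runs much faster in Octave/MATLAB.

```octave
% Sketch: loop-based vs vectorized logistic-regression cost on random data.
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 200; n = 10;
X = [ones(m, 1) randn(m, n)];      % hypothetical design matrix with bias column
y = double(rand(m, 1) > 0.5);      % hypothetical 0/1 labels
theta = 0.1 * randn(n + 1, 1);

% Loop version: one training example at a time
J_loop = 0;
for i = 1:m
  h_i = sigmoid(X(i, :) * theta);
  J_loop = J_loop - y(i) * log(h_i) - (1 - y(i)) * log(1 - h_i);
end
J_loop = J_loop / m;

% Vectorized version, as in lrCostFunction.m (without the penalty term)
h = sigmoid(X * theta);
J_vec = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));

fprintf('loop: %.10f   vectorized: %.10f\n', J_loop, J_vec);
```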

--------------------------------------------------------------------------------

Click here to see solutions for all Machine Learning Coursera Assignments.


Feel free to ask questions in the comment section. I will try my best to answer them.

If you find this helpful in any way, like, comment on, and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,

-Akshay P. Daga

54 Comments

Bubesh, 3 December 2018 at 20:02
Hey!
In the predict.m file, theta1 should be 25 x 401, not 26 x 401.

wrong:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 26 x 401
% theta2 = 10 x 26
correct:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26
harshita, 28 December 2018 at 03:19
predict.m is not working
Unknown, 28 January 2019 at 11:51
Hey, could you explain how "[prob, p] = max(a3,[],2);" works in predict.m?
aspiring DS, 11 March 2019 at 02:16
Hi, I am getting the error "=: nonconformant arguments (op1 is 1x1, op2 is 1x2)" at the line grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in lrCostFunction.
as, 19 June 2019 at 13:28
Sigmoid function is missing in predictOneVsAll
Unknown, 6 August 2019 at 00:55
Will you please tell me what t is here?

@(t)(lrCostFunction(t, X, (y == c), lambda)
Blitz, 24 March 2020 at 19:16
Why do you separate grad into two lines, as seen below?
grad(1) = (1/m) * (X(:,1)'*(h_x-y));
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

Just writing it as
grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;
works fine, or am I missing something here?
Rohan Patil, 29 March 2020 at 12:01
Thank you for your help, it's really great of you. I just wanted to know 2 things:

(1) Whenever I start a programming assignment, I get really confused and don't understand where and how to start, so I first refer to your code, understand it thoroughly, and then proceed with the assignment. I wanted to know whether that is a correct approach.

(2) Why have we used [prob, p], and what is the intuition behind it in the code? I mean, why have we used 2 variables, 'prob' and 'p'?
Akshay, 27 April 2020 at 14:08
Hi Akshay,

Thanks for creating this amazing forum for us like-minded people. I had a couple of queries:

1. I am not able to understand the variables of the fmincg function (despite using 'help'). It would be great if someone could help me with the same!

2. What do the three dots (...) in the line preceding the fmincg function specify? Why are they needed? (I tried running the function without them, but it pointed out a syntax error!)

Thanks in advance.
Qwert12324 May 2020 at 20:15
None of the code is working; I am getting 0/100.
Unknown, 29 May 2020 at 19:34
How were you able to solve oneVsAll.m, predictOneVsAll.m, and predict.m? I am trying to understand the problem, but I don't see how I should solve it.
Aravindh, 7 June 2020 at 10:10
Can anyone explain what "theta_t" is? Why and how did they choose the seemingly random value "[-2; -1; 1; 2]" (in ex3.m)?
Unknown, 28 June 2020 at 21:46
Hi Akshay,
It is showing the error "unprecedented parameter name 'GrabObj'".
Kailas, 5 July 2020 at 19:32
Hi Akshay,
In oneVsAll.m, it says IrCostFunction is undefined.
Why is that?
Vedant Patil, 21 July 2020 at 19:05
Hello,

Can you help me resolve this?
octave:7> oneVsAll.m
error: 'X' undefined near line 11 column 10
error: called from
oneVsAll at line 11 column 3
Prateek Srivastava, 9 August 2020 at 15:05
Hi, I used exactly the same implementation, but the cost for my set comes out to 45.73, in contrast to the expected cost of 2.53.
I am using the same logic as yours, but I don't know why this is happening.
Can you please help me out?
pratik, 10 August 2020 at 09:42
Hi Akshay,
I have used the same code as yours in predict.m.
Within the exercise script, I get the expected training accuracy (97.5%), and the digits are also recognized correctly.

But when I submit the code for grading, I get the following error:

!! Submission failed: unexpected error: Index exceeds the number of array elements (16).
!! Please try again later.

Thanks in advance for the help.
jimmy, 12 August 2020 at 03:40
Could you please explain the line all_theta(c,:) = ... in oneVsAll.m? I was stuck on it for an hour.
Unknown, 9 September 2020 at 16:22
I don't know what is wrong: I am getting iterations and costs on the output console, and I am posting some of them here. Please help, as I have been stuck on this for more than a day.

Iteration 16 | Cost: 1.018509e-01
Iteration 17 | Cost: 1.018509e-01
Iteration 18 | Cost: 1.018509e-01
Iteration 19 | Cost: 1.018509e-01
Iteration 20 | Cost: 1.018509e-01
Iteration 21 | Cost: 1.018509e-01
Iteration 22 | Cost: 1.018509e-01
Iteration 23 | Cost: 1.018509e-01
Iteration 24 | Cost: 1.018509e-01
Iteration 25 | Cost: 1.018509e-01
Iteration 26 | Cost: 1.018509e-01

all_theta =

-0.5595 0.6192 -0.5504 -0.0935
-5.4744 -0.4716 1.2613 0.6349
0.0684 -0.3756 -1.6523 -1.4101
Theresa, 19 September 2020 at 22:47
Hi, could you please help me? This is my code for lrCostFunction:

H = sigmoid(X*theta);
T = y.*log(H) + (1 - y).*log(1 - H);
J = -1/m*sum(T) + lambda/(2*m)*sum(theta(2:end).^2);

ta = [0; theta(2:end)];
grad = X'*(H - y)/m + lambda/m*ta;

but I'm getting this error:

>> lrCostFunction
Not enough input arguments.

Error in lrCostFunction (line 9)
m = length(y); % number of training examples

I tried using your code to check whether I was wrong, but I got the same error. Could you please help me?
Unknown, 31 October 2020 at 11:59
Hey, I have a question: when we calculated grad in the week 3 assignment, we included
grad(1) = (1/m)* sum(X(:,1)'*(hx-y));
grad(2:end) = (1/m)* sum(X(:,2:end)'*(hx-y))+(lambda/m)*theta(2:end);
Now, in week 4, we remove "sum" from both equations. My question is: why do we remove sum, and why does calculating with sum give a wrong answer?
Cron, 20 February 2021 at 06:34
Hi,
For the oneVsAll.m problem, what would the code look like if you didn't use the fmincg function?
I'm kind of lost on the process of getting all_theta.
rths, 27 April 2021 at 21:51
Can you send the submit.m and submit config files for this exercise?
E.chandu, 7 July 2021 at 07:14
I am getting an error in the predict.m file:
error: called from
predict at line 7 column 5
JennyPham, 10 July 2021 at 09:40
I am trying to submit the whole package. All scripts so far run and give me the correct answers, but when I submit to the test servers, I get an error about the size of a matrix:

!! Submission failed: unexpected error: Matrix dimensions must agree.
!! Please try again later.
How can I fix this error? Thanks
Unknown, 31 July 2021 at 19:17
In predict.m why do we have to do a1 * Theta1' instead of Theta1 * a1'?
sk, 31 January 2022 at 15:56
I am getting a very high cost, around 45.734819. Please tell me why I am getting this.
Misia, 29 October 2022 at 18:20
Hello Akshay,
I have a question about the prediction part.
I understand the creation of all_theta, using the fmincg function to create theta parameters that fit each particular number from 1 to 10. My question is: once you multiply X * all_theta', you get a 5000 x 10 matrix, i.e. 5000 samples x the value of the prediction for each of the 10 numbers. How do we know that the maximum value reflects the number that our prediction considers most likely? Why is it not the minimum value, etc.?

How do we know that the column with the maximum value will equal the number we predict?
