# Coursera: Machine Learning (Week 4) [Assignment Solution] - Andrew NG

▸ One-vs-all logistic regression and neural networks to recognize hand-written digits.

I have recently completed the Machine Learning course from Coursera by Andrew NG.

While doing the course, we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

**These solutions are for reference only.**

**It is recommended that you solve the assignments yourself, honestly; only then does completing the course make sense.**

**But in case you get stuck along the way, feel free to refer to the solutions provided by me.**

**NOTE:**

Don't just copy paste the code for the sake of completion.

Even if you copy the code, make sure you understand the code first.

**Click here to check out** the week-3 assignment solutions. **Scroll down** for the solutions to the week-4 assignment.

**It consists of the following files:**

- **ex3.m** - Octave/MATLAB script that steps you through part 1
- **ex3_nn.m** - Octave/MATLAB script that steps you through part 2
- **ex3data1.mat** - Training set of hand-written digits
- **ex3weights.mat** - Initial weights for the neural network exercise
- **submit.m** - Submission script that sends your solutions to our servers
- **displayData.m** - Function to help visualize the dataset
- **fmincg.m** - Function minimization routine (similar to fminunc)
- **sigmoid.m** - Sigmoid function
- [\*] **lrCostFunction.m** - Logistic regression cost function
- [\*] **oneVsAll.m** - Train a one-vs-all multi-class classifier
- [\*] **predictOneVsAll.m** - Predict using a one-vs-all multi-class classifier
- [\*] **predict.m** - Neural network prediction function

\* indicates files you will need to complete.

**lrCostFunction.m :**

```
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%
%DIMENSIONS:
% theta = (n+1) x 1
% X = m x (n+1)
% y = m x 1
% grad = (n+1) x 1
% J = Scalar
z = X * theta; % m x 1
h_x = sigmoid(z); % m x 1
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
grad(1) = (1/m) * (X(:,1)'*(h_x-y)); % 1 x 1
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end); % n x 1
% =============================================================
grad = grad(:);
end
```
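The two-line gradient above can also be written as a single expression by zeroing out the first entry of a copy of `theta`, as the hint in the comments suggests. The following is a minimal sketch on made-up data (the values of `X`, `y`, `theta`, and `lambda` are illustrative assumptions, and `sigmoid` is defined locally as a stand-in for sigmoid.m). It also shows what changes if the bias term is regularized by mistake.

```octave
% Sketch: regularized gradient in one expression, using a masked copy of theta.
% All data below is made up for illustration only.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

X = [1 0.5 -1.2; 1 -0.3 0.8; 1 1.5 0.1; 1 -2.0 0.4];  % m x (n+1), first column is the bias
y = [1; 0; 1; 0];                                     % m x 1
theta = [0.2; -0.5; 0.3];                             % (n+1) x 1
lambda = 3;
m = length(y);

h_x = sigmoid(X * theta);            % m x 1

% Two-line form, as used in lrCostFunction.m
grad_split = zeros(size(theta));
grad_split(1) = (1/m) * (X(:,1)' * (h_x - y));
grad_split(2:end) = (1/m) * (X(:,2:end)' * (h_x - y)) + (lambda/m) * theta(2:end);

% One-expression form: zero out theta(1) so the bias gets no penalty
temp = theta;
temp(1) = 0;
grad_masked = (1/m) * (X' * (h_x - y)) + (lambda/m) * temp;

% Regularizing every entry (including the bias) changes grad(1) only
grad_all = (1/m) * (X' * (h_x - y)) + (lambda/m) * theta;

disp(max(abs(grad_split - grad_masked)));   % difference between the two equivalent forms
disp(grad_all(1) - grad_masked(1));         % equals (lambda/m) * theta(1)
```

The masked form and the two-line form compute the same gradient. The unmasked form differs in the first entry whenever `theta(1)` and `lambda` are both nonzero, which is why the bias term has to be handled separately in one way or another.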

**oneVsAll.m :**

```
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logistic regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i
% num_labels = No. of output classifier (Here, it is 10)
% Some useful variables
m = size(X, 1); % No. of Training Samples == No. of Images : (Here, 5000)
n = size(X, 2); % No. of features == No. of pixels in each Image : (Here, 400)
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
%DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401
%DIMENSIONS: X = m x input_layer_size
%Here, 1 row in X represents 1 training Image of pixel 20x20
% Add ones to the X data matrix
X = [ones(m, 1) X]; %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fminunc
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%
initial_theta = zeros(n+1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
    all_theta(c,:) = ...
        fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
               initial_theta, options);
end
% =========================================================================
end
```
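The `@(t)(...)` expression passed to `fmincg` is an anonymous function: `t` is its only argument (the parameter vector the optimizer varies), while `X`, `y == c`, and `lambda` are captured from the workspace when the handle is created. Here is a small sketch of the same pattern; `quad_cost` is a hypothetical stand-in for `lrCostFunction`, and the numbers are made up.

```octave
% Sketch: how an anonymous function captures workspace variables,
% mirroring the handle passed to fmincg in oneVsAll.m.
% quad_cost is a hypothetical stand-in for lrCostFunction.
quad_cost = @(t, target, scale) scale * sum((t - target).^2);

target = [1; 2];
scale = 0.5;

% The handle takes a single argument t; target and scale are captured here
cost_of_t = @(t) quad_cost(t, target, scale);

disp(cost_of_t([1; 2]));   % cost evaluated at the target itself
disp(cost_of_t([0; 0]));   % cost evaluated at the origin

% y == c turns multi-class labels into a 0/1 vector for class c
y = [1; 3; 2; 3];
c = 3;
disp((y == c)');           % 1 where the label equals c, 0 elsewhere
```

An optimizer such as `fmincg` only ever calls the handle with one argument, so wrapping the cost function this way is what lets the extra inputs ride along.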

**predictOneVsAll.m :**

```
function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
% for each example in the matrix X. Note that X contains the examples in
% rows. all_theta is a matrix where the i-th row is a trained logistic
% regression theta vector for the i-th class. You should set p to a vector
% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
% for 4 examples)
m = size(X, 1); % No. of Input Examples to Predict (Each row = 1 Example)
num_labels = size(all_theta, 1); %No. of Output Classifiers
% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % No_of_Input_Examples x 1 == m x 1
% Add ones to the X data matrix
X = [ones(m, 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters (one-vs-all).
% You should set p to a vector of predictions (from 1 to
% num_labels).
%
% Hint: This code can be done all vectorized using the max function.
% In particular, the max function can also return the index of the
% max element, for more information see 'help max'. If your examples
% are in rows, then, you can use max(A, [], 2) to obtain the max
% for each row.
%
% num_labels = No. of output classifier (Here, it is 10)
% DIMENSIONS:
% all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)
prob_mat = X * all_theta'; % 5000 x 10 == no_of_input_image x num_labels (linear scores theta'*x)
[prob, p] = max(prob_mat,[],2); % m x 1
%returns the maximum element in each row and its index for each input image
%p: predicted output (index of the highest-scoring class)
%prob: highest score; sigmoid is increasing, so this class also has the highest probability
%%%%%%%% WORKING: Computation per input image %%%%%%%%%
% for i = 1:m % To iterate through each input sample
% one_image = X(i,:); % 1 x 401 == 1 x no_of_features
% prob_mat = one_image * all_theta'; % 1 x 10 == 1 x num_labels
% [prob, out] = max(prob_mat);
% %out: predicted output
% %prob: probability of predicted output
% p(i) = out;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% WORKING %%%%%%%%%
% for i = 1:m
% RX = repmat(X(i,:),num_labels,1);
% RX = RX .* all_theta;
% SX = sum(RX,2);
% [val, index] = max(SX);
% p(i) = index;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%
% =========================================================================
end
```
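Two things in this function tend to raise questions: how `max(A, [], 2)` returns both a value and an index, and why the sigmoid can be skipped. The sketch below uses a made-up score matrix (an assumption for illustration, not data from the exercise) and defines `sigmoid` locally as a stand-in for sigmoid.m.

```octave
% Sketch: row-wise max with indices, and why skipping sigmoid gives the same labels.
% The scores are made up; each row is one example, each column one class.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

scores = [-2.0  0.5  1.7;
           3.1 -0.4  0.2;
          -1.0 -0.2 -3.0];

% First output: largest value in each row. Second output: its column index.
[best_score, p_raw] = max(scores, [], 2);

% Sigmoid is strictly increasing, so it does not change which column is largest
[best_prob, p_sig] = max(sigmoid(scores), [], 2);

disp(p_raw');                 % predicted class per example from raw scores
disp(isequal(p_raw, p_sig));  % 1 if both give the same predictions
```

Applying the sigmoid first would turn `best_score` into a value in (0, 1), but the predicted labels stay the same.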

**predict.m :**

```
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % m x 1
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
%DIMENSIONS:
% theta1 = 25 x 401
% theta2 = 10 x 26
% layer1 (input) = 400 nodes + 1bias
% layer2 (hidden) = 25 nodes + 1bias
% layer3 (output) = 10 nodes
%
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26
% theta1:
% 1st row indicates: theta from all nodes of layer1 (incl. bias) into the 1st node of layer2
% 2nd row indicates: theta from all nodes of layer1 (incl. bias) into the 2nd node of layer2
% and
% 1st Column indicates: theta from node1 of layer1 (the bias unit) to all nodes in layer2
% 2nd Column indicates: theta from node2 of layer1 to all nodes in layer2
%
% theta2:
% 1st row indicates: theta from all nodes of layer2 (incl. bias) into the 1st node of layer3
% 2nd row indicates: theta from all nodes of layer2 (incl. bias) into the 2nd node of layer3
% and
% 1st Column indicates: theta from node1 of layer2 (the bias unit) to all nodes in layer3
% 2nd Column indicates: theta from node2 of layer2 to all nodes in layer3
a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X
%No. of rows = no. of input images
%No. of Column = No. of features in each image
z2 = a1 * Theta1'; % 5000 x 25
a2 = sigmoid(z2); % 5000 x 25
a2 = [ones(size(a2,1),1) a2]; % 5000 x 26
z3 = a2 * Theta2'; % 5000 x 10
a3 = sigmoid(z3); % 5000 x 10
[prob, p] = max(a3,[],2);
%returns maximum element in each row == max. probability and its index for each input image
%p: predicted output (index)
%prob: probability of predicted output
% =========================================================================
end
```
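To see how the dimensions line up without loading ex3weights.mat, the same forward pass can be run on a tiny network. This is only a sketch: the layer sizes (2 inputs, 4 hidden units, 2 outputs) and all weights are made-up assumptions, and `sigmoid` is defined locally as a stand-in for sigmoid.m.

```octave
% Sketch: forward propagation on a tiny made-up network (2 inputs, 4 hidden, 2 outputs).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

X = [0.1 0.9; 0.8 0.2; 0.5 0.5];             % 3 examples x 2 features
Theta1 = [ 0.1 -0.3  0.5;
          -0.2  0.4  0.1;
           0.3  0.2 -0.6;
           0.0 -0.1  0.2];                   % 4 x (2+1): S_2 x (S_1 + 1)
Theta2 = [ 0.2 -0.1  0.4  0.3 -0.5;
          -0.3  0.6 -0.2  0.1  0.2];         % 2 x (4+1): S_3 x (S_2 + 1)

m = size(X, 1);
a1 = [ones(m, 1) X];                         % 3 x 3, bias column added
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];     % 3 x 5, bias column added
a3 = sigmoid(a2 * Theta2');                  % 3 x 2, one column per class

[prob, p] = max(a3, [], 2);                  % p: predicted label per example

disp(size(a2));
disp(size(a3));
disp(p');
```

Each `Theta` has one more column than the previous layer has units, which is why a column of ones is prepended before every matrix product.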

I tried to provide optimized solutions, such as a **vectorized implementation**, for each assignment. If you think that more optimization can be done, please suggest corrections or improvements.

--------------------------------------------------------------------------------

Click here to see solutions for all **Machine Learning** Coursera Assignments.

Feel free to ask questions in the comment section. I will try my best to answer them.

If you find this helpful in any way, please like, comment, and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,

**-Akshay P. Daga**

hey!

In the predict.m file, theta1 should be 25 x 401, not 26 x 401.

wrong:

% theta dimensions = S_(j+1) x ((S_j)+1)

% theta1 = 26 x 401

% theta2 = 10 x 26

correct:

% theta dimensions = S_(j+1) x ((S_j)+1)

% theta1 = 25 x 401

% theta2 = 10 x 26

Correct me if I am wrong.

Thanks Bhupesh.

You are right.

predict.m is not working

What error are you getting?

Hey, could you explain how "[prob, p] = max(a3,[],2);" works in predict.m?

Hi, I am getting the error "nonconformant arguments (op1 is 1x1, op2 is 1x2)" at the line grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in lrCostFunction.

The error says there is a matrix dimension mismatch between op1 and op2.

I don't see any variables named op1 or op2 in my code.

Please check once again.

Sigmoid function is missing in predictOneVsAll

Sigmoid is not used, as we only need to find the maximum value of theta*x.

h(x) = sigmoid(theta*x) = 1/(1+e^(-theta*x)),

and this ∈ (0, 1).

Sigmoid is an increasing function, so the class with the highest theta*x also has the highest h(x).

Hence sigmoid is not used.

Will you please tell me what t is here?

@(t)(lrCostFunction(t, X, (y == c), lambda))

Why do we separate grad into two lines, as seen below?

grad(1) = (1/m) * (X(:,1)'*(h_x-y));

grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

Just writing it as

grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;

works fine, or am I missing something here?

As per the theory, we don't regularize the first term (theta(1), the bias term); regularization applies from the second term onward. That's why we have to do it separately.

Watch the related theory video once again carefully.

Thank you for your help, it's really great of you. I just wanted to know two things:

(1) Whenever I start a programming assignment, I get really confused and don't understand where and how to start, so I first refer to your code, understand it thoroughly, and then proceed with the assignment. I wanted to know how correct it is to do this.

(2) Why have we used [prob, p], and what is the intuition behind it in the code? I mean, why have we used two variables, 'prob' and 'p'?

Hi Rohan,

(1) I think you should understand the problem first, then try to solve it your way. Only if you get stuck along the way, or can't understand the problem, should you check out my code to understand it and then start solving your assignment. (Please don't just copy-paste the code as it is.)

(2) In the predict function, we calculate the probability for each class (for a multi-class problem), then find the maximum probability.

The "prob" variable holds the value of that maximum probability, and the "p" variable holds its index.

A higher probability means a better match. We then use the variable "p" to represent the predicted class (category), which is nothing but the index of the maximum probability (prob).

I hope I made it clear. If you still find it difficult to understand, please go through the theory lecture once again.

Absolutely clear, thanks for the support.