▸ One-vs-all logistic regression and neural networks to recognize hand-written digits.

I have recently completed the Machine Learning course from Coursera by Andrew Ng.

While doing the course, we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.


In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


**These solutions are for reference only.**

**It is recommended that you solve the assignments yourself, honestly; only then does completing the course make sense.**

**But in case you get stuck, feel free to refer to the solutions provided by me.**

#### **NOTE:**

Don't just copy paste the code for the sake of completion.

Even if you copy the code, make sure you understand the code first.

**Click here to check out** the **week-3** assignment solutions. **Scroll down** for the solutions to the **week-4** assignment.

Recommended Machine Learning Courses:

- Coursera: Machine Learning
- Coursera: Deep Learning Specialization
- Coursera: Machine Learning with Python
- Coursera: Advanced Machine Learning Specialization
- Udemy: Machine Learning
- LinkedIn: Machine Learning
- Eduonix: Machine Learning
- edX: Machine Learning
- Fast.ai: Introduction to Machine Learning for Coders

**It consists of the following files:**

- **ex3.m** - Octave/MATLAB script that steps you through part 1
- **ex3_nn.m** - Octave/MATLAB script that steps you through part 2
- **ex3data1.mat** - Training set of hand-written digits
- **ex3weights.mat** - Initial weights for the neural network exercise
- **submit.m** - Submission script that sends your solutions to our servers
- **displayData.m** - Function to help visualize the dataset
- **fmincg.m** - Function minimization routine (similar to fminunc)
- **sigmoid.m** - Sigmoid function
- **[\*] lrCostFunction.m** - Logistic regression cost function
- **[\*] oneVsAll.m** - Train a one-vs-all multi-class classifier
- **[\*] predictOneVsAll.m** - Predict using a one-vs-all multi-class classifier
- **[\*] predict.m** - Neural network prediction function

\* indicates files you will need to complete

### **lrCostFunction.m:**

```
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%
%DIMENSIONS:
% theta = (n+1) x 1
% X = m x (n+1)
% y = m x 1
% grad = (n+1) x 1
% J = Scalar
z = X * theta; % m x 1
h_x = sigmoid(z); % m x 1
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
grad(1) = (1/m) * (X(:,1)'*(h_x-y)); % 1 x 1
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end); % n x 1
% =============================================================
grad = grad(:);
end
```
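As a quick sanity check of the formulas above, here is a minimal, self-contained sketch that evaluates the same regularized cost and gradient on a tiny data set. The inline `sigmoid_fn` and the test values (`theta_t`, `X_t`, `y_t`, `lambda_t`) are assumptions for illustration; they are meant to mirror the kind of check the exercise script runs, not to reproduce it exactly.

```octave
% Assumed element-wise sigmoid, defined inline so the snippet runs on its own
sigmoid_fn = @(z) 1 ./ (1 + exp(-z));

% Small illustrative test case (values are assumptions)
theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5, 1) reshape(1:15, 5, 3) / 10];   % 5 x 4, first column = bias
y_t      = [1; 0; 1; 0; 1];
lambda_t = 3;
m        = length(y_t);

h = sigmoid_fn(X_t * theta_t);                       % 5 x 1 predictions

% Cost: cross-entropy term plus penalty on theta(2:end) only
J = (1/m) * sum(-y_t .* log(h) - (1 - y_t) .* log(1 - h)) ...
    + (lambda_t / (2*m)) * sum(theta_t(2:end) .^ 2);

% Gradient: zero out the bias entry so it is not regularized
temp    = theta_t;
temp(1) = 0;
grad    = (1/m) * (X_t' * (h - y_t)) + (lambda_t / m) * temp;

fprintf('Cost: %f\n', J);
```

With these values the printed cost should be 2.534819, and `grad` is a 4 x 1 column vector, the same shape as `theta_t`.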

### **oneVsAll.m:**

```
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logistic regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i
% num_labels = No. of output classifier (Here, it is 10)
% Some useful variables
m = size(X, 1); % No. of Training Samples == No. of Images : (Here, 5000)
n = size(X, 2); % No. of features == No. of pixels in each Image : (Here, 400)
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
%DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401
%DIMENSIONS: X = m x input_layer_size
%Here, 1 row in X represents 1 training Image of pixel 20x20
% Add ones to the X data matrix
X = [ones(m, 1) X]; %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fminunc
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%
initial_theta = zeros(n+1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
    % Train the classifier for class c: (y == c) is 1 for class c, 0 otherwise
    all_theta(c,:) = ...
        fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
end
% =========================================================================
end
```
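Two pieces of the loop above tend to cause confusion: the label mask `y == c` and the anonymous function `@(t)(...)`. The sketch below shows both in isolation on toy data. The variable names and the stand-in function `count_fn` are hypothetical and only illustrate the mechanism; they are not part of the assignment.

```octave
% Toy labels for a 3-class problem (illustrative values only)
y = [1; 3; 2; 3; 1];
c = 3;

% Logical column vector: 1 where the label equals c, 0 elsewhere
y_c = (y == c);

% An anonymous function captures y, c and offset from the workspace at
% creation time and leaves only t as a free argument. fmincg calls the
% cost function the same way, passing candidate values of theta as t.
offset   = 10;
count_fn = @(t) t + offset + sum(y == c);   % hypothetical stand-in for the cost
result   = count_fn(1);                      % 1 + 10 + 2

disp(y_c');
disp(result);
```

So in `@(t)(lrCostFunction(t, X, (y == c), lambda))`, `t` is simply the placeholder for the parameter vector that the optimizer varies, while `X`, `y`, `c` and `lambda` are fixed.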

### **predictOneVsAll.m:**

```
function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
% for each example in the matrix X. Note that X contains the examples in
% rows. all_theta is a matrix where the i-th row is a trained logistic
% regression theta vector for the i-th class. You should set p to a vector
% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
% for 4 examples)
m = size(X, 1); % No. of Input Examples to Predict (Each row = 1 Example)
num_labels = size(all_theta, 1); %No. of Output Classifier
% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % No_of_Input_Examples x 1 == m x 1
% Add ones to the X data matrix
X = [ones(m, 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters (one-vs-all).
% You should set p to a vector of predictions (from 1 to
% num_labels).
%
% Hint: This code can be done all vectorized using the max function.
% In particular, the max function can also return the index of the
% max element, for more information see 'help max'. If your examples
% are in rows, then, you can use max(A, [], 2) to obtain the max
% for each row.
%
% num_labels = No. of output classifier (Here, it is 10)
% DIMENSIONS:
% all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)
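prob_mat = X * all_theta'; % 5000 x 10 == no_of_input_image x num_labels (raw scores theta' * x)
[prob, p] = max(prob_mat,[],2); % m x 1
%returns maximum element in each row and its index for each input image
%sigmoid is monotonically increasing, so the class with the largest raw score
%also has the largest probability; applying sigmoid would not change p
%p: predicted output (index)
%prob: raw score of predicted output (apply sigmoid to get its probability)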
%%%%%%%% WORKING: Computation per input image %%%%%%%%%
% for i = 1:m % To iterate through each input sample
% one_image = X(i,:); % 1 x 401 == 1 x no_of_features
% prob_mat = one_image * all_theta'; % 1 x 10 == 1 x num_labels
% [prob, out] = max(prob_mat);
% %out: predicted output
% %prob: probability of predicted output
% p(i) = out;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% WORKING %%%%%%%%%
% for i = 1:m
% RX = repmat(X(i,:),num_labels,1);
% RX = RX .* all_theta;
% SX = sum(RX,2);
% [val, index] = max(SX);
% p(i) = index;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%
% =========================================================================
end
```
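The prediction above takes the row-wise maximum of the raw scores without applying the sigmoid. A minimal sketch of why that gives the same labels is shown below; `sigmoid_fn` and the `scores` matrix are assumptions made up for illustration.

```octave
% Assumed element-wise sigmoid
sigmoid_fn = @(z) 1 ./ (1 + exp(-z));

% Hypothetical scores X * all_theta' for 3 examples (rows) and 4 classes (columns)
scores = [ 0.2 -1.5  3.0  0.1;
          -2.0 -0.3 -4.0 -1.0;
           1.1  1.2  0.9  1.0];

% Second output of max is the column index of the largest entry in each row
[~, p_raw]  = max(scores, [], 2);               % argmax on raw scores
[~, p_prob] = max(sigmoid_fn(scores), [], 2);   % argmax on probabilities

disp([p_raw p_prob]);
```

Both columns come out identical because the sigmoid preserves ordering. The first output of `max` is the maximum value itself and the second is its index, which is used directly as the predicted class.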

### **predict.m:**

```
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % m x 1
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
%DIMENSIONS:
% theta1 = 25 x 401
% theta2 = 10 x 26
% layer1 (input) = 400 nodes + 1bias
% layer2 (hidden) = 25 nodes + 1bias
% layer3 (output) = 10 nodes
%
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26
% theta1:
% 1st row indicates: theta from all nodes of layer1 connecting to the 1st node of layer2
% 2nd row indicates: theta from all nodes of layer1 connecting to the 2nd node of layer2
% and
% 1st Column indicates: theta from node1 of layer1 to all nodes in layer2
% 2nd Column indicates: theta from node2 of layer1 to all nodes in layer2
%
% theta2:
% 1st row indicates: theta from all nodes of layer2 connecting to the 1st node of layer3
% 2nd row indicates: theta from all nodes of layer2 connecting to the 2nd node of layer3
% and
% 1st Column indicates: theta from node1 of layer2 to all nodes in layer3
% 2nd Column indicates: theta from node2 of layer2 to all nodes in layer3
a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X
%No. of rows = no. of input images
%No. of Column = No. of features in each image
z2 = a1 * Theta1'; % 5000 x 25
a2 = sigmoid(z2); % 5000 x 25
a2 = [ones(size(a2,1),1) a2]; % 5000 x 26
z3 = a2 * Theta2'; % 5000 x 10
a3 = sigmoid(z3); % 5000 x 10
[prob, p] = max(a3,[],2);
%returns maximum element in each row == max. probability and its index for each input image
%p: predicted output (index)
%prob: probability of predicted output
% =========================================================================
end
```
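To see how the dimensions in the forward pass fit together, here is a minimal sketch on a tiny, made-up network (3 inputs, 2 hidden units, 2 output classes). All values and the inline `sigmoid_fn` are assumptions for illustration. Because the examples are stored in rows, multiplying by the transposed weight matrix keeps one row per example; `Theta1 * a1'` would produce the same numbers with examples in columns instead.

```octave
% Assumed element-wise sigmoid
sigmoid_fn = @(z) 1 ./ (1 + exp(-z));

% Hypothetical data and weights
X      = [1 0 2; 0 1 1; 3 1 0; 0 0 0];            % 4 examples x 3 features
Theta1 = [0.1 0.2 -0.1 0.3; -0.2 0.1 0.4 0.0];    % 2 x (3+1)
Theta2 = [0.5 -1.0 1.0; -0.5 1.0 -1.0];           % 2 x (2+1)

m  = size(X, 1);
a1 = [ones(m, 1) X];                          % 4 x 4, bias column added
a2 = [ones(m, 1) sigmoid_fn(a1 * Theta1')];   % 4 x 3, hidden layer plus bias
a3 = sigmoid_fn(a2 * Theta2');                % 4 x 2, one column per class
[~, p] = max(a3, [], 2);                      % 4 x 1 predicted labels

disp(size(a3));
disp(p');
```

The same pattern scales to the exercise: with 5000 examples, `a1` is 5000 x 401, `a2` is 5000 x 26 and `a3` is 5000 x 10.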

I tried to provide optimized solutions, such as a **vectorized implementation**, for each assignment. If you think that more optimization can be done, then please suggest corrections / improvements.

--------------------------------------------------------------------------------

Click here to see solutions for all **Machine Learning** Coursera Assignments.


Feel free to ask your doubts in the comment section. I will try my best to answer them.

If you find this helpful in any way, like, comment and share the post.

This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,

**-Akshay P. Daga**

hey!

In the predict.m file, theta1 should be 25*401, not 26*401:

wrong:

% theta dimensions = S_(j+1) x ((S_j)+1)

% theta1 = 26 x 401

% theta2 = 10 x 26

correct:

% theta dimensions = S_(j+1) x ((S_j)+1)

% theta1 = 25 x 401

% theta2 = 10 x 26

Correct me if I am wrong.

Thanks Bhupesh.

You are right.

Hi Akshay

I still did not understand how we arrived at the theta sizes. We only know the activation nodes in the first layer = 400 and in the last layer (output) = 10.

We have no information relating to the second layer.

Can you please elaborate?

Thanks

@Unknown The details for layer 2 are given in the question itself. I have also mentioned them in the code comments, as below. (Please read the question carefully once again.)

% layer1 (input) = 400 nodes + 1bias

% layer2 (hidden) = 25 nodes + 1bias

% layer3 (output) = 10 nodes

Got it, thanks very much.

predict.m is not working

What error are you getting?

Hey, could you explain how "[prob, p] = max(a3,[],2);" works in predict.m?

Hi, I am getting the error "nonconformant arguments (op1 is 1x1, op2 is 1x2)" at the line grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in lrCostFunction

The error says there is a matrix dimension mismatch between op1 and op2.

I don't see any variables named op1 or op2 in my code.

Please check once again.

Hi Akshay

I am having the same problem too when trying to submit my solutions. The error message is:

!! Submission failed: product: nonconformant arguments (op1 is 20x3, op2 is 3x1)

Function: lrCostFunction

LineNumber: 46

I would appreciate your help troubleshooting this. Thanks

I got the same error and have since figured it out. It is caused by a wrong implementation of sigmoid. You might have written the code as g = 1/(1+exp(-z)), but z can be a matrix, so the operations must be element-wise. Here is a correct implementation:

ex = exp(z.*(-1));

din = 1.+ex;

g = 1./din;
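To make the element-wise point concrete, here is a minimal sketch comparing the two forms on a column vector. The variable names are illustrative, and the comment on the non-element-wise result describes how matrix right division behaves with a scalar numerator and a column-vector denominator.

```octave
z = [-1; 0; 1];                       % column vector, like X * theta

% Element-wise division: result has the same 3 x 1 shape as z
g_ok = 1 ./ (1 + exp(-z));

% Matrix right division: solves x * (1 + exp(-z)) = 1, so the result is
% a 1 x 3 row vector, which later breaks products such as X' * (h - y)
g_wrong = 1 / (1 + exp(-z));

disp(size(g_ok));
disp(size(g_wrong));
```

If z is a row vector or a matrix instead, the `/` form typically fails outright with a nonconformant or dimension error, which matches the messages reported above.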

Sigmoid function is missing in predictOneVsAll

Sigmoid is not needed, because we only need the class with the maximum value of theta' * x.

h(x) = sigmoid(theta' * x) = 1/(1 + e^(-theta' * x)),

which lies in (0, 1).

Sigmoid is monotonically increasing, so the class with the largest theta' * x also has the largest h(x).

Hence sigmoid is not used.

Will you please tell me what t is here?

@(t)(lrCostFunction(t, X, (y == c), lambda))

Why do you separate grad into two lines, as seen below?

grad(1) = (1/m) * (X(:,1)'*(h_x-y));

grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

Just writing it as

grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;

works fine, or am I missing something here?

As per the theory, we don't regularize the first term, theta(1); regularization applies from the 2nd term onward. That's why we have to do it separately. The one-line version also adds (lambda/m)*theta(1) to grad(1), which is not wanted.

Watch the related theory video once again carefully.
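A minimal sketch of the difference between the two ways of writing the gradient, using made-up numbers (`unreg_grad`, `theta`, `lambda` and `m` are illustrative assumptions):

```octave
theta      = [2; 0.5; -1];
lambda     = 3;
m          = 5;
unreg_grad = [0.1; 0.2; 0.3];     % hypothetical unregularized gradient

% One-line version: penalizes every entry, including theta(1)
grad_all = unreg_grad + (lambda/m) * theta;

% Version that skips the first entry: zero it out before adding the penalty
temp      = theta;
temp(1)   = 0;
grad_skip = unreg_grad + (lambda/m) * temp;

% The two differ only in the first entry, by (lambda/m) * theta(1)
delta = grad_all - grad_skip;
disp(delta');
```

The one-line form only gives the same answer when theta(1) happens to be zero or lambda is zero, which is why it can appear to work in some checks.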

Thank you for your help, it's really great of you. I just wanted to know 2 things:

(1) Whenever I start a programming assignment I get really confused and don't understand where and how to start, so I first refer to your code, understand it thoroughly, and then proceed with the assignment. I wanted to know whether this is the right way to do it.

(2) Why have we used [prob, p], and what is its further use in the code? I mean, why have we used the 2 variables 'prob' and 'p'?

Hi Rohan,

(1) I think you should understand the problem first, then try to solve it your own way. Only if you get stuck or can't understand the problem should you check out my code for understanding, and then start solving your assignment. (Please don't just copy-paste the code as it is.)

(2) In the predict function, we calculate the probability for each class (for a multi-class problem) and then find the maximum probability.

The "prob" variable holds the value of that probability and the "p" variable holds its index.

A higher probability means a better match. We then use the variable "p" to represent the predicted class (category), which is nothing but the index of the maximum probability (prob).

I hope I made it clear. If you still find it difficult to understand, please go through the theory lecture once again.

Absolutely clear, thanks for the support.

Hi Akshay

Thanks for creating this amazing forum for us like-minded people. I had a couple of queries:

1. I am not able to understand the variables of the fmincg function (despite using 'help'). It would be great if someone could help me with this!

2. What do the three dots (...) in the line preceding the fmincg function specify? Why are they needed? (I tried running the function without them, but it was flagged as a syntax error!)

Thanks in advance.

Thank you very much for your appreciation.

1. fmincg is explained a little bit in the theory lecture. (Honestly, even I have to look into it in detail.)

2. Three dots (...) are nothing but the "line continuation" characters in MATLAB.

DESCRIPTION: Three or more periods at the end of a line continues the current command on the next line. If three or more periods occur before the end of a line, then MATLAB ignores the rest of the line and continues to the next line. This effectively makes a comment out of anything on the current line that follows the three periods.
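A small sketch of line continuation in practice (the variable names are illustrative):

```octave
% One statement split across two lines
total = 1 + 2 + ...  % the statement continues on the next line
        3 + 4;

% Continuation also works inside a cell array or function call
names = {'alpha', ...
         'beta'};

disp(total);
disp(numel(names));
```

Without the dots, the first line would end at the dangling `+` and be reported as a syntax error, which is what happens when they are removed from the fmincg call.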

None of the code is working, getting 0/100

Hi Qwert123, I think you are doing something wrong, because the code worked 100% for me and it is still working for many of my viewers. (You can get an idea from the comments.)

And anyway, this code is just for understanding. Get the idea from the code above, write your own solution, and try to submit.

Thank you.

How were you able to solve oneVsAll.m, predictOneVsAll.m and predict.m? I am trying to understand the problem, and I am not getting how I should solve it.

Can anyone explain what "theta_t" is? Why and how did they choose the values "[-2; -1; 1; 2]" (in ex.m)?

Sorry, I don't see any "theta_t" in my code.

Hi Akshay,

It is showing error as unprecedented parameter name 'GrabObj'

Hi Akshay,

In oneVsAll.m, it is saying IrCostFunction is undefined.

Why is it so?

Hello,

Can you help me resolve this?

octave:7> oneVsAll.m

error: 'X' undefined near line 11 column 10

error: called from

oneVsAll at line 11 column 3

Instead of running the oneVsAll.m file, please run the (.m) file in which all the above functions are called. Don't run the individual (.m) files in which the functions are defined.

Hi, I used exactly the same implementation, but the cost for my set comes out as 45.73, in contrast to the expected cost of 2.53.

I am using the same logic as yours, but I don't know why this is happening.

Can you please help me out?

Did you find the solution? Because I am having the same problem here.

I found the solution. His vectorized formulas are wrong. He needed to use scalar multiplication in some of them. Try the code below. It works 100%.

z = X * theta; % m x 1

h_x = sigmoid(z); % m x 1

reg_term = (lambda/(2*m)) .* sum(theta(2:end).^2);

J = (1/m).*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

grad(1) = (1/m) .* (X(:,1)'*(h_x-y)); % 1 x 1

grad(2:end) = (1/m) .* (X(:,2:end)'*(h_x-y)) + (lambda/m).*theta(2:end); % n x 1

@Ozan Kocabs All the vectorized formulas provided by me are 100% right.

When you multiply a matrix by a scalar (constant), you don't have to use ".*" (dot star); "*" (star) alone is enough to multiply all the elements of the matrix by that constant.

You might have made some other mistake that caused the different cost value.

Please check and find the actual root cause of your problem.

NOTE: As a second check, I ran my code once again just now, and it gives the correct output.

...

Testing lrCostFunction() with regularization

Cost: 2.534819

Expected cost: 2.534819

...
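A minimal sketch of the `*` versus `.*` distinction discussed above, with illustrative matrices:

```octave
A = [1 2; 3 4];
k = 0.5;

% Scalar times matrix: both operators scale every element identically
same = isequal(k * A, k .* A);

% Matrix times matrix: the two operators do different things
B = [1 0; 0 1];
matrix_product = A * B;     % true matrix multiplication, equals A here
elementwise    = A .* B;    % element-by-element product, keeps only the diagonal

disp(same);
disp(matrix_product);
disp(elementwise);
```

So replacing `*` with `.*` next to a scalar such as (1/m) or (lambda/m) cannot change the result; a different cost has to come from somewhere else.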

I don't know why it gave 5 different values in my results. It was like a 5x1 matrix with every entry equal to 45.73, and after I added some scalar multiplication the problem was solved. I have just used your code once again and it worked. You are right. But I don't know why it didn't work at first. Thank you, mate. You are a life saver :)

Hi Akshay,

I have used the same code as yours in predict.m

Within the exercise code I am getting the training set accuracy as expected (97.5%). The digit is also being recognized correctly.

But when I submit the code for grading, I get the following error:

!! Submission failed: unexpected error: Index exceeds the number of array elements (16).

!! Please try again later.

Thanks in advance for the help.

Please compare your code with the one given above and check whether the dimensions match. Please use the comments given on each line of the code above. They will help you understand what each particular line of code does.

Could you please explain the line all_theta(c,:) = ... in oneVsAll? I have been stuck on this for an hour.

I don't know why, but I am getting the iterations and cost on the output console. I am posting some of them here. Please help, as I have been stuck on this for more than a day.

Iteration 16 | Cost: 1.018509e-01

Iteration 17 | Cost: 1.018509e-01

Iteration 18 | Cost: 1.018509e-01

Iteration 19 | Cost: 1.018509e-01

Iteration 20 | Cost: 1.018509e-01

Iteration 21 | Cost: 1.018509e-01

Iteration 22 | Cost: 1.018509e-01

Iteration 23 | Cost: 1.018509e-01

Iteration 24 | Cost: 1.018509e-01

Iteration 25 | Cost: 1.018509e-01

Iteration 26 | Cost: 1.018509e-01

all_theta =

-0.5595 0.6192 -0.5504 -0.0935

-5.4744 -0.4716 1.2613 0.6349

0.0684 -0.3756 -1.6523 -1.4101

Missing ';' in the code?

Hi, could you please help me? This is my code for lrCostFunction:

H = sigmoid(X*theta);

T = y.*log(H) + (1 - y).*log(1 - H);

J = -1/m*sum(T) + lambda/(2*m)*sum(theta(2:end).^2);

ta = [0; theta(2:end)];

grad = X'*(H - y)/m + lambda/m*ta;

but im getting this error:

>> lrCostFunction

Not enough input arguments.

Error in lrCostFunction (line 9)

m = length(y); % number of training examples

I tried using your code to check whether I was wrong, but I got the same error. Could you help me, please?

Hey, I have a question. When we were calculating grad in the week 3 assignment, we included

grad(1) = (1/m)* sum(X(:,1)'*(hx-y));

grad(2:end) = (1/m)* sum(X(:,2:end)'*(hx-y))+(lambda/m)*theta(2:end);

Now, when we calculate it in week 4, we remove "sum" from both equations. My question is: why do we remove sum? When I calculate with sum, it gives the wrong answer.

I don't see any sum function used in calculating grad, even in assignment 3.

Here is the link for the assignment 3 solution: https://www.apdaga.com/2018/06/coursera-machine-learning-week-3.html#costFunctionReg

Please check it out.

Hi

For the oneVsAll.m problem, what would the code look like if you didn't use the fmincg function?

I'm kind of lost on the process of how to get all_theta.

Can you send submit.m and the submit config file for this exercise?

I am getting an error in the predict.m file

error: called from

predict at line 7 column 5

It might be some silly mistake near line 6 or 7. Please check.

You will be able to resolve it yourself.

I am trying to submit the whole package. All scripts so far are running and give me the correct answers, but when I submit to the test servers, I get an error about the size of a matrix:

!! Submission failed: unexpected error: Matrix dimensions must agree.

!! Please try again later.

How can I fix this error? Thanks

In predict.m why do we have to do a1 * Theta1' instead of Theta1 * a1'?
