## One-vs-all logistic regression and neural networks to recognize hand-written digits

I recently completed the Machine Learning course by Andrew Ng on Coursera.

The course includes a number of quizzes and programming assignments.

Here, I am sharing my solutions to the weekly assignments.

These solutions are for reference only.

You should try to solve the assignments yourself first; completing the course only makes sense if you do the work honestly.
If you get stuck along the way, feel free to refer to the solutions provided here.

#### NOTE:

Don't just copy and paste the code for the sake of completion.
Even if you copy the code, make sure you understand it first.

Click here to check out the week-3 assignment solutions. Scroll down for the week-4 assignment solutions.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

It consists of the following files:
• ex3.m - Octave/MATLAB script that steps you through part 1
• ex3_nn.m - Octave/MATLAB script that steps you through part 2
• ex3data1.mat - Training set of hand-written digits
• ex3weights.mat - Initial weights for the neural network exercise
• submit.m - Submission script that sends your solutions to our servers
• displayData.m - Function to help visualize the dataset
• fmincg.m - Function minimization routine (similar to fminunc)
• sigmoid.m - Sigmoid function
• [*] lrCostFunction.m - Logistic regression cost function
• [*] oneVsAll.m - Train a one-vs-all multi-class classifier
• [*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
• [*] predict.m - Neural network prediction function
* indicates files you will need to complete

### lrCostFunction.m :

```
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));  % (n+1) x 1, must exist before grad(2:end) is assigned

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%

%DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = Scalar

z   = X * theta;   % m x 1
h_x = sigmoid(z);  % m x 1

reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);

J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

grad(1) = (1/m) * (X(:,1)'*(h_x-y));                                    % 1 x 1
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);  % n x 1

% =============================================================

end
```
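A quick way to sanity-check the formulas is to run them on a tiny hand-made problem. The sketch below inlines the same cost and gradient expressions so that it runs on its own. `theta_t = [-2; -1; 1; 2]` and the expected cost of 2.534819 are quoted in the comments further down; the values of `X_t`, `y_t`, and `lambda_t` are an assumption about what the course's test script uses, and `sigmoid` is redefined here as an anonymous function only to keep the snippet self-contained.

```
% Self-contained sanity check of the regularized cost and gradient.
% X_t, y_t and lambda_t are assumed test values, not taken from the post.
sigmoid = @(z) 1 ./ (1 + exp(-z));               % element-wise logistic function

theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5,1) reshape(1:15, 5, 3) / 10]; % 5 x 4, first column is the bias
y_t      = [1; 0; 1; 0; 1];
lambda_t = 3;
m        = length(y_t);

h = sigmoid(X_t * theta_t);                      % 5 x 1 predictions

% Cost: the regularization term skips theta_t(1)
J = (1/m) * sum(-y_t .* log(h) - (1 - y_t) .* log(1 - h)) ...
    + (lambda_t / (2*m)) * sum(theta_t(2:end) .^ 2);

% Gradient: same two-step form as in lrCostFunction
grad = zeros(size(theta_t));
grad(1)     = (1/m) * (X_t(:,1)' * (h - y_t));
grad(2:end) = (1/m) * (X_t(:,2:end)' * (h - y_t)) + (lambda_t/m) * theta_t(2:end);

fprintf('Cost: %f\n', J);
disp(grad);
```

With these inputs the cost should come out at roughly 2.5348. If your own `lrCostFunction` returns something very different for the same inputs, the regularization term or the `sigmoid` implementation is a likely place to look.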

### oneVsAll.m :

```
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% num_labels = No. of output classifier (Here, it is 10)

% Some useful variables
m = size(X, 1);        % No. of Training Samples == No. of Images : (Here, 5000)
n = size(X, 2);        % No. of features == No. of pixels in each Image : (Here, 400)

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
%DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401

%DIMENSIONS: X = m x input_layer_size
%Here, 1 row in X represents 1 training Image of pixel 20x20

% Add ones to the X data matrix
X = [ones(m, 1) X];   %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

initial_theta = zeros(n+1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);

for c = 1:num_labels
    all_theta(c,:) = ...
        fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
end

% =========================================================================
end
```
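The `y == c` hint is the core of the one-vs-all idea: for each class `c`, the multi-class labels are turned into a binary target vector. A minimal sketch with made-up labels (the data below is hypothetical):

```
% Turn multi-class labels into one binary target vector per class.
y = [1; 3; 2; 3; 1];          % toy labels in 1..3 (hypothetical data)
num_labels = 3;

for c = 1:num_labels
    target = (y == c);        % logical m x 1 vector: 1 where the label is c
    fprintf('class %d: %s\n', c, mat2str(double(target')));
end
```

Each pass of the loop in `oneVsAll` trains one ordinary binary logistic regression classifier against such a target vector.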

### predictOneVsAll.m :

```
function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples)

m = size(X, 1);     % No. of Input Examples to Predict (Each row = 1 Example)
num_labels = size(all_theta, 1); %No. of Output Classifiers

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);    % No_of_Input_Examples x 1 == m x 1

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the
%       max element, for more information see 'help max'. If your examples
%       are in rows, then, you can use max(A, [], 2) to obtain the max
%       for each row.
%
% num_labels = No. of output classifier (Here, it is 10)
% DIMENSIONS:
% all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)

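prob_mat = X * all_theta';     % 5000 x 10 == no_of_input_image x num_labels
[prob, p] = max(prob_mat,[],2); % m  x 1
%returns maximum element in each row and its index for each input image
%prob_mat holds the raw scores X*theta; sigmoid is monotonic, so the largest
%score also has the largest probability and the index is the same
%p: predicted output (index)
%prob: score of predicted output (apply sigmoid to get the probability)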

%%%%%%%% WORKING: Computation per input image %%%%%%%%%
% for i = 1:m                               % To iterate through each input sample
%     one_image = X(i,:);                   % 1 x 401 == 1 x no_of_features
%     prob_mat = one_image * all_theta';    % 1 x 10  == 1 x num_labels
%     [prob, out] = max(prob_mat);
%     %out: predicted output
%     %prob: probability of predicted output
%     p(i) = out;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%% WORKING %%%%%%%%%
% for i = 1:m
%     RX = repmat(X(i,:),num_labels,1);
%     RX = RX .* all_theta;
%     SX = sum(RX,2);
%     [val, index] = max(SX);
%     p(i) = index;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%
% =========================================================================
end
```

### predict.m :

```
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);  % m x 1

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%
%DIMENSIONS:
% theta1 = 25 x 401
% theta2 = 10 x 26

% layer1 (input)  = 400 nodes + 1bias
% layer2 (hidden) = 25 nodes + 1bias
% layer3 (output) = 10 nodes
%
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26

% theta1:
%     1st row indicates: theta for all nodes of layer1 connecting to the 1st node of layer2
%     2nd row indicates: theta for all nodes of layer1 connecting to the 2nd node of layer2
%     and
%     1st Column indicates: theta from node1 of layer1 to all nodes in layer2
%     2nd Column indicates: theta from node2 of layer1 to all nodes in layer2
%
% theta2:
%     1st row indicates: theta for all nodes of layer2 connecting to the 1st node of layer3
%     2nd row indicates: theta for all nodes of layer2 connecting to the 2nd node of layer3
%     and
%     1st Column indicates: theta from node1 of layer2 to all nodes in layer3
%     2nd Column indicates: theta from node2 of layer2 to all nodes in layer3

a1 = [ones(m,1) X]; % 5000 x 401 == no_of_input_images x no_of_features % Adding 1 in X
%No. of rows = no. of input images
%No. of Column = No. of features in each image

z2 = a1 * Theta1';  % 5000 x 25
a2 = sigmoid(z2);   % 5000 x 25

a2 =  [ones(size(a2,1),1) a2];  % 5000 x 26

z3 = a2 * Theta2';  % 5000 x 10
a3 = sigmoid(z3);  % 5000 x 10

[prob, p] = max(a3,[],2);
%returns maximum element in each row  == max. probability and its index for each input image
%p: predicted output (index)
%prob: probability of predicted output

% =========================================================================
end
```
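The dimension bookkeeping in `predict.m` is easier to follow on a network small enough to check by hand. The sketch below runs the same forward-propagation steps on a toy 2-2-2 network. The weights are hand-picked for illustration (they are not the course's `ex3weights.mat`), and `sigmoid` is redefined as an anonymous function only so that the snippet runs on its own.

```
% Forward propagation through a toy 2-2-2 network (hand-picked weights).
sigmoid = @(z) 1 ./ (1 + exp(-z));

X      = [0 0; 0 1; 1 0; 1 1];       % 4 examples x 2 features
Theta1 = [-30 20 20; 10 -20 -20];    % 2 hidden units x (2 inputs + bias)
Theta2 = [-10 20 20; 10 -20 -20];    % 2 outputs x (2 hidden units + bias)

m  = size(X, 1);
a1 = [ones(m, 1) X];                       % 4 x 3, bias column added
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];   % 4 x 3, hidden layer plus bias
a3 = sigmoid(a2 * Theta2');                % 4 x 2, one column per class

[prob, p] = max(a3, [], 2);          % p(i) = index of the larger output in row i
disp(p');
```

With these weights, output 1 fires when both inputs are equal and output 2 fires when they differ, so the predicted classes for the four rows are 1, 2, 2, 1.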

I tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think further optimization is possible, please suggest corrections or improvements.

--------------------------------------------------------------------------------

Feel free to ask questions in the comment section. I will do my best to answer them.
If you find this helpful in any way, please like, comment, and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,
-Akshay P. Daga

1. hey!
In predict.m file theta should be = 25*401 not 26*401;

wrong:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 26 x 401
% theta2 = 10 x 26
correct:
% theta dimensions = S_(j+1) x ((S_j)+1)
% theta1 = 25 x 401
% theta2 = 10 x 26

1. Correct me if I am wrong.

2. Thanks Bhupesh.
You are right.

2. predict.m is not working

1. What error are you getting?

3. Hey, could you explain how "[prob, p] = max(a3,[],2);" is working in predict.m

4. Hi, I am getting the error "=: nonconformant arguments (op1 is 1x1, op2 is 1x2)" at the line grad(1) = (1/m) * (X(:,1)'*(h_x-y)); in IrCostFunction

1. Mentioned error says there is some matrix dimension mismatch in variable op1 & op2.
I don't see any variables as op1 & op2 in my code.

2. Hi Akshay
I am having the same problem too when trying to submit my solutions. The error message is:
!! Submission failed: product: nonconformant arguments (op1 is 20x3, op2 is 3x1)
Function: lrCostFunction
LineNumber: 46
Appreciate your help to troubleshoot this? Thanks

3. I got the same error and later figured it out. It is caused by a wrong implementation of sigmoid. You might have written the code as g = 1/(1+exp(-z)), but z can be a matrix, so the operations should be element-wise. A correct implementation:

ex = exp(z.*(-1));
din = 1.+ex;
g = 1./din;
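The same fix can be written in one line. In the sketch below, `./` divides element by element, whereas a plain `/` is matrix right division and generally does not return something of the same shape as `z`. The anonymous-function form is used here only to keep the snippet self-contained; in the assignment the body belongs in `sigmoid.m`.

```
% Element-wise sigmoid: works for scalars, vectors, and matrices.
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-1 0 1; 2 -2 0];
g = sigmoid(z);                    % same size as z

disp(size(g));                     % dimensions of g match those of z
fprintf('%.4f\n', sigmoid(0));     % prints 0.5000
```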

5. Sigmoid function is missing in predictOneVsAll

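1. Sigmoid is not needed here, because we only need the class with the maximum value of theta' * x.
h(x) = sigmoid(theta' * x) = 1/(1 + e^(-theta' * x)), which lies in (0, 1).
The sigmoid is monotonically increasing, so the class with the largest theta' * x also has the largest h(x).
Hence sigmoid is not used.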
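A small sketch of that argument: taking the row-wise maximum before or after applying the sigmoid picks the same column, because the sigmoid never changes the ordering of values. The score matrix below is hypothetical and stands in for `X * all_theta'`.

```
% The index of the row-wise maximum is unchanged by the sigmoid.
sigmoid = @(z) 1 ./ (1 + exp(-z));

scores = [ 2.0 -1.0  0.5;
          -3.0  0.2  0.1;
           0.0 -0.5  4.0];                 % hypothetical X * all_theta' values

[~, p_raw] = max(scores, [], 2);           % prediction from raw scores
[~, p_sig] = max(sigmoid(scores), [], 2);  % prediction from probabilities

disp(isequal(p_raw, p_sig));               % prints 1 (true)
```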

6. Will you please tell me what t is here?

@(t)(lrCostFunction(t, X, (y == c), lambda))
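For readers with the same question: `@(t) ...` creates an anonymous function, and `t` is simply the name of its single argument. The optimizer calls the handle repeatedly with candidate parameter vectors, while the other variables in the expression are captured at the moment the handle is created. A minimal sketch with a hypothetical one-dimensional cost:

```
% @(t) ... creates an anonymous function of one argument t.
% Variables such as a and b are captured when the handle is created.
a = 3;
b = 2;
cost = @(t) (t - a) .^ 2 + b;     % hypothetical cost, smallest at t = a

disp(cost(0));    % (0 - 3)^2 + 2 = 11
disp(cost(3));    % 2

a = 100;          % changing a afterwards does not affect the handle
disp(cost(3));    % still 2
```

In `oneVsAll`, the handle plays the same role: `X`, `(y == c)`, and `lambda` are fixed, and `t` is the theta vector that fmincg varies.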

7. Why do you separate grad into two lines, as seen below?
grad(2:end) = (1/m) * (X(:,2:end)'*(h_x-y)) + (lambda/m)*theta(2:end);

Just writing it as
grad = (1/m) * (X'*(h_x-y)) + (lambda/m)*theta;
works fine, or am I missing something here?

1. As per the theory, we don't regularize the first term (theta(1), the bias); regularization applies from the 2nd term onward. That's why we have to compute them separately.

Watch the related theory video once again carefully.
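The difference is easy to see numerically. The sketch below compares the two-line gradient with a masked one-line version (following the `temp(1) = 0` hint in the function's comments) and with the version that regularizes every component. The data and parameter values are made up for illustration.

```
% Compare the two-line gradient with a masked one-line version.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [ones(4,1) [0.5; -1.2; 2.0; 0.3]];   % toy data (hypothetical), 4 x 2
y = [1; 0; 1; 0];
theta  = [0.7; -0.4];
lambda = 2;
m = length(y);
h = sigmoid(X * theta);

% Two-line version from the post
grad = zeros(size(theta));
grad(1)     = (1/m) * (X(:,1)' * (h - y));
grad(2:end) = (1/m) * (X(:,2:end)' * (h - y)) + (lambda/m) * theta(2:end);

% One-line version: zero out theta(1) so the bias term is not regularized
temp = theta;
temp(1) = 0;
grad_masked = (1/m) * (X' * (h - y)) + (lambda/m) * temp;

% Regularizing everything changes only the first component
grad_all = (1/m) * (X' * (h - y)) + (lambda/m) * theta;

disp(max(abs(grad - grad_masked)));   % zero up to rounding
disp(grad_all(1) - grad(1));          % (lambda/m) * theta(1) = 0.35
```

So the single-line form is fine as long as `theta(1)` is masked out first; without the mask, the first gradient component is off by `(lambda/m) * theta(1)` whenever both `lambda` and `theta(1)` are non-zero.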

8. Thank you for your help, it's really great of you. I just wanted to know 2 things:

(1) Whenever I start a programming assignment I get really confused and don't understand where and how to start, so I first refer to your code, understand it thoroughly, and then proceed with the assignment. I wanted to know how correct it is to do this.

(2) Why have we used [prob, p], and what is its further role in the code? I mean, why have we used 2 variables, 'prob' and 'p'?

1. Hi Rohan,
(1) I think you should understand the problem first, then try to solve it your way. and if stuck in between or couldn't understand the problem then only you should check out my code for understanding purpose and then start solving your assignment. (Please don't just copy paste the code as it is)

(2) In the predict function, we calculate the probability for each class (for a multi-class problem) and then find the maximum probability.
The "prob" variable holds the value of that maximum probability and the "p" variable holds its index.
A higher probability means a better match, so we use "p" to represent the predicted class (category), which is nothing but the index of the maximum probability (prob).

I hope, I made it clear. If you still find it difficult to understand, please go through the theory lecture once again.
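A minimal sketch of the two-output form of `max`, using a made-up matrix in place of `a3`: the third argument `2` asks for the maximum along each row, the first output is the value, and the second output is the column index where it was found.

```
% max with two outputs: value and position of the maximum in each row.
a3 = [0.10 0.80 0.05;
      0.70 0.20 0.60;
      0.30 0.30 0.90];        % hypothetical class scores, one row per example

[prob, p] = max(a3, [], 2);   % maximum along dimension 2 (across columns)
disp([prob p]);               % rows: 0.8 2, 0.7 1, 0.9 3
```

Because the columns are ordered by class label, the column index `p` is directly the predicted label.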

2. absolutely clear , thanks for the support

9. Hi Akshay

Thanks for creating this amazing forum for us like minded people. Had a couple of queries:

1. I am not able to understand the variables of the fmincg function (despite using 'help'). It would be great if someone could help me with this!

2. What do the three dots (...) in the line preceding the fmincg function specify? Why are they needed? (I tried running the function without them, but it reported a syntax error!)

1. Thank you very much for your appreciation.
1. fmincg is explained briefly in the theory lecture. (Honestly, even I still have to look into it in detail.)

2. The three dots (...) are simply the line continuation characters in MATLAB.
DESCRIPTION: Three or more periods at the end of a line continues the current command on the next line. If three or more periods occur before the end of a line, then MATLAB ignores the rest of the line and continues to the next line. This effectively makes a comment out of anything on the current line that follows the three periods.
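A short sketch of the continuation behavior described above; the variable name is arbitrary:

```
% The statement below spans two lines; ... joins them into one command.
total = 1 + 2 + ...   % text after the dots is ignored
        3 + 4;
disp(total);          % prints 10
```

Removing the dots ends the statement after the dangling `+`, which is why the fmincg call fails with a syntax error when they are deleted.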

10. None of the code is working, getting 0/100

1. Hi Qwert123, I think you are doing something wrong. Because the codes were 100% working for me and they are still working for many of my viewers. (you can get idea from comments).
And anyways, these codes are just for understanding. Get the idea from the above codes and make your own solution and try to submit.
Thank you.

11. how were you able to solve onevsall.m predictOneVsAll.m and predict.m bc i am trying to understand the problem and i am not getting how should i solve it

12. Can anyone explain what "theta_t" is? Why and how do they choose some random value "[-2; -1; 1; 2]" (in ex.m)?

1. 2. Sorry, I don't see any "theta_t" in my code.

13. Hi Akshay ,
It is showing error as unprecedented parameter name 'GrabObj'

14. Hi Akshay,
In OneVsall.m,it is saying IrCostFunction is undefined.
Why is it so?

15. Hello,

Can you help me resolve this
octave:7> oneVsAll.m
error: 'X' undefined near line 11 column 10
error: called from
oneVsAll at line 11 column 3

1. Instead of running the oneVsAll.m file, please run the (.m) file in which all the above functions are called. Don't run the individual (.m) files in which the functions are defined.

16. Hi, I used exactly the same implementation, but the cost for my set is coming out as 45.73, in contrast to the expected cost of 2.53.
I am using the same logic as yours, but I don't know why this is happening.
Can you please help me out?

1. Did you find the solution? Cos am having the same problem here.

2. I found the solution. His vectorized formulas are wrong. He needed to use scalar multiplication in some of them. Try the code below. It works 100%.

z = X * theta; % m x 1
h_x = sigmoid(z); % m x 1

reg_term = (lambda/(2*m)) .* sum(theta(2:end).^2);

J = (1/m).*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

grad(1) = (1/m) .* (X(:,1)'*(h_x-y)); % 1 x 1
grad(2:end) = (1/m) .* (X(:,2:end)'*(h_x-y)) + (lambda/m).*theta(2:end); % n x 1

3. @Ozan Kocabs All vectorized implemented formulas provided by me are 100% right.

When you multiply a scalar (constant) with any matrix, you don't have to use ".*" (dot star), only "*" (star) is enough to multiply all the elements of the matrix by that constant.

You might have some other mistake which caused the different cost value.
Please check and find out the correct root cause of your problem.

NOTE: For 2nd check, I ran my code once again and tested it just now and it is giving the correct output.
...
Testing lrCostFunction() with regularization
Cost: 2.534819
Expected cost: 2.534819
...
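The point about scalars can be checked directly. In the sketch below (values chosen arbitrarily), `*` and `.*` give identical results whenever one operand is a scalar; they differ only when both operands are matrices.

```
% With a scalar operand, * and .* are interchangeable.
m = 5;
v = [1; 2; 3];
A = [1 2; 3 4];

disp(isequal((1/m) * v, (1/m) .* v));   % prints 1 (true)
disp(isequal(2 * A, 2 .* A));           % prints 1 (true)

% The operators differ only when both operands are non-scalar:
disp(A * A);     % matrix product:       [7 10; 15 22]
disp(A .* A);    % element-wise product: [1 4; 9 16]
```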

4. I don't know why it resulted in 5 different values in my results. It was like a 5x1 matrix, all resulting in 45.73, and after I put in some scalar multiplication the problem was solved. I have just used your code once again and it worked. You are right. But I don't know why it didn't work at first. Thank you, mate. You are a life saver :)

17. Hi Akshay,
I have used the same code as yours in predict.m
Within the exercise code i am getting training exercise accuracy as expected (97.5%). Also the digit is also being recognized correctly.

But when i am submitting the code for grading, i am getting the following error:

!! Submission failed: unexpected error: Index exceeds the number of array elements (16).

Thanks in advance for the help.

1. Please compare your code with the one given above and check whether the dimensions match. Please use the comments given in the code above; they will help you understand what each line of code does.

18. Could you please explain the line all_theta(c,:) = ... in oneVsAll? I have been stuck on this for an hour.
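For anyone stuck on the same line: `all_theta(c,:) = ...` stores the parameter vector learned for class `c` in row `c` of `all_theta`, and the trailing `...` only continues the statement onto the next line. A minimal sketch of the row assignment, where `theta_c` is a made-up stand-in for the vector that fmincg returns:

```
% Fill a matrix one row per class; each row holds that class's parameters.
num_labels = 3;
n = 2;
all_theta = zeros(num_labels, n + 1);   % one row per class

for c = 1:num_labels
    theta_c = c * [1; 10; 100];         % stand-in for the (n+1) x 1 vector fmincg returns
    all_theta(c, :) = theta_c;          % column vector is stored as row c
end

disp(all_theta);   % rows: 1 10 100 / 2 20 200 / 3 30 300
```

After the loop, row `i` of `all_theta` is the classifier for label `i`, which is the layout `predictOneVsAll` expects.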

19. I don't know why, but I am getting the iterations and cost on the output console; I am posting some of them here. Please help, as I have been stuck on this for more than one day.

Iteration 16 | Cost: 1.018509e-01
Iteration 17 | Cost: 1.018509e-01
Iteration 18 | Cost: 1.018509e-01
Iteration 19 | Cost: 1.018509e-01
Iteration 20 | Cost: 1.018509e-01
Iteration 21 | Cost: 1.018509e-01
Iteration 22 | Cost: 1.018509e-01
Iteration 23 | Cost: 1.018509e-01
Iteration 24 | Cost: 1.018509e-01
Iteration 25 | Cost: 1.018509e-01
Iteration 26 | Cost: 1.018509e-01

all_theta =

-0.5595 0.6192 -0.5504 -0.0935
-5.4744 -0.4716 1.2613 0.6349
0.0684 -0.3756 -1.6523 -1.4101

1. missing ';' in code?

20. H = sigmoid(X*theta);
T = y.*log(H) + (1 - y).*log(1 - H);
J = -1/m*sum(T) + lambda/(2*m)*sum(theta(2:end).^2);

ta = [0; theta(2:end)];
grad = X'*(H - y)/m + lambda/m*ta;

but im getting this error:

>> lrCostFunction
Not enough input arguments.

Error in lrCostFunction (line 9)
m = length(y); % number of training examples

I tried using your code to check if I was wrong, but I got the same error. Could you help me, please?

21. Hey, I have question and that is when we were calculating grad in week 3 assignment we include
1. 