Coursera: Machine Learning (Week 8) [Assignment Solution] - Andrew NG

 K-means clustering algorithm to compress an image.
 Principal component analysis to find a low-dimensional representation of face images.

I have recently completed the Machine Learning course from Coursera by Andrew NG.

While doing the course, we have to go through various quizzes and assignments.

Here, I am sharing my solutions for the weekly assignments throughout the course.

These solutions are for reference only.

It is recommended that you solve the assignments yourself honestly; only then does completing the course make sense.
But in case you get stuck somewhere, feel free to refer to the solutions provided by me.

NOTE:

Don't just copy-paste the code for the sake of completion.
Even if you copy the code, make sure you understand it first.

Click here to check out the week-7 assignment solutions. Scroll down for the solutions for the week-8 assignment.




In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


It consists of the following files:
  • ex7.m - Octave/MATLAB script for the first exercise on K-means
  • ex7 pca.m - Octave/MATLAB script for the second exercise on PCA
  • ex7data1.mat - Example Dataset for PCA
  • ex7data2.mat - Example Dataset for K-means
  • ex7faces.mat - Faces Dataset
  • bird small.png - Example Image
  • displayData.m - Displays 2D data stored in a matrix
  • drawLine.m - Draws a line over an existing figure
  • plotDataPoints.m - Plots data points, coloring them by their current cluster assignment
  • plotProgresskMeans.m - Plots each step of K-means as it proceeds
  • runkMeans.m - Runs the K-means algorithm
  • submit.m - Submission script that sends your solutions to our servers
  • [*] pca.m - Perform principal component analysis
  • [*] projectData.m - Projects a data set into a lower dimensional space
  • [*] recoverData.m - Recovers the original data from the projection
  • [*] findClosestCentroids.m - Find closest centroids (used in K-means)
  • [*] computeCentroids.m - Compute centroid means (used in K-means)
  • [*] kMeansInitCentroids.m - Initialization for K-means centroids

* indicates files you will need to complete

pca.m :

function [U, S] = pca(X)
  %PCA Run principal component analysis on the dataset X
  %   [U, S] = pca(X) computes eigenvectors of the covariance matrix of X
  %   Returns the eigenvectors U, the eigenvalues (on diagonal) in S
  %
  
  % Useful values
  [m, n] = size(X);
  
  % You need to return the following variables correctly.
  U = zeros(n);
  S = zeros(n);
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: You should first compute the covariance matrix. Then, you
  %               should use the "svd" function to compute the eigenvectors
  %               and eigenvalues of the covariance matrix. 
  %
  % Note: When computing the covariance matrix, remember to divide by m (the
  %       number of examples).
  %
  % DIMENSIONS:
  %    X = m x n
  
  Sigma = (1/m)*(X'*X);    % n x n covariance matrix
  [U, S, V] = svd(Sigma);  % columns of U: eigenvectors; diag(S): eigenvalues (descending)
  
  % =========================================================================
end
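
A quick usage sketch (my own addition, not part of the assignment template): ex7_pca.m mean-normalizes the data before calling pca, using the featureNormalize helper from the exercise scaffold.

  % Sketch: normalize first, then run PCA (this mirrors what ex7_pca.m does).
  [X_norm, mu, sigma] = featureNormalize(X);
  [U, S] = pca(X_norm);
  % U(:, 1) is the first principal component; diag(S) holds the
  % corresponding eigenvalues in decreasing order.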




projectData.m :

function Z = projectData(X, U, K)
  %PROJECTDATA Computes the reduced data representation when projecting only 
  %on to the top k eigenvectors
  %   Z = projectData(X, U, K) computes the projection of 
  %   the normalized inputs X into the reduced dimensional space spanned by
  %   the first K columns of U. It returns the projected examples in Z.
  %
  
  % You need to return the following variables correctly.
  Z = zeros(size(X, 1), K);
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the projection of the data using only the top K 
  %               eigenvectors in U (first K columns). 
  %               For the i-th example X(i,:), the projection on to the k-th 
  %               eigenvector is given as follows:
  %                    x = X(i, :)';
  %                    projection_k = x' * U(:, k);
  %
  % DIMENSIONS:
  %    X = m x n
  %    U = n x n
  %    U_reduce = n x K
  %    K = scalar
  
  U_reduce = U(:, 1:K);    % n x K: top K eigenvectors
  Z = X * U_reduce;        % m x K: projected data
  
  % =============================================================
end
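
For example (a sketch, continuing from the pca call above): on ex7data1 with K = 1, ex7_pca.m expects the projection of the first normalized example to be about 1.481.

  % Sketch: project the normalized data onto the first principal component.
  K = 1;
  Z = projectData(X_norm, U, K);   % Z(1) should be about 1.481 on ex7data1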


recoverData.m :

function X_rec = recoverData(Z, U, K)
  %RECOVERDATA Recovers an approximation of the original data when using the 
  %projected data
  %   X_rec = RECOVERDATA(Z, U, K) recovers an approximation the 
  %   original data that has been reduced to K dimensions. It returns the
  %   approximate reconstruction in X_rec.
  %
  
  % You need to return the following variables correctly.
  X_rec = zeros(size(Z, 1), size(U, 1));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the approximation of the data by projecting back
  %               onto the original space using the top K eigenvectors in U.
  %
  %               For the i-th example Z(i,:), the (approximate)
  %               recovered data for dimension j is given as follows:
  %                    v = Z(i, :)';
  %                    recovered_j = v' * U(j, 1:K)';
  %
  %               Notice that U(j, 1:K) is a row vector.
  %               
  % DIMENSIONS: 
  %    Z = m x K
  %    U = n x n
  %    U_reduce = n x K
  %    K = scalar
  %    X_rec = m x n
  
  U_reduce = U(:, 1:K);    % n x K: top K eigenvectors
  X_rec = Z * U_reduce';   % m x n: approximate reconstruction
  
  % =============================================================
end
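
A related check worth knowing (my own sketch, not required by the assignment): the eigenvalues on the diagonal of S tell you how much variance the first K components retain, which is the usual way to choose K.

  % Sketch: fraction of variance retained by the top K components.
  % Assumes [U, S] = pca(X_norm) has already been computed.
  s = diag(S);
  variance_retained = sum(s(1:K)) / sum(s);   % pick the smallest K giving >= 0.99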




findClosestCentroids.m :

function idx = findClosestCentroids(X, centroids)
  %FINDCLOSESTCENTROIDS computes the centroid memberships for every example
  %   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
  %   in idx for a dataset X where each row is a single example. idx = m x 1
  %   vector of centroid assignments (i.e. each entry in range [1..K])
  %
  
  % Set K
  K = size(centroids, 1);     % number of centroids (3 in ex7.m)
  
  % You need to return the following variables correctly.
  idx = zeros(size(X,1), 1);  % m x 1 (300 x 1 for ex7data2)
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Go over every example, find its closest centroid, and store
  %               the index inside idx at the appropriate location.
  %               Concretely, idx(i) should contain the index of the centroid
  %               closest to example i. Hence, it should be a value in the
  %               range 1..K
  %
  % Note: You can use a for-loop over the examples to compute this.
  %
  % DIMENSIONS:
  %    centroids = K x no. of features = 3 x 2
  
  for i = 1:size(X,1)
      temp = zeros(K,1);      % distance from example i to each centroid
      for j = 1:K
          temp(j) = sqrt(sum((X(i,:)-centroids(j,:)).^2));  % Euclidean distance
      end
      [~,idx(i)] = min(temp); % index of the closest centroid
  end
  
  % =============================================================
end
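
For reference, here is a vectorized alternative (a sketch of my own; it produces the same idx as the looped version above). It uses the expansion ||x - c||^2 = ||x||^2 - 2*x*c' + ||c||^2 and relies on implicit broadcasting (Octave, or MATLAB R2016b and newer). The sqrt is dropped because it does not change which centroid is closest.

  % Sketch: squared distance from every example to every centroid at once.
  D = sum(X.^2, 2) + sum(centroids.^2, 2)' - 2 * X * centroids';  % m x K
  [~, idx] = min(D, [], 2);   % closest centroid index per example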




computeCentroids.m :

function centroids = computeCentroids(X, idx, K)
  %COMPUTECENTROIDS returns the new centroids by computing the means of the
  %data points assigned to each centroid.
  %   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by
  %   computing the means of the data points assigned to each centroid. It is
  %   given a dataset X where each row is a single data point, a vector
  %   idx of centroid assignments (i.e. each entry in range [1..K]) for each
  %   example, and K, the number of centroids. You should return a matrix
  %   centroids, where each row of centroids is the mean of the data points
  %   assigned to it.
  %
  
  % Useful variables
  [m, n] = size(X);
  
  % You need to return the following variables correctly.
  centroids = zeros(K, n);
  
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Go over every centroid and compute mean of all points that
  %               belong to it. Concretely, the row vector centroids(i, :)
  %               should contain the mean of the data points assigned to
  %               centroid i.
  %
  % Note: You can use a for-loop over the centroids to compute this.
  %
  % DIMENSIONS:
  %    X =  m x n
  %    centroids = K x n
  
  %% %%%%%% WORKING: SOLUTION1 %%%%%%%%%
  % for i = 1:K
  %     idx_i = find(idx==i);   % indices of all examples assigned to cluster i
  %     centroids(i,:) = (1/length(idx_i))*sum(X(idx_i,:));   % computing the mean manually
  % end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  %% %%%%%% WORKING: SOLUTION 2 %%%%%%%%
  for i = 1:K
      idx_i = find(idx==i);               % indices of all examples assigned to cluster i
      centroids(i,:) = mean(X(idx_i,:));  % mean via the built-in function
  end
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % =============================================================
end
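
A fully vectorized variant is also possible (a sketch along the same lines as a suggestion in the comments below; it relies on implicit broadcasting and assumes no cluster ends up empty):

  % Sketch: M(i,j) is 1 exactly when example i is assigned to centroid j.
  M = double(idx == (1:K));          % m x K membership matrix
  centroids = (M' * X) ./ sum(M)';   % per-cluster sums divided by cluster sizes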






kMeansInitCentroids.m :

function centroids = kMeansInitCentroids(X, K)
  %KMEANSINITCENTROIDS This function initializes K centroids that are to be 
  %used in K-Means on the dataset X
  %   centroids = KMEANSINITCENTROIDS(X, K) returns K initial centroids to be
  %   used with the K-Means on the dataset X
  %
  
  % You should return these values correctly
  centroids = zeros(K, size(X, 2));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: You should set centroids to randomly chosen examples from
  %               the dataset X
  %
  
  
  % Randomly reorder the indices of examples
  randidx = randperm(size(X, 1));
  
  % Take the first K examples as centroids
  centroids = X(randidx(1:K), :);
  
  % =============================================================
end
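
With all three K-means functions completed, ex7.m drives them roughly like this (a sketch of the call sequence, not the exact script):

  % Sketch: run K-means on the example dataset with random initialization.
  K = 3;
  max_iters = 10;
  initial_centroids = kMeansInitCentroids(X, K);
  [centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);  % true: plot progress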

I tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think more optimization can be done, please suggest the corrections/improvements.

--------------------------------------------------------------------------------
Click here to see solutions for all Machine Learning Coursera Assignments.
&
Click here to see more codes for Raspberry Pi 3 and similar Family.
&
Click here to see more codes for NodeMCU ESP8266 and similar Family.
&
Click here to see more codes for Arduino Mega (ATMega 2560) and similar Family.

Feel free to ask doubts in the comment section. I will try my best to answer them.
If you find this helpful, please like, comment, and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks and Regards,
-Akshay P. Daga





37 Comments

  1. Hi there! Your site has been really helpful. I hope you continue this good work :)

    In addition, I do have a suggestion for a more vectorized implementation approach for computeCentroids.m:

    idx_vec = (1:K) == idx;
    centroids(1:K, :) = (idx_vec' * X)./(sum(idx_vec))';

    Cheers!

    Replies
    1. Thanks for the suggestion.
      Glad to know that my solutions were helpful to you.

      I will try to implement your suggestion.

  2. in findClosestCentroids
    [~, idx(i)]
    what does "~" mean here?

    Replies
    1. The min function in MATLAB returns output as [minimum_value, index].
      Since we are only interested in the index value, we can avoid storing minimum_value in any variable.

      If you leave the first place blank, as in [ ,idx(i)] (to avoid assigning minimum_value to any variable), it will throw an error.

      So, it can be achieved by replacing the variable with the "~" (tilde) character.
      It is also known as an argument placeholder.
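
      A tiny illustration (my own sketch):

          [minval, k] = min([5 2 9])   % minval = 2, k = 2
          [~, k] = min([5 2 9])        % same k; the minimum value is discarded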

  3. Replies
    1. It's a really awesome course to begin studying Machine Learning. This course provides thorough concepts and enough hands-on practice.

  4. Why are we summing up X values?
    sqrt(sum((X(i,:)-centroids(j,:)).^2)); --> Here

    Replies
    1. To find the closest centroid, we have to calculate the Euclidean distance to every centroid and pick the minimum.
      And this is the formula for calculating the Euclidean distance of each sample from a centroid.

  5. Thank you for the solutions; they were all really helpful.

  6. What is the meaning of this, bro? "[~,idx(i)]"
    I also want to know the difference between "[~,idx(i)]" and "idx(i)".
    Aren't they the same anyway, as idx is a single-column vector?

    Replies
    1. "~" is a placeholder. You can put any variable name instead of "~" but that is consume memory. But if you don't want use that value again anywhere then no need to use variable for that. So use can use "~" there.

      min function returns two values. 1st is the minimum value and 2nd is the index of the minimum value. Here, We want to use only 2nd value so we stored it in "idx(i)" variable and used "~" for 1st value as we don't need it for further use.

  7. I am still getting an error at line 135 of 'ex7.m' saying there is an error in findClosestCentroids.m: Matrix dimensions must agree.

  8. A little shorter:
    function idx = findClosestCentroids(X, centroids)
    %FINDCLOSESTCENTROIDS computes the centroid memberships for every example
    % idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
    % in idx for a dataset X where each row is a single example. idx = m x 1
    % vector of centroid assignments (i.e. each entry in range [1..K])
    %

    % Set K
    K = size(centroids, 1);
    m = size(X,1);
    % You need to return the following variables correctly.
    idx = zeros(m, 1);

    % ====================== YOUR CODE HERE ======================
    % Instructions: Go over every example, find its closest centroid, and store
    % the index inside idx at the appropriate location.
    % Concretely, idx(i) should contain the index of the centroid
    % closest to example i. Hence, it should be a value in the
    % range 1..K
    %
    % Note: You can use a for-loop over the examples to compute this.
    %

    dist2centroid = zeros(m,K);

    for ik = 1:K
        cent2matrix = repmat(centroids(ik,:),[m 1]);
        dist2centroid(:,ik) = sum((X - cent2matrix).^2,2);
    end
    [~,idx] = min(dist2centroid,[],2);

    % =============================================================

    end

    Replies
    1. How is this code shorter than the one given above in the blog post? It is almost the same code.

    2. "for"-loops are not optimized in Matlab, so eliminating "for" loops is always a goal in writing Matlab scripts. As a final version, would prefer everything in a single line: dist2centroid(:,ik) = sum((X - repmat(centroids(ik,:),[m 1])).^2,2) but it may be confusing for some viewers.

    3. Ok. Thank you very much for your detailed explanation.

    4. No, thank you for your effort. Great work!

  9. A little shorter:
    function centroids = computeCentroids(X, idx, K)
    %COMPUTECENTROIDS returns the new centroids by computing the means of the
    %data points assigned to each centroid.
    % centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by
    % computing the means of the data points assigned to each centroid. It is
    % given a dataset X where each row is a single data point, a vector
    % idx of centroid assignments (i.e. each entry in range [1..K]) for each
    % example, and K, the number of centroids. You should return a matrix
    % centroids, where each row of centroids is the mean of the data points
    % assigned to it.
    %

    % You need to return the following variables correctly.
    centroids = zeros(K, size(X,2));

    % ====================== YOUR CODE HERE ======================
    % Instructions: Go over every centroid and compute mean of all points that
    % belong to it. Concretely, the row vector centroids(i, :)
    % should contain the mean of the data points assigned to
    % centroid i.
    %
    % Note: You can use a for-loop over the centroids to compute this.
    %
    for ik = 1:K
        centroids(ik,:) = mean(X(idx == ik,:));
    end

    % =============================================================

    end

    Replies
    1. How is this code shorter than the one given above in the blog post? It is almost the same code.

    2. The extra array variable of dynamic length, "idx_i", is not created, and MATLAB is quite sensitive to that. Also, "find" is not a logical operation and is more expensive than the logical "==" comparison.

    3. I got the following error after I use the same code as you posted:

      error: computeCentroids: =: nonconformant arguments (op1 is 1x3, op2 is 0x1)
      error: called from
      computeCentroids at line 30 column 21
      runkMeans at line 55 column 15
      ex7 at line 135 column 16

      Any ideas?

  10. Thank you for your effort.
    Is it necessary that we initialize temp = zeros(K,1)? What is its purpose? Thank you.

    Replies
    1. If you create a matrix in MATLAB and increase its size in each iteration, then internally MATLAB creates a new matrix of the updated size and copies the older matrix into it. This is a very costly operation.
      So, to avoid this, the best practice is to create a zero matrix of the maximum size up front and then update its values in each iteration.
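
      A small illustration (my own sketch):

          K = 1000;
          temp = zeros(K,1);   % allocated once, up front
          for j = 1:K
              temp(j) = j^2;   % no reallocation inside the loop
          end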

    2. Ah, got it. Thank you again.

  11. It really helped, macha. Thank you.

  12. I came up with this solution, but I don't know where exactly the error is.
    But maybe it would be helpful:
    ---------------------------
    result = zeros(size(X,1), K);
    for i=1:K
        result(:,i) = (sum(X - centroids(i,:),2)).^2;
    endfor
    [minval, idx] = min(result, [], 2);

    Replies
    1. This is regarding the solution for findClosestCentroids.

  13. findClosestCentroids
    temp(j)=sqrt(sum((X(i,:)-centroids(j,:)).^2));
    why .^2 and then sqrt? I tried deleting both, and it worked

    Replies
    1. Okay, I made a mistake; I understand it now. Thanks.

  14. Do you know a vectorized implementation for computeCentroids? Thanks.

    Replies
    1. The %%%%%% WORKING: SOLUTION 2 %%%%%%%% code in computeCentroids above is itself a vectorized implementation. Don't get confused by the for loop used in that solution: it iterates over the K centroids, not over the individual data points.

      Vectorized implementation means applying operations simultaneously to all data points, instead of applying them to one data point at a time inside a for loop over all data points. (The for loop in the above code is different.)

  15. I'd like to share a completely vectorized version I came up with for computeCentroids.m:

    assigned = NaN(m, n*K);
    characteristic = logical(cat(3, repmat((idx == (1:K)), 1, n)));
    pool = reshape(repmat(X', 1, K)', m, n*K);
    assigned(characteristic) = pool(characteristic);
    centroids = reshape(mean(assigned, 1, "omitnan"), K, n);

    It's definitely worse than your solution both in terms of complexity and readability, but I thought it was interesting that the for loops can be completely eliminated. Maybe it can be optimized and improved on?

    Replies
    1. Thank you very much for your solution. It's a new approach and will be helpful for others as well.

  16. execution of a script as a function is not possible

  17. When trying to run
    Sigma = (1/m)*(X'*X); % n x n
    [U, S, V] = svd(Sigma);

    it is important to write Sigma with a capital "S", as a lowercase sigma will clash with the use of svd, and the code may fail to run. Nice work!!!
