Comments on APDaga DumpBox : The Thirst for Learning...: Coursera: Machine Learning (Week 1) Quiz - Linear Regression with One Variable | Andrew NG

Nival Kolambage (2019-11-09):
Hi, I'm asking you this because it was not on your answer list. Can you please give the correct answer for the following?

Let f be some function so that f(θ0, θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0, θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

"If θ0 and θ1 are initialized so that θ0 = θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent we will still have θ0 = θ1."

Please tell me whether the above statement is correct (I think it's wrong), with an explanation if possible. Thanks!

Akshay Daga (APDaga) (2019-11-13):
This statement is wrong.
If we initialize θ0 and θ1 to the same value, a simultaneous update will not necessarily leave θ0 and θ1 equal: the updated values depend on the slope of f along the θ0 and θ1 directions respectively, and those slopes can differ.

For neural networks, on the other hand, the analogous statement does hold true: identically initialized weights stay identical, so there we have to break the symmetry. That's why we initialize all the weights randomly. But there is no need to do that for plain gradient descent on linear regression.

Thanks
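The point can be checked numerically. Below is a minimal sketch using a hypothetical asymmetric function f(θ0, θ1) = θ0² + 2·θ1² (my own example, not from the quiz), whose partial derivatives are 2·θ0 and 4·θ1. Starting from θ0 = θ1, one simultaneous update already leaves the two parameters unequal:

```python
def gradient_step(t0, t1, alpha=0.1):
    """One simultaneous gradient-descent update on f(t0, t1) = t0**2 + 2*t1**2."""
    grad0 = 2 * t0  # slope of f along the theta_0 direction
    grad1 = 4 * t1  # slope of f along the theta_1 direction
    # Simultaneous update: both gradients are computed from the OLD values.
    return t0 - alpha * grad0, t1 - alpha * grad1

t0, t1 = 1.0, 1.0            # initialize theta_0 = theta_1
t0, t1 = gradient_step(t0, t1)
print(t0, t1)                # 0.8 0.6 -- no longer equal after one update
```

The symmetry argument only goes through when f treats θ0 and θ1 interchangeably; for a general smooth f the two partial derivatives differ, so equal initial values do not stay equal.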