Coursera: Machine Learning (Week 1) Quiz - Linear Regression with One Variable | Andrew NG

by Akshay Daga (APDaga) - September 28, 2019


▸ Linear Regression with One Variable:

Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, and we use $m$ to denote the number of training examples.

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of $m$ ? In the box below, please enter your answer (which should be a number between 0 and 10).
```
4 
```

Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.

You would like to use linear regression ( $h_\theta(x) = \theta_0 + \theta_1x$ ) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for $\theta_0$ and $\theta_1$ ? You should be able to select the right answer without actually implementing linear regression.
- $\theta_0$ = −569.6, $\theta_1$ = 530.9
- $\theta_0$ = −1780.0, $\theta_1$ = −530.9
- $\theta_0$ = −569.6, $\theta_1$ = −530.9
- $\theta_0$ = −1780.0, $\theta_1$ = 530.9
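The question is meant to be answered by inspection, but the reasoning can be made concrete. For a one-variable least-squares fit, $\theta_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$. The denominator is always positive, so the sign of $\theta_1$ simply follows the trend in the table. $\theta_0$ is then whatever value makes the line pass through the point $(\bar{x}, \bar{y})$.

The Python sketch below computes that closed-form fit. Note that this is a swapped-in technique; the course itself fits $\theta$ with gradient descent. The function name `fit_least_squares` and the sample numbers are hypothetical; they are not the chemist's data.

```python
def fit_least_squares(xs, ys):
    """Closed-form least-squares fit of h(x) = theta0 + theta1 * x (one variable)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # theta1 has the sign of sum((x - x_mean) * (y - y_mean)); the denominator is positive
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Hypothetical data whose y values fall as x grows (illustration only)
print(fit_least_squares([1, 2, 3, 4], [-7, -12, -17, -22]))  # (-2.0, -5.0)
```

With the real table, the sign of $\theta_1$ alone rules out two of the four options. Comparing $h_\theta(x)$ with a row or two of the table then decides between the two remaining $\theta_0$ values.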

For this question, assume that we are using the training set from Q1.
Recall that our definition of the cost function is $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$.
What is $J(0,1)$? In the box below, please enter your answer (simplify fractions to decimals when entering your answer, and use ‘.’ as the decimal delimiter, e.g., 1.5).
```
0.5
```
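The cost function above translates almost line for line into code. The Python sketch below is a minimal version, and `compute_cost` is a hypothetical name. Because the quiz's training table is not reproduced in this post, the numbers are a made-up four-example set, so the printed value is not the quiz answer.

```python
def compute_cost(xs, ys, theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h(x_i) - y_i)^2, with h(x) = theta0 + theta1 * x."""
    m = len(xs)
    squared_errors = [(theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)

# Made-up 4-example training set (the quiz's own table is not reproduced in this post)
xs = [1, 2, 3, 4]
ys = [2, 2, 4, 3]
print(compute_cost(xs, ys, 0, 1))  # 0.375
```

For $J(0, 1)$ the hypothesis is simply $h_\theta(x) = x$. The cost is therefore the sum of the squared differences $(x^{(i)} - y^{(i)})^2$ divided by $2m$, and with $m = 4$ that divisor is 8.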

Suppose we set $\theta_0 = 0, \theta_1 = 1.5$ in the linear regression hypothesis from Q1. What is $h_\theta(2)$ ?
```
3
```

Suppose we set $\theta_0$ = −2, $\theta_1$ = 0.5 in the linear regression hypothesis from Q1. What is $h_\theta(6)$ ?
```
1
```
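Both hypothesis questions are single evaluations of $h_\theta(x) = \theta_0 + \theta_1 x$:

- $0 + 1.5 \cdot 2 = 3$
- $-2 + 0.5 \cdot 6 = 1$

A minimal Python check follows, with `hypothesis` as an illustrative helper name.

```python
def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(hypothesis(0, 1.5, 2))   # 3.0
print(hypothesis(-2, 0.5, 6))  # 1.0
```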

Let $f$ be some function so that $f(\theta_0 , \theta_1 )$ outputs a number. For this problem, $f$ is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so $f$ may have local optima).
Suppose we use gradient descent to try to minimize $f(\theta_0 , \theta_1 )$ as a function of $\theta_0$ and $\theta_1$ .
Which of the following statements are true? (Check all that apply.)

- If $\theta_0$ and $\theta_1$ are initialized at the global minimum, then one iteration will not change their values.
- Setting the learning rate $\alpha$ to be very small is not harmful, and can only speed up the convergence of gradient descent.
- No matter how $\theta_0$ and $\theta_1$ are initialized, so long as $\alpha$ is sufficiently small, we can safely expect gradient descent to converge to the same solution.
- If the first few iterations of gradient descent cause $f(\theta_0 , \theta_1)$ to increase rather than decrease, then the most likely cause is that we have set the learning rate $\alpha$ to too large a value.
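The true/false statements can be probed numerically. The Python sketch below runs plain gradient descent with simultaneous updates on a hypothetical non-convex function $f(\theta_0, \theta_1) = (\theta_0^2 - 1)^2 + \theta_1^2$, which has two separate minima at $(1, 0)$ and $(-1, 0)$. It is an illustration under that assumption, not a proof. It shows three things:

- Starting at a minimum leaves the parameters unchanged.
- Two starting points with the same small $\alpha$ end at different minima.
- An overly large $\alpha$ makes $f$ grow from the first step.

```python
def f(theta0, theta1):
    # Hypothetical non-convex f with two minima, at (1, 0) and (-1, 0)
    return (theta0 ** 2 - 1) ** 2 + theta1 ** 2

def grad_f(theta0, theta1):
    # Partial derivatives of f with respect to theta0 and theta1
    return 4 * theta0 * (theta0 ** 2 - 1), 2 * theta1

def gradient_descent(theta0, theta1, alpha, iters):
    """Run `iters` simultaneous updates; return final parameters and f after each step."""
    history = [f(theta0, theta1)]
    for _ in range(iters):
        g0, g1 = grad_f(theta0, theta1)
        # Simultaneous update: both partials use the old (theta0, theta1)
        theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1
        history.append(f(theta0, theta1))
    return theta0, theta1, history

# Started at a minimum: the gradient is zero, so one step changes nothing
print(gradient_descent(1.0, 0.0, alpha=0.05, iters=1)[:2])     # (1.0, 0.0)
# Same small alpha, different starting points, different minima
print(gradient_descent(0.5, 0.0, alpha=0.05, iters=200)[0])    # close to 1.0
print(gradient_descent(-0.5, 0.0, alpha=0.05, iters=200)[0])   # close to -1.0
# Learning rate too large: f increases from the very first step
print(gradient_descent(1.0, 1.0, alpha=1.5, iters=3)[2])       # [1.0, 4.0, 16.0, 64.0]
```

A very small $\alpha$ would still converge here, only more slowly. That is why the statement claiming a small $\alpha$ "can only speed up" convergence does not hold.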

In the given figure, the cost function $J(\theta_0, \theta_1)$ has been plotted against $\theta_0$ and $\theta_1$ , as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).
- If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach a point at or near point A, as the value of the cost function $J(\theta_0, \theta_1)$ is maximum at point A.
- If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach a point at or near point C, as the value of the cost function $J(\theta_0, \theta_1)$ is minimum at point C.
- Point P (the global minimum of Plot 2) corresponds to point A of Plot 1.
- If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach a point at or near point A, as the value of the cost function $J(\theta_0, \theta_1)$ is minimum at point A.
- Point P (the global minimum of Plot 2) corresponds to point C of Plot 1.

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some $\theta_0, \theta_1$ , such that $J(\theta_0 , \theta_1) = 0$ .
Which of the statements below must then be true? (Check all that apply.)
- Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
- For this to be true, we must have $\theta_0 = 0$ and $\theta_1 = 0$ so that $h_{\theta}(x) = 0$.
- For this to be true, we must have $y^{(i)} = 0$ for every value of $i = 1, 2, \ldots, m$.
- Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
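$J(\theta_0, \theta_1) = 0$ means every squared error in the sum is zero. In other words, $h_\theta(x^{(i)}) = y^{(i)}$ for every example, so all examples lie on the line $h_\theta$. Nothing forces the parameters or the targets to be zero. The small Python check below illustrates this on hypothetical data lying on $y = 1 + 2x$.

```python
def compute_cost(xs, ys, theta0, theta1):
    # J(theta0, theta1) = 1/(2m) * sum of squared errors, with h(x) = theta0 + theta1 * x
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical training set lying exactly on the line y = 1 + 2x
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(compute_cost(xs, ys, 1, 2))  # 0.0: J = 0 although theta0, theta1 and every y are non-zero
print(compute_cost(xs, ys, 0, 0))  # 10.5: theta0 = theta1 = 0 does not give J = 0 here
```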

Feel free to ask questions in the comment section. I will try my best to answer them.
If you find this helpful in any way, like, comment on, and share the post.
This is the simplest way to encourage me to keep doing such work.

Thanks & Regards,
- APDaga DumpBox

21 Comments

Nival Kolambage, 9 November 2019 at 16:34
Hi, I'm asking this because it was not on your answer list. Can you please give the correct answer for the following?

Let $f$ be some function so that $f(\theta_0, \theta_1)$ outputs a number. For this problem, $f$ is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so $f$ may have local optima). Suppose we use gradient descent to try to minimize $f(\theta_0, \theta_1)$ as a function of $\theta_0$ and $\theta_1$. Which of the following statements are true? (Check all that apply.)

If $\theta_0$ and $\theta_1$ are initialized so that $\theta_0 = \theta_1$, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have $\theta_0 = \theta_1$.

(Please say whether the above statement is correct (I think it's wrong), with an explanation if possible. Thanks!)
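A one-step numerical check suggests the statement is false in general. The two partial derivatives of $f$ usually differ even when $\theta_0 = \theta_1$, so a simultaneous update moves the two parameters by different amounts. The Python sketch below assumes a hypothetical $f(\theta_0, \theta_1) = \theta_0^2 + 2\theta_1^2$ and $\alpha = 0.1$.

```python
def grad_f(theta0, theta1):
    # Partial derivatives of the hypothetical f(theta0, theta1) = theta0**2 + 2 * theta1**2
    return 2 * theta0, 4 * theta1

def gradient_step(theta0, theta1, alpha):
    # Simultaneous update: both partials are evaluated at the old (theta0, theta1)
    g0, g1 = grad_f(theta0, theta1)
    return theta0 - alpha * g0, theta1 - alpha * g1

t0, t1 = gradient_step(1.0, 1.0, alpha=0.1)
print(t0, t1)  # roughly 0.8 and 0.6, so theta0 != theta1 after one step
```

The statement would only hold for special functions whose two partial derivatives happen to be equal along the line $\theta_0 = \theta_1$.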
Unknown, 16 March 2020 at 22:03
How did you solve the second problem? Please tell me.
Densil, 27 March 2020 at 18:06
Thanks for the solutions, but can you give some details on how you came up with these answers, or some links that we can go through?
The videos on Coursera are not that straightforward. I appreciate your help.
Unknown, 21 May 2020 at 10:08
Please help me, I can't find the answers; I only found 2 questions 😔
Jasmeet, 9 June 2020 at 19:04
Please tell me how to solve question 2 (Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms...)?
Naji, 20 August 2020 at 01:46
Suppose we set $\theta_0 = -1$, $\theta_1 = 0.5$ in the linear regression hypothesis from Q1. What is $h_\theta(4)$?
Unknown, 1 October 2020 at 23:02
Thanks for sharing, brother, but can you upload the sheet of paper on which you solved these questions?
Unknown, 8 November 2020 at 14:15
Hi,
How did you get the answer for question 2? Isn't m=4?
Thanks!
Unknown, 15 March 2021 at 17:55
Thank you for existing.
Unknown, 31 July 2021 at 13:40
Consider the following training set of $m = 4$ training examples:

| x | y |
|---|---|
| 1 | 0.5 |
| 2 | 1 |
| 4 | 2 |
| 0 | 0 |

Consider the linear regression model $h_\theta(x) = \theta_0 + \theta_1 x$. What are the values of $\theta_0$ and $\theta_1$ that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)

- $\theta_0 = 0$, $\theta_1 = 0.5$
- $\theta_0 = 0.5$, $\theta_1 = 0$
- $\theta_0 = 1$, $\theta_1 = 0.5$
- $\theta_0 = 1$, $\theta_1 = 1$
- $\theta_0 = 0.5$, $\theta_1 = 0.5$
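The data lie exactly on $y = 0.5x$ (each $y$ is half of its $x$). The perfect fit is therefore $\theta_0 = 0$, $\theta_1 = 0.5$, and gradient descent on this convex cost should approach it.

A minimal batch gradient descent sketch in Python follows. The learning rate $\alpha = 0.1$ and the 1000 iterations are assumed values, not taken from the quiz.

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent on J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m     # dJ/dtheta1
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = [1, 2, 4, 0]
ys = [0.5, 1, 2, 0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # theta0 close to 0, theta1 close to 0.5
```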