Linear Regression Worksheet

1.
There are two things that should be done before doing regression analysis. They are:
I. Collect the data and then construct a scatter plot to determine the nature of the relationship.
II. Collect the data and construct a histogram.
III. Compute the value of the correlation coefficient to test the significance of the relationship.
IV. Test the significance of the relationship.
 a. I and IV only b. II and IV only c. II and III only d. I and III only

Solution:

Collect the data and then construct a scatter plot to determine the nature of the relationship.

Compute the value of the correlation coefficient to test the significance of the relationship.

2.
The general form of the regression line used in statistics is ____, and the $y$′-intercept and the slope, respectively, are
 a. $y$′ = $a + bx$, $a$, $b$ b. $y$′ = $ax^2 + b$, $b$, $a$ c. $y$′ = $a + bx^2$, $a$, $b$ d. $y$′ = $ax + b$, $b$, $a$

Solution:

The general form of the regression line used in statistics is y′ = a + bx.

The y′-intercept is a.

The slope is b.

3.
The relation between the sign of the correlation coefficient ($r$) and the sign of the slope ($b$) of the regression line is
 a. when $r$ is positive, $b$ is also positive and when $r$ is negative, $b$ is also negative b. when $r$ is positive, $b$ is negative and when $r$ is negative, $b$ is positive c. when $r$ is positive or negative, $b$ is positive d. when $r$ is positive or negative, $b$ is negative

Solution:

When r is positive, b is also positive, and when r is negative, b is also negative.

This follows from the formulas for r and b: both have the same numerator, n(Σxy) − (Σx)(Σy), and both denominators are positive, so r and b always have the same sign.
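A minimal Python check of the sign relation, using made-up data (not from the worksheet) in which y falls as x rises. Because r and b are computed from the same numerator, both come out negative; the variable names are illustrative.

```python
# Made-up example data (not from the worksheet): y decreases as x increases.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# r and b share this numerator; both denominators are positive.
numerator = n * sum_xy - sum_x * sum_y
r = numerator / ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
b = numerator / (n * sum_x2 - sum_x ** 2)

print(f"r = {r:.3f}, b = {b:.3f}")  # both negative
```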

4.
Regression should be done only when
 a. $r$ is significant b. none of these c. regression does not depend on $r$ so can be done disregarding $r$ d. $r$ is not significant

Solution:

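Regression should be done only when r is significant, i.e., when a significance test at the chosen level (for example, α = 0.05) shows that r differs from 0. With small samples, this requires the value of r to be close to ±1.

It is meaningless to determine the regression line when r is not significant (close to 0) and then make predictions using that line.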
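One common way to decide whether r is significant is to compare |r| with a table of critical values for n − 2 degrees of freedom. The sketch below assumes a two-tailed test at α = 0.05 and hard-codes the standard table entries for 1 to 8 degrees of freedom only; the helper name `r_is_significant` is hypothetical.

```python
# Critical values of Pearson's r for a two-tailed test at alpha = 0.05,
# keyed by degrees of freedom (n - 2). Only df = 1..8 are included here.
CRITICAL_R_05 = {1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811,
                 5: 0.754, 6: 0.707, 7: 0.666, 8: 0.632}


def r_is_significant(r: float, n: int) -> bool:
    """Return True if |r| exceeds the alpha = 0.05 critical value for n - 2 df."""
    df = n - 2
    if df not in CRITICAL_R_05:
        raise ValueError("this small table only covers 3 <= n <= 10")
    return abs(r) > CRITICAL_R_05[df]


print(r_is_significant(0.98, 5))   # True: 0.98 > 0.878
print(r_is_significant(0.357, 8))  # False: 0.357 < 0.707
```

For larger samples, a full table (or a t test, as in question 10) is needed.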

5.
The formulas for $a$ and $b$ of the regression line $y$′ = $a$ + $\mathrm{bx}$ are

 a. II only b. III only c. IV only d. I only

Solution:

The formulas for a and b of the regression line y′ = a + bx are
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
where a is the y′-intercept and b is the slope.
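The two formulas translate directly into code. A minimal Python sketch (the function name `regression_coefficients` is illustrative, not from the worksheet), checked on points that lie exactly on y = 1 + 2x:

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for y' = a + bx using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2  # zero only if all x values are equal
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


print(regression_coefficients([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```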

6.
Determine the equation of the regression line for the data and predict the value for $y$′ when $x$ = 3.0.
 $x$: 2.1, 1.7, 1.1, 1.5, 2.7
 $y$: 40, 37, 35, 36, 42

 a. $y$′ = 29.438 - 4.704$x$, 15.326 b. $y$′ = 4.704 + 29.438$x$, 93.018 c. $y$′ = 4.704 + 29.438$x$, 43.55 d. $y$′ = 29.438 + 4.704$x$, 43.55

Solution:

Make a table with the values of x, y, x², y², and xy.

To find the regression line, we first have to find out r, the correlation coefficient.

If r is not significant, then we cannot predict the value of y′ and we must not find the regression line equation.

$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

r = + 0.98
[Substitute the values and simplify.]

Since r = 0.98 is greater than the critical value 0.878 (α = 0.05, n − 2 = 3 degrees of freedom), r is significant, and the regression line can be used for predictions.

Regression line equation is y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$

a = 29.438, b = 4.704
[Substitute the values and simplify.]

Hence, the equation of the regression line y′ = a + bx is
y′ = 29.438 + 4.704x

When x = 3.0, y′ = 43.55
[Substitute the values and simplify.]
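The arithmetic for this problem can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

# Data from problem 6.
x = [2.1, 1.7, 1.1, 1.5, 2.7]
y = [40, 37, 35, 36, 42]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

num = n * sum_xy - sum_x * sum_y   # shared numerator of r and b
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

r = num / math.sqrt(den_x * den_y)
b = num / den_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x
prediction = a + b * 3.0

print(f"r = {r:.3f}, a = {a:.3f}, b = {b:.3f}, y'(3.0) = {prediction:.2f}")
```

The printed values (r ≈ 0.984, a ≈ 29.438, b ≈ 4.704, y′ ≈ 43.55) agree with the hand calculation.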

7.
A businessman wants to estimate the likely cost of a new contract from data on the contracts of previous years. Determine the equation of the regression line for the data and predict the value of $y$′ when $x$ = 30.
 Year: 2000, 2001, 2002, 2003, 2004
 Number of employees ($x$) (in thousands): 10, 20, 16, 14, 24
 Total cost of contract ($y$) (in lakh \$): 6, 15, 12, 9, 20

 a. $y$′ = - 4.343 + 0.997$x$, 25.567 b. $y$′ = 0.997 - 4.343$x$, - 129.293 c. $y$′ = 4.343 + 0.997$x$, 34.253 d. $y$′ = 4.343 - 0.997$x$, - 25.567

Solution:

Make a table with the values of x, y, x², y², and xy.

$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

r = + 0.995
[Substitute the values and simplify.]

Since r = 0.995 is greater than the critical value 0.878 (α = 0.05, n − 2 = 3 degrees of freedom), r is significant, and the regression line can be used for predictions.

Regression line equation is y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$

a = - 4.343, b = 0.997
[Substitute the values and simplify.]

Therefore, the equation of the regression line y′ = a + bx is
y′ = - 4.343 + 0.997x

When x = 30, y′ = − 4.343 + 0.997(30) = 25.567
[Substitute the values and simplify.]
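A minimal Python sketch of the same computation for this problem (variable names are illustrative). Carried at full precision, a ≈ −4.3425 and the prediction is about 25.555; the 25.567 in the solution comes from plugging in the rounded coefficients −4.343 and 0.997.

```python
import math

# Data from problem 7: employees in thousands (x), contract cost in lakh $ (y).
x = [10, 20, 16, 14, 24]
y = [6, 15, 12, 9, 20]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

num = n * sum_xy - sum_x * sum_y
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

r = num / math.sqrt(den_x * den_y)
b = num / den_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x

prediction = a + b * 30                    # full-precision coefficients
prediction_rounded = -4.343 + 0.997 * 30   # the worksheet's rounded a and b

print(f"r = {r:.3f}, a = {a:.4f}, b = {b:.4f}")
print(f"y'(30) = {prediction:.3f} (full precision), "
      f"{prediction_rounded:.3f} (rounded coefficients)")
```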

8.
Determine the equation of the regression line and plot the line on the scatter plot for the data.
 $x$: 2, 8, 7, 5, 3, 6, 4, 1
 $y$: 6, 8, 4, 1, 5, 7, 2, 3

 a. $y$′ = 2.893 + 0.357$x$ ; Graph 1 b. $y$′ = 0.357 + 2.893$x$ ; Graph 3 c. $y$′ = 2.893 - 0.357$x$ ; Graph 2 d. $y$′ = - 0.357 + 2.893$x$ ; Graph 4

Solution:

Make a table with the values of x, y, x², and xy.

Regression line equation is y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$

a = 2.893, b = 0.357
[Substitute the values and simplify.]

Therefore, the equation of the regression line y′ = a + bx is y′ = 2.893 + 0.357x.

Draw a scatter plot for the data and plot the line.
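The coefficients for this problem can be checked with a short Python sketch (variable names are illustrative). Drawing the scatter plot itself is left to a plotting tool; the sketch also computes r, which the worked solution skips.

```python
import math

# Data from problem 8.
x = [2, 8, 7, 5, 3, 6, 4, 1]
y = [6, 8, 4, 1, 5, 7, 2, 3]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

num = n * sum_xy - sum_x * sum_y
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

b = num / den_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x
r = num / math.sqrt(den_x * den_y)

print(f"y' = {a:.3f} + {b:.3f}x, r = {r:.3f}")
```

Here r ≈ 0.357, which is below the critical value 0.707 for n − 2 = 6 degrees of freedom at α = 0.05, so by the rule in question 4 this line should not be used for predictions.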

9.
An experiment was conducted to find out the number of times a tennis ball bounces after it is thrown from a given height. The experiment was repeated for several heights, and the number of bounces was recorded each time. Determine the equation of the regression line for the data and predict the value of $y$′ when $x$ = 100.
 Height in feet ($x$): 10, 20, 30, 45, 60
 Number of bounces ($y$): 6, 10, 13, 19, 25

 a. $y$′ = - 2.152 + 0.377$x$, 35.548 b. $y$′ = - 0.377 + 2.152$x$, 215 c. $y$′ = 0.377 - 2.152$x$, 215 d. $y$′ = 0.377 + 2.152$x$, 216

Solution:

Make a table with the values of x, y, x², y², and xy.

$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

r = + 0.999
[Substitute the values and simplify.]

Since r = 0.999 is greater than the critical value 0.878 (α = 0.05, n − 2 = 3 degrees of freedom), r is significant, and the regression line can be used for predictions.

Regression line equation is y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$

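a = [(73)(7025) − (165)(3005)] / [5(7025) − (165)²] = 17000/7900 ≈ 2.152
b = [5(3005) − (165)(73)] / [5(7025) − (165)²] = 2980/7900 ≈ 0.377

Therefore, the equation of the regression line y′ = a + bx is y′ = 2.152 + 0.377x. The intercept is positive, so none of the listed options matches exactly; option a has the correct slope but the wrong sign on the intercept.

When x = 100, y′ = 2.152 + 0.377(100) = 39.852 ≈ 39.85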
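A minimal Python check of this problem (variable names are illustrative). Working from the sums, a = 17000/7900 ≈ +2.152, so the intercept is positive; at full precision the prediction at x = 100 is about 39.87.

```python
import math

# Data from problem 9: height in feet (x) and number of bounces (y).
x = [10, 20, 30, 45, 60]
y = [6, 10, 13, 19, 25]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

num = n * sum_xy - sum_x * sum_y
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

r = num / math.sqrt(den_x * den_y)
b = num / den_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x   # 17000 / 7900
prediction = a + b * 100

print(f"r = {r:.3f}, y' = {a:.3f} + {b:.3f}x, y'(100) = {prediction:.2f}")
```

As a sanity check, the line with a positive intercept gives y′ ≈ 5.9 at x = 10, close to the observed 6 bounces.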

10.
The university authorities wanted to predict a student's statistics midterm score from his or her SAT score. Determine the correlation coefficient and the equation of the regression line for the data, and predict the value of $y$′ when $x$ = 900. At α = 0.05, test the hypothesis that there is a significant relationship between the SAT scores and the statistics scores.
 SAT score ($x$): 1100, 1300, 1000, 1200, 1100, 1200, 1400, 1000
 Statistics midterm score ($y$): 89, 92, 86, 92, 90, 93, 98, 88
 a. + 0.941, $y$′ = 62.514 + 0.025$x$, yes b. + 0.941, $y$′ = 62.514 + 0.025$x$, no c. + 0.491, $y$′ = 62.514 + 0.025$x$, yes d. + 0.491, $y$′ = 62.514 + 0.025$x$, no

Solution:

Make a table with the values of x, y, x², y², and xy.

$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

r = + 0.941
[Substitute the values and simplify.]

Since r = 0.941 is greater than the critical value 0.707 (α = 0.05, n − 2 = 6 degrees of freedom), r is significant, and the regression line can be used for predictions.

Regression line equation is y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$

a = 62.514, b = 0.025
[Substitute the values and simplify.]

Therefore, the equation of the regression line y′ = a + bx is y′ = 62.514 + 0.025x

When x = 900, y′ = 62.514 + 0.025(900) ≈ 85

State the hypothesis:
$H_0: \rho = 0$ and $H_1: \rho \neq 0$

Since α = 0.05 and there are 8 − 2 = 6 degrees of freedom, the two-tailed critical values from the t table are ±2.447.

Compute the test value, $t = r\sqrt{\dfrac{n-2}{1-r^2}} \approx 6.84$ (about 6.81 if the rounded value r = 0.941 is used).
[Substitute the values in the formula and simplify.]

Reject the null hypothesis, since the test value is greater than the critical value 2.447 and therefore falls in the critical region.

Therefore, there is a significant relationship between the SAT scores and the Statistics Midterm scores.
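The whole of problem 10, including the t test, fits in a short Python sketch (variable names are illustrative; the critical value 2.447 is taken from the solution rather than computed).

```python
import math

# Data from problem 10: SAT score (x) and statistics midterm score (y).
x = [1100, 1300, 1000, 1200, 1100, 1200, 1400, 1000]
y = [89, 92, 86, 92, 90, 93, 98, 88]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

num = n * sum_xy - sum_x * sum_y
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

r = num / math.sqrt(den_x * den_y)
b = num / den_x
a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x
prediction = a + b * 900

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
t_critical = 2.447  # two-tailed, alpha = 0.05, df = 6

print(f"r = {r:.3f}, y' = {a:.3f} + {b:.4f}x, y'(900) = {prediction:.2f}")
print(f"t = {t:.2f}, reject H0: {abs(t) > t_critical}")
```

At full precision the slope is about 0.0245, so y′(900) ≈ 84.57; the 85 in the solution comes from the rounded slope 0.025. Note also that 900 lies below the smallest SAT score in the data (1000), so this prediction is an extrapolation.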