Linear Regression

Plotting approximate best fit line

SL 4.4

Best fit lines can also be drawn approximately by eye. We start by finding the average x and y, giving the point (x̄, ȳ). We then take a ruler and place it on this point, and adjust the slope until we find a reasonable best fit line.
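As a quick illustration of the first step, the mean point can be computed as below. This is a minimal sketch: the function name mean_point and the data values are made up for illustration and are not from the syllabus.

```python
def mean_point(xs, ys):
    """Return (x-bar, y-bar), the point an eyeballed best fit line should pass through."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(mean_point(xs, ys))  # (3.0, 4.0)
```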

Regression line y on x

SL 4.4

Linear regression is a statistical method used to model the relationship between two variables when the data is given as pairs of points (x,y). We fit a straight line (called the regression line) that minimizes the sum of the squared vertical distances from the points to the line.

The general equation of the regression line is:

y=ax+b

where a is the slope and b is the y-intercept.

The values of a and b can be found using a calculator:

Use Stat>Edit to enter the x-values into L1 and the y-values into L2.
Then, press Stat, right arrow to the CALC menu, and select 4:LinReg(ax+b).
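For readers without a calculator to hand, the same a and b can be reproduced with the standard least-squares formulas. The sketch below assumes that is what LinReg(ax+b) computes; the function name linreg and the data are hypothetical.

```python
def linreg(xs, ys):
    """Least-squares regression of y on x. Returns (a, b) for the line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = linreg(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
```

For this made-up data the slope is 0.6 and the intercept is 2.2 (up to rounding). Note that the line always passes through the mean point of the data.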

Pearson's Product-Moment Correlation Coefficient

SL 4.4

Pearson's product-moment correlation coefficient, denoted by r, measures the strength and direction of a linear relationship between two numerical variables x and y. Its value always lies between −1 and +1:

r=+1: perfect positive linear relationship
r=−1: perfect negative linear relationship
r=0: no linear relationship

A positive value means y generally increases as x increases; a negative value means y generally decreases as x increases. The closer r is to ±1, the stronger the linear relationship.

If you press MODE, scroll to STAT DIAGNOSTICS, highlight ON, and press ENTER, then any time you perform a linear regression, the calculator will display Pearson's coefficient r in addition to the regression line.
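To see where r comes from, it can be computed directly from the deviations of x and y about their means. This is a sketch using the usual definition of Pearson's coefficient; the function name pearson_r and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return sxy / math.sqrt(sxx * syy)


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))  # 0.775
```

Here r is about 0.775: positive, so y tends to increase with x, but not close enough to 1 to call the relationship perfectly linear.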

Predicting y from x

SL 4.4

Once we have a regression line y=ax+b, we can use it to predict y by plugging in a value of x.
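A prediction is just a substitution into the line. In the sketch below, the values a = 0.6 and b = 2.2 are hypothetical coefficients chosen for illustration, and predict_y is a made-up helper name.

```python
def predict_y(a, b, x):
    """Predict y from x using the regression line y = a*x + b."""
    return a * x + b


# Hypothetical regression line y = 0.6x + 2.2
a, b = 0.6, 2.2
print(round(predict_y(a, b, 3.5), 2))  # 4.3
```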

Danger of extrapolation

SL 4.4

When using a regression line to predict y from x, we need to be aware of the danger of extrapolation. This occurs when we try to predict y for a value of x far outside the range of x values in our data. For such an x, we cannot trust that the relationship is the same.
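One simple way to guard against this is to check whether x lies inside the range of the data before trusting a prediction. The sketch below flags any x outside the observed range, which is stricter than "far outside"; how far is too far is a judgment call. The function name and data are hypothetical.

```python
def is_extrapolation(x, xs):
    """Return True if x lies outside the range of the observed x-values."""
    return x < min(xs) or x > max(xs)


# Made-up x-values, for illustration only
xs = [1, 2, 3, 4, 5]
for x in (3.5, 10):
    label = "extrapolation, treat with caution" if is_extrapolation(x, xs) else "interpolation"
    print(x, label)
```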

Limitations of predicting x from y

SL 4.4

While it is possible to use a regression line y=ax+b to predict x with

x=(y−b)/a,

this is not a reliable process. The best fit line is chosen to minimize the differences between the actual and predicted y-values, so the differences between the actual and predicted x-values may be much larger.

Regression line x on y

SL 4.10

In the same way that we can plot a straight line minimizing the vertical distances from points (x,y), we can plot a straight line minimizing the horizontal distances. This is called an x on y regression line. We calculate an x on y regression line by switching our x and y lists while using LinReg(ax+b).

With this line we can make reliable predictions of x given y, so long as we are not extrapolating.
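The list-swapping trick can be sketched in code as follows. This assumes the standard least-squares formulas; the helper name linreg and the data are made up for illustration. The sketch also compares the x on y prediction with the one obtained by rearranging the y on x line.

```python
def linreg(inputs, outputs):
    """Least-squares line outputs = slope*inputs + intercept. Returns (slope, intercept)."""
    n = len(inputs)
    if n != len(outputs) or n < 2:
        raise ValueError("need at least two pairs")
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    if s_ii == 0:
        raise ValueError("inputs do not vary, so the slope is undefined")
    slope = s_io / s_ii
    return slope, out_bar - slope * in_bar


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = linreg(xs, ys)  # y on x: y = a*x + b
c, d = linreg(ys, xs)  # x on y (lists swapped): x = c*y + d

y = 5
x_from_x_on_y = c * y + d      # prediction from the x on y line
x_from_inverted = (y - b) / a  # prediction from rearranging the y on x line
print(x_from_x_on_y, x_from_inverted)
```

For this data the x on y line predicts x = 4 when y = 5, while rearranging the y on x line gives roughly 4.67. The two lines are different, so the two predictions generally disagree.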
