Perplex · IB Math AIHL / Bivariate Statistics / Skills

Skill Checklist

Track your progress across all skills in your objective. Mark your confidence level and identify areas to focus on.

Track your progress: Don't know · Working on it · Confident

📖 = included in formula booklet • 🚫 = not in formula booklet



11 Skills Available


Linear Regression

6 skills
Plotting approximate best fit line
SL 4.4

Best fit lines can also be drawn approximately by eye. We start by finding the mean of the x-values and the mean of the y-values, giving the mean point (x̄, ȳ). We then place a ruler on this point and adjust the slope until we find a reasonable best fit line.


Regression line y on x
SL 4.4

Linear regression is a statistical method used to model the relationship between two variables when data is given as pairs of points (x, y). We fit a straight line (called the regression line) that minimizes the sum of the squared vertical distances from the points to the line.

The general equation of the regression line is:

y = ax + b

where a is the slope and b is the y-intercept.


The values of a and b can be found using a calculator:

  • Use stat > Edit to enter the x-values into L1 and the y-values into L2.

  • Then press stat, arrow right to the CALC menu, and select 4:LinReg(ax+b).
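The same least-squares fit that LinReg(ax+b) performs can be sketched in Python. The data below is made up for illustration; numpy's polyfit returns the slope and intercept of the best fit line:

```python
import numpy as np

# Hypothetical data: five (x, y) pairs
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

# Least-squares fit of the model y = ax + b (what LinReg(ax+b) computes)
a, b = np.polyfit(x, y, deg=1)

print(f"y = {a:.2f}x + {b:.2f}")  # y = 4.50x + 46.90
```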

Pearson's Product-Moment Correlation Coefficient
SL 4.4

Pearson's product-moment correlation coefficient, denoted by ​r, measures the strength and direction of a linear relationship between two numerical variables ​x​ and ​y. Its value always lies between ​−1​ and ​+1:

  • ​r=+1: perfect positive linear relationship

  • ​r=−1: perfect negative linear relationship

  • ​r=0: no linear relationship

A positive value means ​y​ generally increases as ​x​ increases; a negative value means ​y​ generally decreases as ​x​ increases. The closer ​r​ is to ​±1, the stronger the linear relationship.


If you press mode, scroll down to STAT DIAGNOSTICS, highlight ON, and press ENTER, then any time you perform a linear regression the calculator will provide Pearson's coefficient in addition to the regression line.
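Pearson's r can also be computed directly, which is handy for checking calculator output. A minimal sketch with made-up data:

```python
import numpy as np

# Hypothetical, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# np.corrcoef returns a 2x2 correlation matrix; r is an off-diagonal entry
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))
```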

Predicting y from x
SL 4.4

Once we have a regression line ​y=ax+b, we can use it to predict ​y​ by plugging in a value of ​x.

Danger of extrapolation
SL 4.4

When using a regression line to predict ​y​ from ​x, we need to be aware of the danger of extrapolation. This occurs when we try to predict ​y​ for a value of ​x​ far outside the range of ​x​ values in our data. For such an ​x, we cannot trust that the relationship is the same.

Limitations of predicting x from y
SL 4.4

While it is possible to rearrange a regression line y = ax + b to predict x with

x = (y − b)/a,

this is not a reliable process. The best fit line is determined to minimize the differences between the real y's and the predicted y's, so the differences between real and predicted values of x may be much larger.
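This can be seen numerically: inverting the y-on-x line gives a different prediction rule than fitting a line that minimizes errors in x. A sketch with synthetic data (all values made up):

```python
import numpy as np

# Synthetic noisy data around y = 2x + 3
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 3 + rng.normal(0, 2, size=50)

# Rule 1: fit y on x, then invert to get x = (y - b)/a
a, b = np.polyfit(x, y, deg=1)

# Rule 2: fit x on y directly (the line that minimizes errors in x)
c, d = np.polyfit(y, x, deg=1)

# The slopes of the two prediction rules differ unless r = ±1
print(1 / a, c)
```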

Spearman's Rank Correlation Coefficient

1 skill
Spearman’s rank correlation coefficient
SL AI 4.10

Spearman's rank correlation coefficient tells you how well two variables line up in terms of order instead of actual values. It answers the question "When ​x​ is larger, does ​y​ also tend to be larger?". It compares data relatively, and measures whether the data is consistently sloping up.


Its value is between ​−1​ and ​1, with negative values for data that generally slopes down, and positive values when data generally slopes up.

[Figure] Weak positive Spearman correlation: the data zigzags and the slope of each segment changes.

[Figure] Perfect negative Spearman correlation (−1): every segment slopes down.


Example calculation

To calculate it, we first convert our data values into ranks: 1 for the smallest, 2 for the second smallest, and so on. Then we calculate the regular Pearson r for the correlation between these ranks:


x   | y
100 | 5000
30  | 400
20  | 20
50  | 400


There are ​4​ values for ​x. In order, they are:

  1. ​20​ 

  2. ​30​

  3. ​50​

  4. ​100​

The order for y is

  1. 20

  2. 400

  3. 400

  4. 5000

Since 400 appears at both positions 2 and 3, each of those values is tied for rank 2.5, the average of 2 and 3.


Now we update the table with the ranks:

x   | Rank x | y    | Rank y
100 | 4      | 5000 | 4
30  | 2      | 400  | 2.5
20  | 1      | 20   | 1
50  | 3      | 400  | 2.5

Now we enter the ranks into our calculator and use linear regression to find r ≈ 0.949. This is the Spearman rank correlation coefficient for this data.
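The worked example above can be reproduced in code. The helper below assigns average ranks to ties (the two 400s both get rank 2.5), and Pearson's r of the ranks then gives the Spearman coefficient:

```python
import numpy as np

def avg_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # positions i..j (0-based) are tied; they share the average rank
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

x = [100, 30, 20, 50]
y = [5000, 400, 20, 400]

rank_x = avg_ranks(x)  # [4, 2, 1, 3]
rank_y = avg_ranks(y)  # [4, 2.5, 1, 2.5]

# Spearman's coefficient is Pearson's r applied to the ranks
r_s = np.corrcoef(rank_x, rank_y)[0, 1]
print(round(r_s, 3))  # 0.949, matching the worked example
```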

Non-linear regression & residuals

4 skills
Sum of square residuals
AHL AI 4.13

The sum of square residuals, denoted SS_res, is a measure of how well a model fits the data. It works like this:

  • Take the difference between each actual value yᵢ and the value predicted by the model, which we call ŷᵢ.

  • Square each of those differences.

  • Add them all up to find SS_res.

In short, SS_res is how far off the model is at each point, squared, and added up over all the points. Visually, SS_res is the sum of the areas of the squares formed by the residuals.

In IB exams, you need to know how to find it from a table:

yᵢ  | 1                 | 2    | 3
ŷᵢ  | 0.9               | 1.5  | 3.2
rᵢ² | (1 − 0.9)² = 0.01 | 0.25 | 0.04

Adding these all up gives

SS_res = 0.01 + 0.25 + 0.04 = 0.3

You can use your calculator to do this more quickly using list and summation features.
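The table computation above takes one line with numpy arrays:

```python
import numpy as np

# Values from the table above
y_actual = np.array([1.0, 2.0, 3.0])  # y_i
y_pred = np.array([0.9, 1.5, 3.2])    # predicted values (y-hat)

# Square each residual and sum them
ss_res = np.sum((y_actual - y_pred) ** 2)
print(round(ss_res, 4))  # 0.3
```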

The coefficient of determination R²
AHL AI 4.13

The coefficient of determination, denoted R², is the most commonly used measure of how well a model fits the data.

In plain terms, R² answers the question "What fraction of the spread in the data is explained by the model?".

It takes values between 0 and 1, which you can think of as ranging between:

  • 0% of the variation is explained by the model, which means the model is completely useless.

  • 100% of the variation is explained by the model, which means the model perfectly predicts the values observed.

Note that an R² value of 1 does not mean that the model will perfectly predict other values.
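One common way to compute R² is 1 − SS_res/SS_tot, where SS_tot measures the total spread of the data around its mean. A sketch using the same table values as the previous skill:

```python
import numpy as np

y_actual = np.array([1.0, 2.0, 3.0])
y_pred = np.array([0.9, 1.5, 3.2])

ss_res = np.sum((y_actual - y_pred) ** 2)           # 0.3
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)  # 2.0

# Fraction of the spread explained by the model
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.85
```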

Quadratic, cubic, exponential and power regression
AHL AI 4.13

The IB expects you to know how to use given data and your calculator to fit models of the following types:

Model       | Equation            | Calculator Name (TI-84)
linear      | ax + b              | LinReg
quadratic   | ax² + bx + c        | QuadReg
cubic       | ax³ + bx² + cx + d  | CubicReg
exponential | a·bˣ                | ExpReg
power       | a·xᵇ                | PwrReg

You will also need to use ​sin​ regression, but it works a little differently so we'll explain it right after this.


All of the models in the table work the same way on your calculator:

  1. Enter the x list (usually into L1).

  2. Enter the y list (usually into L2).

  3. Navigate to stat > CALC and scroll down to find the right model.

  4. Select the XList and YList you entered.

  5. Scroll down to Calculate, hit enter, and the calculator returns the parameters (a, b, etc.) and the value of R².
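The same fits can be checked outside the calculator. A sketch of QuadReg-style fitting with numpy, using made-up data generated near y = 2x² + 1:

```python
import numpy as np

# Hypothetical data close to y = 2x^2 + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

# Quadratic model ax^2 + bx + c, the same model as QuadReg
a, b, c = np.polyfit(x, y, deg=2)

# R^2 from the fitted values
y_hat = a * x**2 + b * x + c
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"a={a:.2f}, b={b:.2f}, c={c:.2f}, R^2={r_squared:.4f}")
```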

Sinusoidal regression with technology
AHL AI 4.13

Your calculator should have a function called sinusoidal regression which you can use when you know at least 4 points on a sinusoidal function and can estimate the period. To use it, first enter the x-coordinates (the independent variable) into L1 and the y-coordinates (the dependent variable) into L2.

The calculator will likely ask you to provide a number for "iterations", which is simply the number of "loops" it makes in refining its approximation. ​5​ will be plenty unless a problem asks for a very high degree of accuracy.
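SinReg-style fitting can be sketched with scipy's curve_fit, which iteratively refines the parameters of y = a·sin(bx + c) + d much as the calculator does. The data and initial guesses below are made up; note how an estimate of the period seeds the parameter b:

```python
import numpy as np
from scipy.optimize import curve_fit

def sinusoid(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

# Hypothetical data sampled from y = 3 sin(2x) + 5
x = np.linspace(0, 4, 20)
y = 3 * np.sin(2 * x) + 5

# Seed the fit: amplitude from the data's range, b from an estimated
# period of about 3.1 (b = 2*pi / period), midline from the mean
p0 = [np.ptp(y) / 2, 2 * np.pi / 3.1, 0.0, np.mean(y)]
params, _ = curve_fit(sinusoid, x, y, p0=p0)
a, b, c, d = params

print(f"a={a:.2f}, b={b:.2f}, d={d:.2f}")
```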
