Data Collection

Designing surveys with clear, unbiased and appropriately structured questions, and considering the reliability and validity of the data collected.

Designing Surveys
AHL AI 4.12

If you've taken a survey before, you know that they can be very poorly designed. We've compiled below some examples of terrible survey questions:

Question: How much do you love Perplex?
Why it's terrible: Biased question. It assumes you love Perplex, and people don't like to be too critical.

Question: How often do your parents argue about money at home?
Why it's terrible: Too personal.

Question: Tell me about your mental health.
Why it's terrible: Unstructured! I don't know what you're asking.

Question: How many hours do you study each week?
  • Never
  • 3
  • Quite a lot
  • Yes
Why it's terrible: Inconsistent answer choices.
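By contrast, a consistent set of answer choices for the study hours question could be a list of ranges that do not overlap and that cover every possible answer. The sketch below is only an illustration: the ranges, the names `CHOICES`, `choices_are_consistent` and `choice_for`, and the decision to count whole hours are all assumptions, not part of the lesson.

```python
# Hypothetical answer choices for "How many hours do you study each week?"
# Each choice is (label, low, high) in whole hours; high=None means "or more".
CHOICES = [
    ("0 hours", 0, 0),
    ("1 to 5 hours", 1, 5),
    ("6 to 10 hours", 6, 10),
    ("11 to 20 hours", 11, 20),
    ("More than 20 hours", 21, None),
]


def choices_are_consistent(choices):
    """Return True if the ranges start at 0, have no gaps or overlaps,
    and finish with a single open-ended choice."""
    expected_low = 0
    for index, (label, low, high) in enumerate(choices):
        if low != expected_low:
            return False  # a gap or an overlap with the previous choice
        if high is None:
            return index == len(choices) - 1  # open-ended choice must be last
        if high < low:
            return False  # range is written backwards
        expected_low = high + 1
    return False  # no open-ended final choice, so large answers have no home


def choice_for(hours, choices):
    """Return the label of the choice that a whole number of hours falls into."""
    for label, low, high in choices:
        if hours >= low and (high is None or hours <= high):
            return label
    raise ValueError("no choice covers this answer")


print(choices_are_consistent(CHOICES))
print(choice_for(3, CHOICES))
```

Every respondent fits exactly one choice here, which is what the original options ("Never", "3", "Quite a lot", "Yes") fail to achieve.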


Reliability of a measurement
AHL AI 4.12

When we use statistics to come to a conclusion, the reliability of that conclusion depends on how consistent our measurements of the variables are.


To determine if a measurement is reliable, we ask ourselves whether we would get similar results redoing the same experiment, with the same sample.


An example of an unreliable test is a lab analysis of cholesterol level. The same person could have their blood tested just days apart and get significantly different results. The measurement can be made more reliable in a variety of ways, but at its core unreliability is a measure of how much the result would vary if we redid it many times.


There are two ways to measure reliability that the IB wants you to know:

Test-retest reliability

This just means redoing the measurement some time later and comparing how much the results change. In some cases (like the cholesterol example) this is a great approach, but in others it works less well.


Imagine measuring drivers' knowledge of road laws by giving them a test. If we re-test them a week later, they would likely have learned from the test and might perform better.
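The comparison between a test and its retest can be sketched numerically. One common way to summarise it, assumed here rather than stated in the lesson, is the Pearson correlation coefficient between the two sets of scores. The scores below are invented for illustration, and `pearson_r` is a hand-written helper, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented road-law test scores for six drivers (not real data).
first_test = [12, 15, 9, 18, 14, 11]
retest = [14, 17, 11, 20, 16, 13]  # same drivers, one week later

r = pearson_r(first_test, retest)
mean_gain = sum(b - a for a, b in zip(first_test, retest)) / len(first_test)

print(f"correlation between test and retest: {r:.3f}")
print(f"average improvement on the retest: {mean_gain:.1f} marks")
```

In this made-up data the drivers keep the same ranking, so the correlation is very high, yet everyone scores higher the second time. That shift is the learning effect described above, and it is why the retest scores cannot be read as a clean repeat of the first measurement.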


Parallel forms reliability

This approach is used specifically when assessing people. It involves giving participants two similar versions of the same test and seeing how closely their performance on one matches their performance on the other.

Validity of a test
AHL AI 4.12

Statistical validity describes how accurately the thing we measured represents what we're interested in.


For example, consider a military fitness test. A test that only measures one trait, like how fast you can run a 5 km race, simply doesn't test all the relevant types of physical fitness. This kind of test has low content validity, because it only measures one small aspect of the domain we care about.


The second kind of validity the IB expects you to know is called criterion validity. The criterion is the thing you actually care about, and the criterion validity measures how well a test predicts the criterion.


A real-world example of low criterion validity is a polygraph (lie detector test). It's supposed to measure whether a person is lying, but in reality it only measures how nervous they are.
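A rough way to see criterion validity in numbers is to compare what the test says with the criterion itself. The sketch below uses a simple agreement rate, which is a simplified stand-in chosen for illustration and not a method named in the lesson. The records are invented, and `agreement_rate` is a hypothetical helper.

```python
# Invented records: (test_says_lying, actually_lying). Not real polygraph data.
records = [
    (True, True),
    (True, False),   # nervous but honest
    (True, False),   # nervous but honest
    (False, True),   # calm liar
    (False, False),
    (False, False),
    (True, True),
    (False, True),   # calm liar
]


def agreement_rate(records):
    """Fraction of cases where the test result matches the criterion."""
    matches = sum(1 for predicted, actual in records if predicted == actual)
    return matches / len(records)


print(f"test agrees with the criterion in {agreement_rate(records):.0%} of cases")
```

In this made-up sample the test matches the truth only half the time, no better than guessing when half the people are lying, which is what low criterion validity looks like: the test responds to nervousness, not to lying.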
