    Population & Data

    Understand the basics of population, data collection, and sampling.

    Key Skills

    Population
    SL 4.1

    A population is the entire group of individuals or items you want to study. It can be large (e.g., all IB students worldwide) or small (e.g., a class of 20 students), depending on the research question. For large populations, we often study a portion of the population, or sample, to make inferences about the population.

    Types of Variables
    SL 4.1

    There are two types of variables to be familiar with:

    1. Categorical Variables

      • Non-numerical categories or labels (e.g., eye color, species).

    2. Quantitative Variables

      • Numerical values that can be measured or counted.

      • Discrete: can only take certain fixed values (e.g., number of students, shoe size).

      • Continuous: can take any value in a range (e.g., height, temperature).

    Sampling Error
    SL 4.1

    Sampling error is the difference between a population parameter (e.g., the average IB grade worldwide) and the sample statistic (e.g., the average IB grade at one school) used to estimate it.


    This error is random and arises simply because a sample is not the entire population, and will occur even with well-designed sampling methods.
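
    As a rough illustration, the sketch below computes a sampling error for an invented population. The grades, the seed, the sample size of 50, and all variable names are assumptions made for this example, not real IB data.

```python
import random
import statistics

# Hypothetical population: 1000 made-up grades between 1 and 7.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.randint(1, 7) for _ in range(1000)]

# Population parameter: the mean of the entire population.
population_mean = statistics.mean(population)

# Sample statistic: the mean of a random sample of 50 grades.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: the difference between the statistic and the parameter.
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
print(f"sampling error  = {sampling_error:.3f}")
```

    Changing the seed gives a different sample and a different error, which is the sense in which the error is random.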

    Measurement Error
    SL 4.1

    Measurement error is inaccuracy introduced during data collection. It could result from faulty instruments, poorly worded questions, or participants misunderstanding what is being asked.

    Coverage Error
    SL 4.1

    Coverage error occurs when some members of the population are not included in the sampling frame or are underrepresented, leading to a biased sample.


    For example, a sample of average IB grades at Swiss schools is unlikely to represent all IB schools worldwide.

    Non-response error
    SL 4.1

    Non-response error happens when selected respondents do not participate or cannot be contacted, possibly creating bias if non-respondents differ systematically from respondents.

    Random sampling
    SL 4.1

    Random sampling means every member of the population has an equal probability of being chosen. This method reduces selection bias.
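
    A minimal sketch of simple random sampling, assuming the population is available as a list. The function name `simple_random_sample` and the class of 20 students are hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Return k members chosen without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical population: a class of 20 students.
students = [f"student_{i}" for i in range(1, 21)]
chosen = simple_random_sample(students, 5, seed=1)
print(chosen)
```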

    Convenience sampling
    SL 4.1

    Convenience sampling uses subjects who are easiest to reach. It is quick and low-cost but can be highly biased if the sample is not representative.


    For example, measuring only the trees on the outskirts of a dense jungle, because they are the easiest to reach, may not represent the heights of trees deeper inside.

    Systematic sampling
    SL 4.1

    Systematic sampling involves selecting members at regular intervals from a list or sequence. For example, sampling every 5th student from an alphabetically sorted list of names.
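
    The "every 5th student" example can be sketched with list slicing. The names and the helper `systematic_sample` are invented for illustration.

```python
def systematic_sample(items, step, start=0):
    """Select every `step`-th item, beginning at index `start`."""
    return items[start::step]

# Hypothetical list of students, sorted alphabetically.
names = sorted(["Priya", "Ahmed", "Lucia", "Chen", "Tomas",
                "Zara", "Kofi", "Mei", "Elena", "Omar"])

# start=4 picks the 5th name first, then every 5th name after it.
every_fifth = systematic_sample(names, step=5, start=4)
print(every_fifth)
```

    In practice the starting position is often chosen at random from the first interval; that choice is left as a parameter here.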

    Stratified random sampling
    SL 4.1

    Stratified sampling splits the population into subgroups (strata) based on characteristics (e.g., age, gender). A random sample is then taken from each stratum, often in proportion to its size in the population.
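
    A sketch of proportional stratified sampling, assuming each stratum is already available as a list of its members. The strata (two year groups of 60 and 40 students), the total sample size of 10, and the function name are hypothetical.

```python
import random

def stratified_sample(strata, total, seed=None):
    """Randomly sample from each stratum in proportion to its size.

    `strata` maps a stratum label to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        # Proportional allocation. With awkward proportions, rounding can
        # make the overall sample size differ slightly from `total`.
        k = round(total * len(members) / population_size)
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical population split into two strata by year group.
strata = {
    "year_1": [f"y1_{i}" for i in range(60)],
    "year_2": [f"y2_{i}" for i in range(40)],
}
result = stratified_sample(strata, total=10, seed=0)
print({label: len(members) for label, members in result.items()})
```

    With 60% of the population in year 1 and 40% in year 2, a sample of 10 takes 6 and 4 members respectively.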

    Quota sampling
    SL 4.1

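    Quota sampling is very similar to stratified sampling: the population is split into subgroups and a set number (quota) is taken from each. The difference is that the members taken from each subgroup are not chosen at random (e.g., they may simply be the easiest to reach).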
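
    To contrast with stratified sampling, the sketch below fills each quota with whoever is encountered first, with no randomness involved. The people, groups, quotas, and the helper `quota_sample` are all made up for this example.

```python
def quota_sample(people, group_of, quotas):
    """Take people in the order encountered until each group's quota is full."""
    counts = {group: 0 for group in quotas}
    chosen = []
    for person in people:
        group = group_of(person)
        if group in counts and counts[group] < quotas[group]:
            chosen.append(person)
            counts[group] += 1
    return chosen

# Hypothetical order in which people walk past the surveyor.
arrivals = [("Ana", "teen"), ("Ben", "adult"), ("Cy", "teen"),
            ("Dee", "teen"), ("Eli", "adult"), ("Fay", "adult")]

picked = quota_sample(arrivals, group_of=lambda p: p[1],
                      quotas={"teen": 2, "adult": 1})
print(picked)
```

    Because selection depends on the order of arrival, people who show up later have no chance of being chosen once a quota is full.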

    Identifying and Removing Outliers
    SL 4.1

    Outliers are data points that are much higher or lower than the rest of the data. Because they are so unusual, we often check whether an outlier is the result of an error.

    If an outlier is the product of an error, we may remove it. We should not remove outliers automatically, because many of them are real data points.
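
    The text above does not give a numerical rule for spotting outliers. One common convention, assumed here, flags any value more than 1.5 × IQR below the lower quartile or above the upper quartile. The data and the function name `find_outliers` are hypothetical, and quartile conventions vary between calculators and software, so borderline values may be classified differently elsewhere.

```python
import statistics

def find_outliers(data, k=1.5):
    """Flag values more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # Python's default quartile method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical measurements; 95 may be a recording error or a real value.
heights_cm = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
flagged = find_outliers(heights_cm)
print(flagged)
```

    The function only flags candidates. Whether a flagged value is removed still depends on checking if it came from an error.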