Understand the basics of population, data collection, and sampling.
A population is the entire group of individuals or items you want to study. It can be large (e.g., all IB students worldwide) or small (e.g., a class of 20 students), depending on the research question. For large populations, we often study a portion of the population, or sample, to make inferences about the population.
There are two types of data to be familiar with:
Categorical Variables
Non-numerical categories or labels (e.g., eye color, species).
Quantitative Variables
Numerical values that can be measured or counted.
Discrete: can only take certain fixed values (e.g., number of students, shoe size).
Continuous: can take any value in a range (e.g., height, temperature).
Sampling error occurs when there is a difference between a population parameter (e.g., the average IB grade) and the sample statistic (e.g., the average IB grade at one school) used to estimate it.
This error is random: it arises simply because a sample is not the entire population, and it occurs even with well-designed sampling methods.
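As a rough illustration of sampling error, the sketch below builds a made-up population of grades (the numbers and the name `sampling_error` are assumptions for this example, not real IB data) and compares the mean of several random samples with the population mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up grades on a 1-7 scale.
population = [rng.randint(1, 7) for _ in range(10_000)]
population_mean = statistics.mean(population)  # the population parameter


def sampling_error(sample_size):
    """Return (sample mean - population mean) for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - population_mean


# Each sample of 50 gives a slightly different error, purely by chance.
errors = [sampling_error(50) for _ in range(5)]
for error in errors:
    print(f"sampling error: {error:+.3f}")
```

If the "sample" is the whole population, as in `sampling_error(len(population))`, the error vanishes, which is why sampling error is tied to studying only part of the population.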
Measurement error is inaccuracy introduced during data collection. It can result from faulty instruments, poorly worded questions, or misunderstanding by participants.
Coverage error occurs when some members of the population are not included in the sampling frame or are underrepresented, leading to a biased sample.
For example, a sample of average IB grades at Swiss schools is unlikely to represent all IB schools worldwide.
Non-response error happens when selected respondents do not participate or cannot be contacted, possibly creating bias if non-respondents differ systematically from respondents.
Random sampling means every member of the population has an equal probability of being chosen. This method reduces selection bias.
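A minimal sketch of random sampling, assuming the population is a hypothetical list of student ID numbers. Python's `random.sample` draws without replacement, so each member is equally likely to be chosen.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: student IDs 1 to 200.
population = list(range(1, 201))

# Draw 10 distinct members; each ID has the same chance of selection.
sample = rng.sample(population, 10)
print(sorted(sample))
```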
Convenience sampling uses subjects who are easiest to reach. It is quick and low-cost but can be highly biased if the sample is not representative.
For example, measuring only the heights of trees on the outskirts of a dense jungle, because they are the easiest to reach, may not represent the trees deeper inside.
Systematic sampling involves selecting members at regular intervals from a list or sequence. For example, sampling every 5th student from an alphabetically sorted list of names.
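The "every 5th student" idea can be sketched as below. The function name `systematic_sample` and the student names are hypothetical, and the random starting point is a common convention assumed here rather than something the text above requires.

```python
import random


def systematic_sample(items, interval, rng=random):
    """Pick a random start in the first interval, then take every interval-th item."""
    start = rng.randrange(interval)
    return items[start::interval]


# Hypothetical alphabetically sorted list of 30 names.
names = sorted(f"Student{n:02d}" for n in range(1, 31))

sample = systematic_sample(names, 5, random.Random(1))
print(sample)  # 6 names, spaced 5 positions apart in the list
```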
Stratified sampling splits the population into subgroups (strata) based on characteristics (e.g., age, gender). A random sample is then taken from each stratum, often in proportion to its size in the population.
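A short sketch of proportional stratified sampling, under the assumption that each stratum is simply a list of members. The year groups, their sizes, and the name `stratified_sample` are made up for illustration.

```python
import random


def stratified_sample(strata, total_size, rng):
    """Draw a random sample from each stratum, in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding may make the sizes sum to slightly more or less than total_size.
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: 120 students in one year group, 80 in another.
strata = {
    "year_11": [f"Y11-{i}" for i in range(120)],
    "year_12": [f"Y12-{i}" for i in range(80)],
}

sample = stratified_sample(strata, 20, random.Random(7))
print({name: len(chosen) for name, chosen in sample.items()})
```

With these sizes, a sample of 20 splits 12 to 8, matching the 60% to 40% split in the population.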
Quota sampling is very similar to stratified sampling, except that the sample taken from each subgroup is not random: members are typically chosen by convenience until each subgroup's quota is filled.
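To contrast with stratified sampling, here is one possible sketch of quota sampling in which each subgroup's quota is filled by whoever is encountered first. The names, groups, and the function `quota_sample` are hypothetical.

```python
def quota_sample(people, quotas):
    """Fill each group's quota with the first people encountered (not random)."""
    chosen = {group: [] for group in quotas}
    for name, group in people:  # people are taken in the order they are met
        if group in chosen and len(chosen[group]) < quotas[group]:
            chosen[group].append(name)
    return chosen


# Hypothetical people in the order the researcher happens to meet them.
people = [
    ("Ana", "year_11"), ("Ben", "year_12"), ("Cy", "year_11"),
    ("Dee", "year_11"), ("Eli", "year_12"), ("Fay", "year_12"),
]

sample = quota_sample(people, {"year_11": 2, "year_12": 1})
print(sample)
```

No random selection is involved, so anyone met after a quota is full (Dee, Eli, and Fay here) has no chance of being included.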
Outliers are data points that are much higher or lower than the rest of the data. Because they are so unusual, we often check whether an outlier is the result of an error.
If an outlier is the product of an error, we may remove it, but we should not remove outliers automatically, because many are genuine data points.
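The text above does not give a rule for spotting outliers. One common convention, assumed here, flags values more than 1.5 times the interquartile range beyond the quartiles. The sketch below flags candidates for checking rather than deleting them; the heights are made-up data and `flag_outliers` is a hypothetical helper.

```python
import statistics


def flag_outliers(data, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical heights in cm; 250 may be a typing error for 150.
heights = [150, 152, 155, 158, 160, 161, 163, 165, 168, 250]

flagged = flag_outliers(heights)
print(flagged)  # candidates to check against the original records
```

Different quartile conventions can move the cut-offs slightly, so borderline values may be flagged under one method and not another.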