In defense of probability sampling

By a small sample we may judge the whole piece - Miguel de Cervantes (1547-1616)

So you plan to run a survey as part of your quantitative research? How are you going to select your sample? If you haven't thought much about this, it's important to realise that the sampling strategy matters a great deal. If your goal is to understand the characteristics of the population you are sampling, then there are certain sampling strategies you will want to avoid.

Sampling populations has a long history, but the theory that enables one to estimate population characteristics and standard errors from a sample dates from the early to mid-20th century. Random selection of participants is an essential element of survey sampling. The main idea is to draw a sample in which every member of the population has a known, non-zero probability of selection. This technique is called probability sampling. It is a powerful idea that has a natural application in political polling (where one can ultimately check the surveyor's best guess against the actual result), but it applies equally across all social research.
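To make the idea concrete, here is a minimal sketch in Python of the simplest probability design, a simple random sample drawn without replacement, in which every member has the same known selection probability n/N. The population values, the sample size, and the function name `srs_mean_estimate` are all made up for illustration; the standard error uses the usual finite population correction.

```python
import math
import random
import statistics


def srs_mean_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Every unit has the same known selection probability n / N, which is
    what allows a standard error to be attached to the estimate.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    N = len(population)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance (n - 1 divisor)
    # Standard error of the mean with the finite population correction
    se = math.sqrt((1 - n / N) * s2 / n)
    return mean, se


# Hypothetical population of 10,000 values (illustration only)
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

estimate, se = srs_mean_estimate(population, n=400, seed=1)
print(f"estimate = {estimate:.2f}, standard error = {se:.2f}")
print(f"true population mean = {true_mean:.2f}")
```

In real surveys the true mean is unknown, of course; the point of the sketch is that a known selection probability gives both an estimate and an honest measure of its uncertainty.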

Good research demands good research tools. But in spite of the scientific rigour behind probability sampling, many academic researchers opt to collect data from a non-probability sample and treat it as if it were representative of the population. You may be familiar with some forms of non-probability sampling: convenience sampling, snowball sampling, quota sampling, and passive sampling, where respondents self-select. Sampling friends, co-workers, or people you meet on a street corner are examples of convenience sampling. In snowball sampling, the first respondent refers a friend, who refers another friend, and so on. In quota sampling, a quota is established (e.g. 20% smokers) and researchers are free to choose any respondents they wish as long as the quota is met. All such methods are likely to produce biased samples because researchers may approach some kinds of respondents and avoid others. Collection methods that allow respondents to self-select are notoriously biased and are very unlikely to produce a representative sample. More importantly, non-probability sampling techniques cannot be used to infer from the sample to the general population, because the selection probabilities are unknown. There is simply no mathematical basis for inference. The non-probability sample can only be said to represent itself, and nothing more.
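The effect of self-selection can be illustrated with a small simulation. The sketch below is hypothetical throughout: it invents a population of satisfaction scores between 0 and 10 and assumes that people with higher scores are more likely to volunteer a response. It then compares the mean of the self-selected respondents with the mean of a simple random sample of similar size.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 20,000 satisfaction scores between 0 and 10
population = [rng.uniform(0, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Probability sample: every person has the same chance of selection
srs = rng.sample(population, 500)
srs_mean = statistics.fmean(srs)

# Self-selected sample: the chance of responding rises with the score
# (an assumed response mechanism, chosen only for illustration)
volunteers = [y for y in population if rng.random() < 0.05 * y / 10]
volunteer_mean = statistics.fmean(volunteers)

print(f"true mean            = {true_mean:.2f}")
print(f"random sample mean   = {srs_mean:.2f} (n = {len(srs)})")
print(f"self-selected mean   = {volunteer_mean:.2f} (n = {len(volunteers)})")
```

Under these assumptions the self-selected mean sits well above the true mean, and collecting more volunteers would not fix it: the bias comes from the selection mechanism, not from the sample size.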

The advantage of non-probability sampling over probability sampling is the ease of data collection. At the end of either process, one has a data set that can be "analysed" using a statistical software package. In both cases, the analyst may go on to infer population characteristics, propose new theories, or propose policy. However, there are no justifiable grounds for drawing generalisations from studies based on non-probability samples. Treating a convenience sample as if it were representative of the study population is simply bad research.

Government statistics organisations, polling organisations, and reputable market research organisations use sound statistical practices for sampling populations. Their reputations depend on it. In the current information age, serious researchers owe it to their discipline to produce reliable data using sound statistical methods. Fortunately, information about good statistical methodology is freely available to ANU researchers. Statisticians at the Statistical Consulting Unit are still flying the flag for probability sampling.

Learn about good sampling techniques. Visit your local statistician at the Statistical Consulting Unit at the ANU!


I've listed a few useful references for further study.

Dorofeev S., Grant P. Statistics for Real-Life Sample Surveys: Non-Simple-Random Samples and Weighted Data. Cambridge University Press (2006)

Lucas S.R. An Inconvenient Dataset: Bias and Inappropriate Inference in the Multilevel Model. Quality & Quantity, 48: 1619-1649 (2014)

Thompson S. Sampling. Wiley Series in Probability and Statistics (2012)