Social scientists try to make statements about a theoretical set of units. Sometimes we are able to study all the units we are interested in; in that case we conduct a census. In most cases, however, only a subset of units is studied. Case selection is a crucial part of empirical research and largely determines the extent to which you can generalise from your findings to the larger target population. Case selection is also called sampling, although that term is used mostly when larger numbers of cases are selected. The two main types of sampling are probability sampling and non-probability sampling. The first is mostly associated with large-n studies, the second with both large-n and small-n studies. If the researcher does not intend to make inferences about a larger target population, other selection procedures are available.

Target population and sampling frame

The target population is the set of units we want to make statements about, for example companies, persons, newspaper articles or products. In many cases the target population cannot easily be accessed directly. A sampling frame is a set of units from which we can actually draw a sample. For example, if the target population is local businesses, a suitable sampling frame might be the listings in the business section of the telephone book. Preferably the sampling frame includes nearly all of the target population, although this is not always possible. Cases are then selected from the sampling frame.
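How well a sampling frame covers the target population can be checked with simple set operations. The sketch below is illustrative only: the business names and the variable names are hypothetical and are not taken from any real listing.

```python
# Hypothetical example: units in the target population versus units
# that appear in the sampling frame (e.g. a telephone book listing).
target_population = {"bakery", "butcher", "florist", "garage", "plumber"}
sampling_frame = {"bakery", "butcher", "florist", "garage", "taxi firm"}

# Units of the target population that can actually be sampled.
covered = target_population & sampling_frame

# Units of the target population missing from the frame (undercoverage).
undercoverage = target_population - sampling_frame

# Units in the frame that do not belong to the target population.
overcoverage = sampling_frame - target_population

# Share of the target population that the frame reaches.
coverage_rate = len(covered) / len(target_population)

print(sorted(undercoverage), sorted(overcoverage), coverage_rate)
```

In this made-up case the frame misses one unit of the target population and contains one unit that does not belong to it. Units that are missing from the frame can never be selected, whatever sampling procedure is used afterwards.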

Probability sampling

There are various types of probability sampling. In all of them, each element of the sampling frame has a known probability of being selected for study. Different procedures can be used to obtain a representative sample of units. A distinction is made between single-stage sampling (for example, simple random sampling, systematic sampling, stratified sampling and cluster sampling) and multi-stage sampling, in which different procedures are used sequentially (for example, first selecting municipalities and then drawing random samples within each municipality).
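The procedures named above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names, the frame of hypothetical businesses grouped by municipality, and the sample sizes are all assumptions made for the example.

```python
import random


def group_by(frame, key):
    """Group the units of the frame by the value of key(unit)."""
    groups = {}
    for unit in frame:
        groups.setdefault(key(unit), []).append(unit)
    return groups


def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of being selected."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit (requires n <= len(frame))."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample separately within every stratum."""
    sample = []
    for units in group_by(frame, stratum_of).values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample


def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly select whole clusters and keep all units in them."""
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def two_stage_sample(frame, cluster_of, n_clusters, n_per_cluster, rng):
    """Stage 1: select clusters. Stage 2: random sample within each."""
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in rng.sample(clusters[c], n_per_cluster)]


def municipality_of(unit):
    """The first element of a (municipality, business) pair."""
    return unit[0]


# Hypothetical frame: 5 municipalities with 20 businesses each.
frame = [(f"municipality_{m}", f"business_{m}_{i}")
         for m in range(5) for i in range(20)]
rng = random.Random(42)  # fixed seed so that the draw can be repeated

print(len(simple_random_sample(frame, 10, rng)))
print(len(two_stage_sample(frame, municipality_of, 2, 3, rng)))
```

Note the difference between the last two functions: cluster sampling keeps every unit in the selected municipalities, whereas the two-stage procedure draws a further random sample within each selected municipality.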

Non-probability sampling

The set of non-probability sampling procedures is large. Examples are self-selection, snowball sampling and quota sampling. With all these procedures the danger is that the selection is biased, i.e. that units with certain attributes are over- or under-represented. These procedures should therefore be avoided if possible.
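A small sketch of quota sampling shows how such bias can arise. The respondents, their attributes and the quotas below are hypothetical. The function simply accepts units in the order in which they happen to present themselves until each quota is full, which is one common way of describing the procedure.

```python
def quota_sample(arrivals, category_of, quotas):
    """Accept units in order of arrival until every quota is filled."""
    remaining = dict(quotas)
    sample = []
    for unit in arrivals:
        category = category_of(unit)
        if remaining.get(category, 0) > 0:
            sample.append(unit)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample


# Hypothetical respondents: (name, gender, occupation), in order of arrival.
arrivals = [
    ("p1", "f", "student"),
    ("p2", "f", "student"),
    ("p3", "m", "student"),
    ("p4", "f", "retired"),
    ("p5", "m", "student"),
    ("p6", "m", "retired"),
]

sample = quota_sample(arrivals, lambda unit: unit[1], {"f": 2, "m": 2})
occupations = {unit[2] for unit in sample}
print([unit[0] for unit in sample], occupations)
```

The sample meets the quotas for gender, but in this constructed example it contains only students. The quota controls the attribute it was defined on and nothing else; who ends up in the sample depends on who was available first, not on chance.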

If only a small number of cases is selected using probability sampling, there is also a risk that certain attributes are over- or under-represented. To avoid this, cases must be selected on the basis of prior knowledge of their attributes (intentional selection). A researcher who wants to test a causal hypothesis must at least select cases in a way that ensures variation on the main independent and dependent variables. You cannot infer anything about the causes of the success of companies, for example, if you limit your selection to successful companies. Note, however, that if the number of cases is small, your conclusions can easily be the consequence of mere chance.
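The company example can be made concrete with a constructed data set. All numbers below are invented for illustration, and the attribute "has a mission statement" is a hypothetical stand-in for any supposed cause of success.

```python
# Hypothetical cases: (has_mission_statement, successful).
companies = (
    [(True, True)] * 40      # successful, with mission statement
    + [(False, True)] * 10   # successful, without
    + [(True, False)] * 40   # unsuccessful, with mission statement
    + [(False, False)] * 10  # unsuccessful, without
)


def share_with_attribute(cases):
    """Proportion of cases that have the attribute."""
    return sum(1 for has_attribute, _ in cases if has_attribute) / len(cases)


successful = [case for case in companies if case[1]]
unsuccessful = [case for case in companies if not case[1]]

share_successful = share_with_attribute(successful)
share_unsuccessful = share_with_attribute(unsuccessful)

print(share_successful, share_unsuccessful)
```

A study limited to the successful companies would find that most of them have a mission statement and might be tempted to treat this as a cause of success. Only the comparison with the unsuccessful companies shows that, in these invented data, the attribute is equally common in both groups and is therefore unrelated to success.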

Selection of one case or only a few cases

If the researcher wants to explore a topic, or to further develop an existing and well-tested theory, several other case selection options are available. If the aim is exploration, cases in single case studies are selected on the basis of the information they are expected to provide. A single case can be selected because it:

  • is extreme (i.e. has an extremely low or high value on the central variable)
  • is critical (i.e. ‘If it is valid for this case, it is valid for all (or many) cases’ or the converse, ‘If it is not valid for this case, then it is not valid for any (or only few) cases.’)
  • is typical (i.e. an example of the phenomenon under investigation)
  • is deviant (i.e. has a combination of characteristics different from most other units)

Note that all case selection procedures mentioned here assume the existence of a (preferably tested) theory and some general knowledge of the central variables in the units of analysis on the basis of which single cases can be selected.


Basic readings

  • Babbie, Earl (2004). The Practice of Social Research (12th edition). Belmont: Wadsworth/Thomson. Chapter 7.
  • Shadish, William R., Thomas D. Cook and Donald T. Campbell (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin. Chapter 3.
  • De Vaus, David (2001). Research Design in Social Research. London: Sage. Chapters 5, 8, 11, 14.

Additional readings

  • King, Gary, Robert O. Keohane and Sidney Verba (1994). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
  • Yin, Robert K. (1994). Case Study Research: Design and Methods (2nd edition). Thousand Oaks, California: Sage.
  • Swanborn, P. G. (1996). Case-study’s: Wat, wanneer en hoe? Amsterdam: Boom.
  • Franzosi, Roberto P. (2004). Content Analysis, in: Melissa A. Hardy and Alan Bryman (eds). Handbook of Data Analysis. London: Sage.
  • Berg, Bruce (2007). Qualitative Research Methods for the Social Sciences. Boston: Pearson.
  • Gerring, John (2001). Social Science Methodology: A Criterial Framework. Cambridge: Cambridge University Press. Chapter 8.