This is a fallacy affecting statistical inferences, which are arguments of the following form:
N% of sample S has characteristic C.
(Where sample S is a subset of set P, the population.)
Therefore, N% of population P has characteristic C.
For example, suppose that an opaque bag is full of marbles, and you can win a prize by guessing the proportions of colors of the marbles in the bag. Assume, further, that you are allowed to stick your hand into the bag and withdraw one fistful of marbles before making your guess. Suppose that you pull out ten marbles, six of which are black and four of which are white. The set of all marbles in the bag is the population which you are going to guess about, and the ten marbles that you removed are the sample. You want to use the information in your sample to guess the proportion of colors in the bag as closely as possible. You might draw the following conclusions:
60% of the marbles in the bag are black.
40% of the marbles in the bag are white.
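The arithmetic behind these two conclusions can be sketched in a few lines of Python. This is only an illustration of the inference form: the function name `estimate_proportions` and the list `fistful` are made up for the example, and the code simply projects the proportions observed in the sample onto the population without checking whether the sample is representative.

```python
from collections import Counter

def estimate_proportions(sample):
    """Project the color proportions observed in a sample onto the population."""
    counts = Counter(sample)
    total = len(sample)
    return {color: count / total for color, count in counts.items()}

# The fistful from the example: six black marbles and four white ones.
fistful = ["black"] * 6 + ["white"] * 4

estimates = estimate_proportions(fistful)
for color, share in estimates.items():
    print(f"{share:.0%} of the marbles in the bag are {color}.")
```

The code reports the same 60% and 40% figures as the conclusions above. Whether those figures deserve any confidence depends on how the fistful was drawn, which the calculation itself cannot tell you.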
Notice that if 100% of the sampled marbles were black, say, then you could infer that all the marbles in the bag are black, and that none of them are white. Thus, the type of inference usually referred to as "induction by enumeration" is a type of statistical inference, even though it doesn't use percentages. Similarly, from the example we could just draw the vague conclusion that most of the marbles are black and few of them are white.
The strength of a statistical inference is determined by the degree to which the sample is representative of the population, that is, how similar in the relevant respects the sample and population are. For example, if we know in advance that all of the marbles in the bag are the same color, then we can conclude that the sample is perfectly representative of the color of the population—though it might not represent other aspects, such as size. When a sample perfectly represents a population, statistical inferences are actually deductive enthymemes. Otherwise, they are inductive inferences.
Moreover, since the strength of a statistical inference depends upon the similarity of the sample and population, such inferences are really a species of argument from analogy, and the strength of the inference varies directly with the strength of the analogy. Thus, a statistical inference will commit the Fallacy of Unrepresentative Sample when the similarity between the sample and population is too weak to support the conclusion. There are two main ways that a sample can fail to sufficiently represent the population:
The sample is simply too small to represent the population, in which case the argument will commit the subfallacy of Hasty Generalization.
The sample is biased in some way as a result of not having been chosen randomly from the population. The Example is a famous case of such bias in a sample. It also illustrates that even a very large sample can be biased; the important thing is representativeness, not size. Small samples can be representative, and even a sample of one is sufficient in some cases.
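The point that representativeness matters more than size can be illustrated with a small simulation. The following is a rough sketch under invented assumptions, not a model of any real survey: the bag holds 10,000 marbles of which 60% are black, the small sample is 100 marbles drawn at random, and the large sample is 5,000 draws in which white marbles are assumed to be three times as likely to be picked as black ones. All names and numbers are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical bag: 10,000 marbles, 60% black and 40% white.
population = ["black"] * 6000 + ["white"] * 4000
TRUE_SHARE = 0.6

def share_black(sample):
    """Fraction of black marbles in a sample."""
    return sample.count("black") / len(sample)

# Small but random: every marble is equally likely to be drawn.
small_random = rng.sample(population, 100)

# Large but biased: ASSUMPTION for illustration only, white marbles
# are three times as likely to be picked as black ones.
weights = [3 if marble == "white" else 1 for marble in population]
large_biased = rng.choices(population, weights=weights, k=5000)

small_error = abs(share_black(small_random) - TRUE_SHARE)
biased_error = abs(share_black(large_biased) - TRUE_SHARE)

print(f"random sample of {len(small_random)}: off by {small_error:.2f}")
print(f"biased sample of {len(large_biased)}: off by {biased_error:.2f}")
```

Under these assumptions the biased draw should report roughly one third black marbles, since 0.6 / (0.6 + 3 × 0.4) = 1/3, no matter how many marbles are drawn. The random sample of 100 will usually land much closer to the true 60%. Enlarging the biased sample only makes the wrong answer more precise.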