Below are three graphic representations of what the true situation might
be when comparing Virginia 6th graders to the U.S. population of 6th graders.
The larger, red curve represents the population distribution of intelligence
scores for 6th graders in all states other than Virginia. The smaller,
blue curve represents the population distribution of intelligence scores for
Virginia 6th graders. The Greek letter μ (mu) represents the
mean of a population.
If the top picture is true, then there is no difference between the population means, and Virginia 6th graders are not a different population from 6th graders outside Virginia. If either the middle or bottom picture is true, however, then for intelligence Virginia 6th graders are a different population from U.S. 6th graders outside of Virginia. In these cases, Virginia 6th graders would be a different population because their average intelligence is either less than (the middle picture) or greater than (the bottom picture) that of the U.S. population of 6th graders. These three pictures exhaust the possible answers to the question: the Virginia population's intelligence is either equal to, less than, or greater than that of the U.S. population.
Given this information, one way to answer the question is to examine the distribution of scores for the sample of Virginia students in relation to the 100-point population mean intelligence score of U.S. 6th graders. In making this comparison, the researcher examines how different the sample mean of Virginia 6th graders is from the population mean. However, because the intelligence of Virginia 6th graders is represented by a sample, the researcher can never be 100% certain of making the correct decision about the hypotheses. That is, it is possible that the sample of scores for Virginia 6th graders does not accurately reflect the distribution of intelligence scores for the population of Virginia 6th graders. Therefore, decisions based on the sample may lead to an erroneous conclusion about the population. The extent to which a sample distribution differs from the population distribution from which the sample is drawn is known as sampling error.
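Sampling error can be made concrete with a small simulation. This is a sketch under assumed values (a normally distributed population with mean 100 and standard deviation 15, the conventional intelligence-score scaling); the specific numbers are illustrative, not taken from the tutorial's data.

```python
import random
import statistics

# Assumed population: intelligence scores with mean 100, SD 15.
random.seed(0)
pop_mean, pop_sd = 100, 15

# Draw random samples of increasing size; the gap between the sample
# mean and the population mean is the sampling error, and it tends to
# shrink as the sample grows.
for n in (10, 100, 1000):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    error = statistics.mean(sample) - pop_mean
    print(f"n={n:4d}  sampling error = {error:+.2f}")
```

Each run's sample mean misses the population mean by some amount purely by chance; that discrepancy is exactly the sampling error described above.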
The dilemma faced by the researcher is to try to answer a question using sample data that contain some unknown amount of sampling error. Fortunately, our researcher knows that the further the Virginia sample mean score is from the U.S. population mean score, the more likely it is that the intelligence of the Virginia population of 6th graders is different from the intelligence of the U.S. population of 6th graders. That is, small differences between the Virginia sample mean and the U.S. population mean are likely due to sampling error. When faced with small differences, the researcher should conclude that there is not enough evidence to say that the two populations are different. If the Virginia sample mean is far from the U.S. population mean, however, then it is unlikely that the difference is due to sampling error. In this case, the researcher should conclude that the Virginia population of 6th graders is different from the U.S. population of 6th graders on the dimension of intelligence. In doing so, the researcher is saying that he/she is confident that the findings were not due to sampling error.
Let's see if you can think like a researcher!
Now that you have the idea, let's look at some real data sets. We will assume that each distribution shown truly is the population distribution. You can compare the population mean to the mean of a random sample. As a scientist, you must determine whether or not the sample was drawn from that population or some other population.
If you selected a few variables in the above exercise, chances are you made some erroneous decisions. One mistake that you likely made was to conclude that the sample came from a different population when in fact the sample came from the shown population. This mistake is called a Type I error. A second possible mistake is that you concluded that the sample came from the shown population when in fact the sample came from a different population. This mistake is called a Type II error. In research, the goal is to avoid making a Type I or Type II error.
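A quick simulation shows why Type I errors can never be ruled out entirely. The setup below is assumed for illustration (population mean 100, SD 15, samples of 25, and a cutoff of 1.96 standard errors): every sample truly comes from the shown population, yet the decision rule still flags about 5% of them as "different".

```python
import random
from statistics import mean

# Assumed setup: samples really do come from the shown population
# (mean 100, SD 15), so every "different population" decision below
# is a Type I error.
random.seed(42)
pop_mean, pop_sd, n, trials = 100, 15, 25, 10_000
se = pop_sd / n ** 0.5      # spread of sample means for n = 25
cutoff = 1.96 * se          # flag sample means beyond this distance

type_1 = sum(
    abs(mean(random.gauss(pop_mean, pop_sd) for _ in range(n)) - pop_mean) > cutoff
    for _ in range(trials)
)
print(type_1 / trials)      # close to 0.05 by construction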
To this point we have used only our "eyeball" judgments of distributions to guide decisions about whether the populations are different. In practice, researchers rarely rely on such eyeball judgments because they are too imprecise, leading to the Type I and Type II errors that researchers want to avoid. Researchers are more likely to use quantitative indicators to guide decisions about hypotheses because they are more precise, and thereby reduce the number of Type I and Type II errors. These quantitative indicators are collectively referred to as "statistical significance tests". All statistical significance tests are based on probability statements about the likelihood of the observed findings. But before you can determine significance, you must first understand the concept of a sampling distribution.
With such a small population, it is easy to create a sampling distribution. Let's create the sampling distribution of mean intelligence. The distribution will contain the sample means for all possible distinct samples.
When you have all fifteen pairs of names plotted, you will have the sampling distribution of the mean for samples of size 2. All possible mean scores for distinct samples of 2 in the population will have been plotted. The overall mean of a sampling distribution of the mean is equal to the mean of the population.
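The fifteen-sample enumeration can be sketched in a few lines of Python. The six scores below are assumed example values (chosen so the population mean is 100); the tutorial's actual figure may use different numbers.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of six students' intelligence scores,
# chosen so the population mean is 100 (assumed example values).
scores = [80, 90, 100, 100, 110, 120]

# Every distinct sample of size 2 and its mean: the sampling
# distribution of the mean for n = 2.
sample_means = [mean(pair) for pair in combinations(scores, 2)]

print(len(sample_means))    # 15 distinct samples
print(mean(sample_means))   # overall mean equals the population mean, 100
```

`itertools.combinations` generates each distinct pair exactly once, which is precisely the "all fifteen pairs of names" enumeration described above.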
The standard deviation of a sampling distribution is important because it indicates how well the mean represents the population. The larger the standard deviation, the less representative the mean will be. We can also describe this property in terms of sampling error: the larger the standard deviation of the sampling distribution, the greater the effects of sampling error. Sampling error is critical when drawing conclusions based on a single sample of subjects. Standard deviations of sampling distributions are so important that they have the special label of standard errors. Because the above sampling distribution is based on sample means, the standard deviation of this distribution is known as the standard error of the mean.
A sample from a sampling distribution with a smaller standard error will likely have less sampling error than a sample from a sampling distribution with a larger standard error. Smaller sampling error is important for two reasons. First, the smaller the sampling error, the more likely it is that a sample statistic is a good estimate of the corresponding population parameter. This quality is seen in the above exercise when the sample means clustered closer to the population mean as the standard error decreased. Second, and more importantly for hypothesis testing, the smaller the sampling error, the easier it is to conclude that the sample statistic represents a different population. This is because when sampling error is small, a sample mean that is far from the population mean almost certainly comes from a different population.
Sample size is a primary determinant of sampling error, and hence the magnitude of the standard error. The exercise below shows the effect that increasing sample size has on the sampling distribution.
In the above exercise, you should have noticed that the standard error of the mean was zero when the sample size was six. That's because the entire population of six students is used to compute the mean; therefore, there is no sampling error! Recalling that small sampling error is desirable, this shows why researchers usually strive to collect as large a sample as is economically feasible. Again, one reason that small standard errors are desirable is that sample statistics more accurately represent population parameters when the standard error is small. As seen in the above exercise, the distributions of mean scores clustered closer to the population mean of 100 as sample size increased, which shows that the sample means more accurately estimated the population mean. Also, it is easier to detect when samples do not come from the population at hand when sampling error is small. For example, the score of 115 is contained in the sampling distribution when the sample size is two, but is well outside the sampling distribution when the sample size is four. Therefore, a mean score of 115 for two new students could not readily be detected as representing a different population than our original population of 6th graders. However, a mean score of 115 for four new students would indeed indicate that a different population of 6th graders was sampled.
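The shrinking standard error can be checked by enumerating every distinct sample at each sample size. The six scores below are assumed example values with a population mean of 100, not the tutorial's actual data.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of six scores with mean 100 (assumed values).
scores = [80, 90, 100, 100, 110, 120]

# The standard error of the mean is the standard deviation of the
# sampling distribution, built here by listing every distinct sample.
for n in (2, 4, 6):
    sampling_dist = [mean(s) for s in combinations(scores, n)]
    print(n, round(pstdev(sampling_dist), 2))
```

With these assumed scores, the standard error falls as n grows and hits exactly zero at n = 6, where the only possible sample is the whole population. The largest sample mean at n = 2 is (110 + 120) / 2 = 115, while no sample mean at n = 4 exceeds 107.5, mirroring the 115 example above.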
To summarize this section, you should now understand that the amount of sampling error is indicated by the standard error and that small standard errors are more desirable than large standard errors. Finally, the simplest way to ensure a small amount of sampling error is to take as large a sample as is economically feasible.
The term confidence interval is the generic label used to describe the decision points where the researcher favors one conclusion over another. Traditionally, researchers are very cautious about concluding that a sample is different from the comparison population. In our example, our researcher would be very cautious about concluding that Virginia 6th graders are different from the U.S. population of 6th graders. Another way to describe this cautiousness is to state that researchers are reluctant to make a Type I error. Typically, a 95% confidence interval is set. A 95% confidence interval means that if Virginia 6th graders are the same as U.S. 6th graders, then there is only a five percent chance that the Virginia sample mean would fall above or below the boundaries of the confidence interval. If the Virginia sample mean is above or below the 95% confidence interval boundaries, the researcher will conclude that Virginia 6th graders represent a different population in terms of intelligence. If the Virginia sample mean falls within the 95% confidence interval boundaries, the researcher will conclude that there is not enough evidence that Virginia 6th graders are a different population than U.S. 6th graders in terms of intelligence.
The values that establish the boundaries of the confidence interval are given the special name of critical values. For our Virginia 6th-grader example, 109.8 was the upper-boundary critical value and 90.2 was the lower-boundary critical value. However, it is not efficient to express critical values in terms of the measuring scale used for the variable of interest, because the critical values would change every time a variable with a different measuring scale is studied. To avoid having to list different critical values every time a different variable of interest is used, researchers express critical values as z-scores, or standard scores. In that way, the same critical values are used for the 95% confidence interval regardless of the measuring scale for the variable of interest. In our example, the sampling distribution of the mean is normally distributed, so z = +1.96 is the upper-boundary critical value and z = -1.96 is the lower-boundary critical value. The values ±1.96 are the critical values because 2.5% of the means in the sampling distribution will fall more than 1.96 standard deviations above the mean and 2.5% of the means will fall more than 1.96 standard deviations below the mean.
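Those boundaries can be reconstructed numerically. The standard error of 5 below is an assumed value, chosen because 100 ± 1.96 × 5 reproduces the 90.2 and 109.8 boundaries quoted above.

```python
from statistics import NormalDist

# Assumed values: population mean 100, standard error 5 (picked so
# that the result matches the tutorial's quoted boundaries).
mu, se = 100, 5
z = NormalDist().inv_cdf(0.975)   # ≈ 1.96 for a 95% confidence interval
lower, upper = mu - z * se, mu + z * se
print(round(lower, 1), round(upper, 1))   # 90.2 109.8
```

Converting the z critical value back onto the measuring scale (mean ± z × standard error) is exactly the translation between standard scores and raw scores described above.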
Critical values are not etched in stone; they change as a function of both the risk a researcher is willing to take of making a Type I error and the shape of the sampling distribution. As to the risk issue, the more risk a researcher is willing to take of making a Type I error, the smaller the critical values. For a 90% confidence interval (a 10% chance of making a Type I error) the critical values are z = ±1.64. The less risk, the larger the critical values. For a 99% confidence interval (a 1% chance of making a Type I error) the critical values are z = ±2.58. While the sampling distribution used as an example earlier in this tutorial is normally distributed, most sampling distributions are not normally distributed. The critical values for the 95% confidence interval for these non-normal sampling distributions differ from the ±1.96 seen for normally distributed sampling distributions. See the t-distribution tutorial for an explanation of why the shape of a sampling distribution is not always normal.
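The relationship between confidence level and critical value can be verified with the normal inverse CDF; this sketch uses Python's `statistics.NormalDist`. For a two-tailed test, the upper boundary sits at the 1 − α/2 quantile.

```python
from statistics import NormalDist

# Two-tailed z critical values for three common confidence levels.
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level                      # chance of a Type I error
    z = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{level:.0%} -> ±{z:.2f}")
```

The precise 99% value is 2.576, which some texts round to 2.57 and others to 2.58; likewise 1.645 is often shortened to 1.64.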
There are many issues surrounding how sampling distributions are used in hypothesis testing that are not covered here. First, although the above examples were based on a sampling distribution of the mean, it is important that you realize that a sampling distribution can be created for any statistic. For example, a sampling distribution can be created for standard deviations or for correlation coefficients. Although sampling distributions can be created for a multitude of statistics, the logic of sampling distributions as applied to research is the same. Second, the above example was predicated on the assumption that the population parameters for the U.S. 6th graders (i.e., mean and standard deviation) were known to the researcher. The fact is that a researcher rarely knows the population parameters. Other tutorials will deal with how researchers get around this problem of unknown parameters.
In summary, this tutorial has introduced the role of sampling distributions in hypothesis testing. Upon completion of this tutorial, you should have a general understanding of: