**alternative
hypothesis** - the hypothesis that the researcher expects
to support.

**analysis
of variance** - a statistical test of the difference
of means for two or more groups (also termed **ANOVA**).

**ANOVA** -
ANOVA is an acronym for **analysis of
variance**. It is a statistical test of the difference of means
for two or more groups.

**box-plot** -
Summary plot based on the median, quartiles, and extreme values.
The box represents the interquartile range, which contains the middle 50% of values.
The whiskers represent the range; they extend from the box to the highest
and lowest values, excluding outliers. A line across the box indicates
the median.
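
As a rough illustration of the quantities a box plot draws, the sketch below computes a five-number summary. The data are made up, the helper name `five_number_summary` is hypothetical, and the "inclusive" quartile method is only one of several conventions. The whisker ends here assume the data contain no outliers.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Other quartile conventions can give slightly different cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: the length of the box
print(low, q1, med, q3, high, iqr)
```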

**categorical
variable** - a variable that has mutually exclusive
("named") groups that lack intrinsic order. Major in college and race
are examples of categorical variables.

**central
tendency** - a typical or representative value for a dataset.
It can be reported as the **mean**, the **median**,
or the **mode**, depending on the data and/or one's
purposes.

**Chi Square** - a statistical procedure which examines the relationship between
two categorical variables. The test is based on the discrepancy between the observed number of
observations in each category and the expected number of observations in each category.
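
The comparison of observed and expected counts can be sketched as follows. This is a minimal example with made-up counts; the function name `chi_square` is hypothetical, and the expected counts assume the two variables are unrelated.

```python
def chi_square(observed):
    """Chi-square statistic for a crosstabulation given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

related = [[30, 10], [20, 40]]    # made-up counts with a clear association
unrelated = [[10, 20], [20, 40]]  # made-up counts with no association
print(chi_square(related), chi_square(unrelated))
```

The larger the statistic, the greater the discrepancy between the observed and expected counts.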

**coefficient
of determination** - a statistic used in linear regression
that indicates the proportion of variation in the dependent variable which
is explained or accounted for by the independent variable(s).
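
One common way to compute this statistic for a single independent variable is sketched below, using made-up data and a hypothetical function name `r_squared`. It fits a least squares line and compares the leftover (residual) variation with the total variation.

```python
def r_squared(x, y):
    """Share of the variation in y accounted for by a straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_residual / ss_total

x = [1, 2, 3, 4, 5]  # made-up independent variable
y = [2, 4, 5, 4, 5]  # made-up dependent variable
print(round(r_squared(x, y), 2))
```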

**confidence interval** - the range of values, computed from sample data, within
which the true population mean is likely to fall at a stated level of confidence.
In hypothesis testing, its boundaries serve as the decision points at which the researcher
favors the alternative hypothesis over the null hypothesis.
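
A minimal sketch of such an interval for a mean follows. It assumes a normal approximation, which is reasonable only for larger samples; a t critical value would normally be used for a sample this small. The data and the function name `mean_confidence_interval` are made up for illustration.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    # Critical value from the standard normal distribution (about 1.96 for 95%)
    critical_value = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical_value * standard_error
    return mean - margin, mean + margin

weights = [98, 102, 101, 99, 100, 103, 97, 100]  # made-up measurements
lower, upper = mean_confidence_interval(weights)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level (for example, to 0.99) widens the interval.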

**continuous variable** - a
variable which can assume an infinite number of values. Weight is an example
of a continuous variable. Between any two measures of weight (e.g., 150
to 151 pounds) lie an infinite number of possible values (e.g., 150.1,
150.2, 150.21, . . .).

**convenience sample** - a sample whose units of study are selected
because they are readily available.

**correlation** - a standardized index
of the strength and direction of the relationship between two variables. The range
for the possible correlation between any two variables is from -1.00 (a perfect inverse
relationship) to +1.00 (a perfect positive relationship).

**covariance** -
a measure of association between a pair of variables. It is similar to a correlation,
but a correlation is expressed in a standardized metric, whereas covariance is expressed
in the units of the original variables.
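
The difference between the two measures can be seen in a short sketch. The data are made up and the function names are hypothetical; the covariance here is the sample version, which divides by n - 1.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance rescaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

hours = [1, 2, 3, 4, 5]           # made-up study time in hours
scores = [2, 4, 5, 4, 5]          # made-up quiz scores
minutes = [h * 60 for h in hours]  # same study time in different units

print(covariance(hours, scores), covariance(minutes, scores))
print(correlation(hours, scores), correlation(minutes, scores))
```

Changing the units from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged.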

**critical
value** - the value that establishes the boundaries of the
confidence interval.

**decile** - a subset of
adjacent scores in a distribution representing 10% of a sample or a population.
A "decile score" is a raw score corresponding to the 10th, 20th, 30th,
etc. percentile score.

**degrees of freedom** - the
number of components in the calculation of a statistic that are free to
vary.

**dichotomous variable**
- a discrete variable that has only two categories, which may or may not be ordered.

**discrete variable** - a
variable which is limited to a finite number of values. A discrete variable
usually describes something which occurs only in whole units. The number
of males in an English class is an example of a discrete variable.

**dispersion** - the
"spread" of a data set, the departure from central tendency.

**distribution** - In
a distribution, the horizontal axis (x-axis) represents the variable being
described. The density of the smooth curve over the x-axis represents the
probability of occurrence for each of the values on the x-axis.

**explained
variance** - the variance of Y' about Y-bar, where Y' is the
value of Y on the regression line predicted by the regression equation
and Y-bar is the mean of Y. If the regression line does not help in predicting Y,
then it will be a horizontal line through Y-bar, in which case B_{yx} = 0.
For standardized scores, the largest possible absolute value of B_{yx} is 1.00.

**heteroscedasticity** -
a condition in which the variances of two or more population distributions
are not equal.

**histogram** - a bar graph
used to represent the frequency of each value occurring in a distribution
of scores.

**homoscedasticity** -
a condition in which the variances of two or more population distributions
are equal.

**hypotheses** - a
set of two or more mutually exclusive and often exhaustive statements.
The goal of hypothesis testing is to determine which is true.

**independent samples
t-test** - In hypothesis testing, this is the procedure used to
compare the means of two different samples. As is true for all t-tests,
the standard error is not known and is estimated from sample data.
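
The t statistic for this test can be sketched as below. The example assumes the pooled-variance form of the test (equal population variances), which is only one variant; the data and the function name `independent_t` are made up.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between means, estimated from the samples
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df

group_a = [5, 6, 7, 8, 9]  # made-up scores
group_b = [2, 3, 4, 5, 6]
t, df = independent_t(group_a, group_b)
print(round(t, 2), df)
```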

**interval data** -
data that possess magnitude (one value can be judged greater than, less
than, or equal to another) and a constant distance between intervals (units
of measurement are the same on the scale regardless of where the unit falls).
Temperature is an example of interval data: the difference between 100
degrees and 99 degrees is the same as the difference between 40 degrees
and 39 degrees. Interval data do not necessarily have an absolute zero
point (i.e., a temperature of zero degrees does not indicate that there
is no temperature).

**interval variable**
- a variable whose attributes are rank ordered and have equal distances
between adjacent attributes. An example of an interval variable would
be the Fahrenheit temperature scale.

**kurtosis** - the degree
of flatness or peakedness of a graph of a frequency distribution. The relatively
flat distributions are described as platykurtic.
Distributions with medium curvature are mesokurtic (note: a normal
distribution is mesokurtic). The most peaked distributions are leptokurtic.

**leptokurtic** - a
distribution that is more peaked than a normal distribution. This
is to say there are more cases concentrated close to the mean than in a
normal distribution.

**line of best fit (least
squares fit)** - the least squares fit procedure allows us to
reduce the scatterplot to a single straight line described by a linear equation.
It minimizes the sum of the squared vertical distances between the points and
the regression line.
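
A minimal sketch of the least squares procedure for one predictor follows, with made-up data and hypothetical function names. It also checks that nudging the slope away from the fitted value increases the sum of squared vertical distances.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x  # value of y when x equals zero
    return slope, intercept

def sum_squared_error(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(x, y)
best = sum_squared_error(x, y, slope, intercept)
worse = sum_squared_error(x, y, slope + 0.1, intercept)
print(round(slope, 2), round(intercept, 2), best < worse)
```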

**marginal** - the frequency
distribution of each of two crosstabulated variables. There are row marginals
and column marginals.

**mean** - a measure of **central
tendency** calculated by dividing the sum of the scores in a distribution
by the number of scores in the distribution. This value best reflects the
typical score of a data set when there are few outliers and/or the dataset
is generally symmetrical.

**median** - the value
in a data set which divides the scores into two equal halves (i.e., an equal
number of scores lie above and below it). As a measure of **central
tendency**, it is largely unaffected by extreme values.

**mode** - the score that
occurs most frequently in a data set. This measure of **central
tendency** is the only one appropriate for nominal data.
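
The three measures of central tendency (mean, median, and mode) can be compared with Python's standard `statistics` module. The salary figures are made up; the single extreme value shows how an outlier pulls the mean but not the median.

```python
import statistics

salaries = [30, 32, 32, 35, 38, 41, 250]  # made-up values, in thousands

mean_salary = statistics.mean(salaries)      # sum of scores / number of scores
median_salary = statistics.median(salaries)  # middle score of the sorted data
mode_salary = statistics.mode(salaries)      # most frequent score

print(round(mean_salary, 2), median_salary, mode_salary)
```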

**negative skew** - asymmetry
in a distribution in which the scores are bunched to the right side of
the center. With a negatively skewed distribution, the mean generally falls
to the left of the median and the median usually lies to the left of the
mode. Study Hint: the tail of a negatively skewed distribution
points to the negative side of the number line.

**nonprobability sample**
- a type of sampling that involves the researcher's judgment to determine
the elements to be selected for the sample.

**nominal data** - data
that are classified into mutually exclusive ("named") groups that lack
intrinsic order. Major in college and race are examples of nominal data.

**normal distribution** -
a theoretical distribution which is typically bell-shaped when graphed.
The distribution is theoretical because the height of the curve is defined
by a mathematical formula (and the exact values necessary to create the
curve would never occur).

**null hypothesis** -
the prediction that the researcher believes will be "nullified." That is,
the researcher believes this prediction is not true.

**observation** -
the empirical data that are used to support or refute a hypothesis.

**ordinal data** - data
whose values are ordered so that we can make inferences regarding magnitude,
but which have no fixed interval between values. An example of ordinal
data is a letter grade on a test.

**ordinal variable**
- a variable whose values are ordered so that we can make inferences
regarding magnitude, but which have no fixed interval between values.
Letter grade on a test would be an ordinal variable: while an 'A' is
greater than a 'B' which is greater than a 'C', we cannot conclude that
the distance between an 'A' and a 'B' is the same as the distance
between a 'B' and a 'C'.

**outlier** - a value
in a data set that is very different from most other values in the set.

**paired t-test** - In
hypothesis testing, this is the procedure used when the independent variable
is within subjects in nature. The goal is to compare two levels of the
independent variable assigned to the same group of subjects at different
points in time. As is true for all t-tests, the standard error is not known
and is estimated from sample data.

**parameter** - a characteristic
of a population, e.g. mean (μ), pronounced
"myu", and standard deviation (σ), or
"sigma".

**Pearson's
correlation coefficient** - a measure of association between
two continuous variables which estimates both the direction and strength
of a linear relationship.

**percentile** - A
value that exceeds a specific percentage of the distribution. Thus, if
the 63rd percentile score for a set of students on the SAT verbal exam
is 560, then 63% of scores are at or below 560.
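
The "at or below" reading of a percentile can be sketched in a few lines. The scores are made up and the function name `percentile_rank` is hypothetical.

```python
def percentile_rank(scores, value):
    """Percentage of scores that are at or below the given value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

sat_verbal = [400, 450, 500, 520, 560, 600, 640, 700, 720, 780]  # made-up scores
print(percentile_rank(sat_verbal, 560))
```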

**platykurtic** - a
distribution that is flatter than a normal distribution. This is
to say that there are more cases in the tails of the distribution than
in a normal distribution.

**population** -
the set of all possible data values that *could* be observed.

**positive skew** -
asymmetry in a distribution in which the scores are bunched to the left
side of the center. With a positively-skewed distribution, the mean generally
falls to the right of the median and the median usually lies to the right
of the mode. Study Hint: the tail of a positively skewed distribution points
to the positive side of a number line.

**probability sample** -
sampling in which each element within a study population has a known, nonzero
chance of being selected into the sample.

**protocol** -
a specified methodology for performing a task.

**quartile** - a subset
of adjacent scores in a distribution representing 25% of a sample or a population.
A "quartile score" is a raw score corresponding to the 25th, 50th, or 75th
percentile score.

**quintile** - A subset
of adjacent scores in a distribution representing 20% of a sample or a
population. A "quintile score" is a raw score corresponding to the 20th,
40th, 60th, or 80th percentile score.

**random sample** - a
sample that contains observations which are selected from a population
so that every member of the population has a known chance of selection for
a sample.

**random variable** -
the measurements of a random variable vary in a seemingly random and unpredictable
manner. A random variable assumes a unique numerical value for each
of the outcomes in the sample space of the probability experiment.

**range** - a simple measure
of **dispersion**, indicating the difference
between the lowest and highest values observed.

**ranked
categories** - categories within a variable that are logically
ranked. The different attributes of each category represent relatively
more or less of the variable.

**ratio data** - data
that are ordered (so that we can make inferences regarding magnitude), have
equal intervals between values, and contain an absolute zero point. Height
is an example of ratio data: 60 inches is taller than 55 inches, the distance
between 60 and 55 inches is the same as the distance between 30 and 25
inches, and a height of 0 inches implies no height at all.

**ratio variable**
- a variable that is based on a true zero point. An example
of a ratio variable would be age.

**regression** - a
statistical procedure that allows us to determine the extent to which we can
predict a given observation's score on a dependent variable, given that observation's
score on one or more independent variables.

**regression coefficient** -
the slope of the regression line. It represents the change in y for
every one unit change in x.

**regression line** - a
model that simplifies the relationship between two variables. By
approximating a line through the center of a scatterplot that represents
the data, we create a two dimensional “center” for the data. The
line summarizes the data points in the same way that measures of central
tendency do.

**sample** - a
collection of observations selected from a larger population.

**sampling
distribution** - the distribution of a statistic across all possible
non-overlapping samples that can be drawn from a population, given a constant sample size.

**sampling
distribution of means** - a frequency distribution of a
large number of random sample means that have been drawn from the same
population.
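
A sampling distribution of means can be approximated by simulation. The sketch below assumes a made-up population (rolls of a fair six-sided die), a fixed random seed, and arbitrary choices of 25 rolls per sample and 2000 samples.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(sample_size):
    """Mean of one random sample of fair six-sided die rolls."""
    return statistics.mean([rng.randint(1, 6) for _ in range(sample_size)])

sample_size = 25
means = [sample_mean(sample_size) for _ in range(2000)]

# Standard deviation of the simulated sampling distribution of means
observed_se = statistics.pstdev(means)
# Theoretical standard error of the mean: population SD / sqrt(n)
population_sd = math.sqrt(35 / 12)  # SD of a fair six-sided die
expected_se = population_sd / math.sqrt(sample_size)
print(round(observed_se, 3), round(expected_se, 3))
```

The two printed values should be close, which illustrates that the standard error of the mean is the standard deviation of this distribution.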

**sampling distribution
of the difference between means** - a sampling distribution that
consists of the differences in means between groups.

**sampling
distribution of the mean of difference scores** - a sampling
distribution that consists of the differences in means within subjects
across treatments.

**sampling
error** - the extent to which a sample distribution is
different from the population distribution from which the sample is drawn.

**scatterplot** -
a group of data points that are plotted along x-axis and y-axis coordinates.
Every individual is represented as a data point, whereby a perpendicular
line from the individual's "X" value intersects a perpendicular line from
the individual's "Y" value.

**single sample t-test**
- In hypothesis testing, this is the procedure used to compare the mean
of one sample to a known population mean. As is true for all t-tests, the
standard error is not known and is estimated from sample data.
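
The calculation behind this test can be sketched as follows, with made-up data and a hypothetical function name `one_sample_t`. The standard error is estimated from the sample standard deviation.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t statistic and degrees of freedom for a single sample t-test."""
    n = len(sample)
    # Estimated standard error of the mean: s / sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - population_mean) / standard_error
    return t, n - 1

scores = [5, 6, 7, 8, 9]  # made-up sample
t, df = one_sample_t(scores, population_mean=6)
print(round(t, 3), df)
```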

**skewness** - asymmetry
in a distribution in which scores are bunched on one side of the distribution.
See
**positive skew**, **negative
skew**.

**standard deviation** -
a measure of **dispersion** describing the
spread of scores around the **mean**. It is the
square root of the **variance**.

**standard error** - the
standard deviation of a sampling distribution.

**standard error of
the mean** - the standard deviation of a sampling
distribution of means.

**standard error of the mean of difference scores** -
the standard deviation of a sampling
distribution of the mean of difference scores.

**standard score** -
a raw score that has been converted from one scale into another scale with an arbitrarily
set mean and standard
deviation. Standard scores are more easily interpreted than raw scores, because
they take into account the mean and standard deviation of the distribution of values.

**statistic** - a characteristic
of a sample, e.g. mean (x̄)
and standard deviation (s).

**strata** - subdivisions
of a population (singular: stratum).

**stratification** -
allocating samples among subcategories, called strata, within a population.
Stratification is sometimes necessary to improve the effectiveness of a
sampling effort or to increase understanding of population characteristics.
For example, stratifying an election survey by sex allows analysts to better
understand voter behavior by revealing differences in the way that males
and females vote.

**type I error** -
erroneously rejecting the null hypothesis: concluding that a sample came
from a different population when it in fact is from the same population.

**type II error** -
erroneously failing to reject the null hypothesis: concluding that a sample
came from the given population when it in fact is from a different population.

**variance** - a measure
of **dispersion**, indicating the mean of
the squared deviations of a set of scores from the **mean** of
the scores.
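
The definition translates directly into a few lines of code. This sketch uses made-up scores and a hypothetical function name, and it divides by the number of scores, as the definition states. Sample estimates usually divide by n - 1 instead.

```python
import math

def population_variance(scores):
    """Mean of the squared deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores with a mean of 5
variance = population_variance(scores)
standard_deviation = math.sqrt(variance)  # square root of the variance
print(variance, standard_deviation)
```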

**y-intercept** - the
point at which the regression line intersects the y-axis. It is the value
of y when x equals zero.

**z score** - a standardized
score which indicates how many **standard
deviations** a value lies above or below the mean.
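
As a small worked example, a z score is the distance of a value from the mean divided by the standard deviation. The function name `z_score` and the numbers are made up for illustration.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Made-up distribution with a mean of 5 and a standard deviation of 2
print(z_score(9, 5, 2))  # two standard deviations above the mean
print(z_score(2, 5, 2))  # one and a half standard deviations below the mean
```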

**Updated August 2, 1999**