From known data, we can determine if values of variables are linearly related, meaning a straight line can be used to summarize the data. Suppose Joe is writing a science report on properties of certain metals. He finds several metals whose melting points are between the melting and boiling points of water. His chemistry book lists their melting points in degrees Celsius, but Joe wishes to also give them in degrees Fahrenheit. Below is a graph he has begun. The dark blue points show the freezing and boiling points for water. The melting points for three of the metals are also graphed. He hasn't yet calculated the Fahrenheit melting points for cesium (red) and rubidium (green). Complete the graph by "dragging" the red and green dots up to their proper Fahrenheit values. When you have correctly placed the points, click the "Done" button.
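The relationship between degrees Celsius and degrees Fahrenheit is given by the equation of the line:
y = 32 + (9/5)x
where x is the temperature in degrees Celsius and y is the temperature in degrees Fahrenheit.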
The first numerical value in the equation is 32. This value represents the obvious fact that 0 degrees C is the same as 32 degrees F. Generally, this first numerical term in an equation representing a linear relationship between two variables indicates the value of y when x is zero, and this value is labeled the "y-intercept".
The second numerical value in the equation is 9/5, and it is the multiplier for the x variable. The value of 9/5 indicates that there is a 9/5-degree increase in degrees Fahrenheit for every one-degree increase in degrees Celsius. In an equation representing a linear relationship between two variables, the second numerical term generally is a multiplier that gives the slope of the regression line seen in the graph of the data and is labeled the "regression coefficient".
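As a quick check on the two terms described above, the following minimal Python sketch applies the conversion to the freezing and boiling points of water and then recovers the slope from those two points. The function and variable names are illustrative choices, not part of the original lesson.

```python
from fractions import Fraction

INTERCEPT = 32           # y-intercept: degrees F when degrees C is zero
SLOPE = Fraction(9, 5)   # regression coefficient: change in F per one-degree change in C


def celsius_to_fahrenheit(celsius):
    """Apply the linear conversion y = 32 + (9/5)x."""
    return INTERCEPT + SLOPE * celsius


freezing_f = celsius_to_fahrenheit(0)    # freezing point of water
boiling_f = celsius_to_fahrenheit(100)   # boiling point of water

# The slope is the rise in Fahrenheit divided by the run in Celsius.
recovered_slope = (boiling_f - freezing_f) / (100 - 0)

print(freezing_f, boiling_f, recovered_slope)
```

Exact fractions are used here instead of floating-point numbers so that the recovered slope comes out as exactly 9/5.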
The above example is easy to understand because there is a perfect relationship between degrees Celsius and Fahrenheit. That is, knowing the temperature in degrees Celsius allows one to predict the temperature in degrees Fahrenheit with perfect accuracy. The line connecting the points in the above graph is simply the conversion equation between the two measurement scales. In the behavioral sciences, however, the variables of interest are not perfectly related. We can use the data to determine if a linear relationship between the variables exists. If so, a regression line may be calculated from the data values. Without a perfect linear relationship, the regression line will not connect all the data points. Rather, it is the line which comes closest to all the data, making it the best general representation of the data set. Consider the following example.
Chances are, you were able to come pretty close to the correct regression line, which the computer calculated by finding the line through the data that minimizes the sum of the squared vertical distances between the points and the line. This is known as a line of best fit, which is another name for the regression line. The regression line is important because it gives the predicted y value for any given x value. Scientists interested in the relationship between two variables typically quantify this relationship by representing the line of best fit as a mathematical equation known as a regression equation. The general form of a bivariate (two-variable) regression equation is:
y = a + bx
where a is the y-intercept and b is the regression coefficient.
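To make the idea of a line of best fit concrete, here is a minimal Python sketch that assumes the usual least-squares criterion, in which the fitted line minimizes the sum of squared vertical distances from the points. The data values are invented for illustration, and the names `fit_line` and `sum_squared_errors` are hypothetical helpers rather than anything from the original lesson.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Total squared vertical distance between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
# A line with a slightly steeper slope fits these points worse.
worse = sum_squared_errors(xs, ys, a, b + 0.1)

print(f"y = {a:.2f} + {b:.2f}x")
print(best < worse)
```

Because the points do not fall exactly on a line, the fitted line misses most of them, yet no other straight line gives a smaller total squared error for these data.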
Important points about the regression coefficient:
For example, intelligence test scores are really ordinal measures because there is no evidence that the units of measurement represent equal intervals. That is, there is no evidence that the difference in intelligence between the IQ scores of 95 and 96 is equal to the difference in intelligence between the IQ scores of 105 and 106. Nonetheless, the behavioral scientist will typically assume that the intervals are equal due to the continuous nature of the concept being measured. In short, any theoretically continuous variable that is measured in a manner where the intervals appear equal is frequently used as the y-variable in regression analysis.
Practically speaking, there are no limiting assumptions for the x-variable. As such, the x-variable can be measured on any measurement scale (i.e., nominal, ordinal, interval, or ratio). The following visualization allows you to examine real data scatterplots and the corresponding regression lines when using x-variables measured on different types of scales.
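As one illustration of a nominal x-variable, the sketch below codes membership in two groups as 0 or 1 and fits a least-squares line. The 0/1 coding is an assumption made for this example and is not described in the lesson itself; the scores and the helper name `fit_line` are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical scores for two groups; the nominal x-variable is coded 0 or 1.
group = [0, 0, 0, 1, 1, 1]
score = [10, 12, 14, 15, 17, 19]

intercept, slope = fit_line(group, score)

# Group means, computed directly for comparison with the fitted line.
mean_group0 = sum(s for g, s in zip(group, score) if g == 0) / group.count(0)
mean_group1 = sum(s for g, s in zip(group, score) if g == 1) / group.count(1)

print(intercept, slope, mean_group0, mean_group1)
```

With this coding, the intercept equals the mean of the group coded 0 and the regression coefficient equals the difference between the two group means, so the regression line simply connects the two group averages.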