Linear Regression and Correlation


This lesson is designed to introduce students to correlation between two variables and the line of best fit.

These activities can be done individually or in groups of as many as four students. Allow 1.5-2 hours of class time for the entire lesson if all portions are done in class.


Upon completion of this lesson, students will:

  • have plotted bivariate data onto a scatter plot
  • have seen the line of best fit for several different scatter plots
  • be able to estimate the lines of best fit for data sets
  • be able to estimate the correlation coefficient for data sets

Student Prerequisites

  • Arithmetic: Students must be able to:
    • plot points on the Cartesian coordinate system
  • Statistics: Students must:
    • have a very basic understanding of correlation
  • Technological: Students must be able to:
    • perform basic mouse manipulations such as point, click and drag
    • use a browser for experimenting with the activities

Teacher Preparation

Students will need:

  • Access to a browser
  • Scatter Plot Exploration Questions
  • Graph paper and pencil

Key Terms

correlation: A statistical measure referring to the relationship between two random variables. It is a positive correlation when each variable tends to increase or decrease as the other does, and a negative or inverse correlation if one tends to increase as the other decreases.
correlation coefficient: A numerical value (between -1 and +1, inclusive) that measures the strength and direction of the linear relationship between two variables. A value of +1 indicates an exact positive linear relationship, -1 indicates an exact inverse linear relationship, and 0 indicates no linear relationship between the variables.
line of best fit: A straight line used as the best approximation or summary of all the points in a scatter plot. The position and slope of the line depend on the relationship between the two paired variables that generate the scatter plot; the usual choice is the least-squares line, which minimizes the sum of the squared residuals. This line can be used to predict the value of one of the paired variables when only the other value in the pair is known.
linear regression: An attempt to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered as the independent variable, and the other is considered as the dependent variable.
residual: The observed value minus the predicted value, that is, the difference between the result obtained by observation and the result computed from the fitted line.
scatter plot: A graphical representation of the distribution of two random variables as a set of points whose coordinates represent their observed paired values.
slope of a linear function: The slope of the line y = mx + b is the rate at which y is changing per unit of change in x. The units of measurement of the slope are units of y per unit of x (cf. Linear Functions Discussion).
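
For teachers who want to check the key terms numerically, here is a minimal Python sketch that computes the slope, intercept, correlation coefficient, and residuals for a small data set. The data values and the names fit_line and residuals are made up for illustration; the Regression activity itself may compute or round its results differently.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

def residuals(xs, ys, slope, intercept):
    """Observed value minus predicted value for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical class data (not from the lesson)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b, r = fit_line(xs, ys)
res = residuals(xs, ys, m, b)
print(f"line of best fit: y = {m:.2f}x + {b:.2f}")
print(f"correlation coefficient r = {r:.3f}")
print("residuals:", [round(e, 2) for e in res])
```

For a least-squares line the residuals always sum to zero (up to rounding), which gives students a quick check on their own arithmetic.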

Lesson Outline

  1. Focus and Review

    Review with the class the concept of correlation. Have the students begin to think about the words and ideas of this lesson:

    • What are two variables that have no correlation with one another? Can anyone give me an example of two variables that have some sort of correlation with one another? Is this a positive or a negative correlation?

  2. Objectives

    Let the students know what it is that they will be doing and learning today. Say something like this:

    • Today, class, we are going to learn more about correlation between two variables and be introduced to the line of best fit.
    • We are going to use the computers to learn more about correlation, but please do not turn your computers on until I ask you to. I want to show you a little about this activity first.

  3. Teacher Input

    • Lead a discussion on correlation of variables and the purpose of the line of best fit.
    • Lead a discussion on the correlation coefficient, r, and how it varies depending on the relationship of the data on the scatter plot.

  4. Guided Practice

    As a class, complete the Scatter Plot Exploration Questions. Have the students draw a scatter plot of the class data on a sheet of graph paper. Ask the class where they predict the line of best fit will lie and what they think the correlation coefficient is. Together, graph these data using the Regression activity, look at the actual results, and compare them with the class's predictions.

  5. Independent Practice

    Have the students use the Regression activity to estimate the line of best fit for their own data sets and then see where the line of best fit actually lies. Encourage them to experiment with data sets that include outliers. Also, have the students experiment with creating scatter plots that will have a specific correlation coefficient.

  6. Closure

    You may wish to bring the class back together for a discussion on the findings. Once the students have been allowed to share what they have found, summarize the results of the lesson.
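
The outlier experiment suggested in Independent Practice can also be previewed with a short Python sketch. The points below are invented for illustration, and pearson_r is a hypothetical helper name rather than part of the Regression activity.

```python
from math import sqrt

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sqrt(sxx * syy)

# Five made-up points that lie exactly on the line y = x
base = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
# The same points plus one made-up outlier far below the line
with_outlier = base + [(6, 0)]

r_base = pearson_r(base)
r_out = pearson_r(with_outlier)
print(f"r without outlier: {r_base:.3f}")
print(f"r with outlier:    {r_out:.3f}")
```

In this example a single outlier drops r from 1 to about 0.14, which can motivate the closing discussion of how sensitive the correlation coefficient is to unusual points.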

Alternate Outline

This lesson can be rearranged in several ways.

  • Omit the discussion of the correlation coefficient.
  • Omit the scatter plot worksheet.
  • Before splitting the class into groups, have the students plot specific points in the Regression activity and have each student draw the line of best fit that he or she imagines. Then display the true line of best fit and see whose estimate was closest.
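
One possible way to decide whose estimated line is closest is to compare the sum of squared residuals for each guess; a smaller total means a better fit. The sketch below assumes that criterion, and the student names, guesses, and data points are all hypothetical.

```python
def sum_squared_residuals(points, slope, intercept):
    """Total of (observed y - predicted y)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Made-up points plotted by the class
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

# Hypothetical student guesses as (slope, intercept) pairs
guesses = {
    "Student A": (1.0, 1.0),
    "Student B": (0.5, 2.5),
    "Student C": (0.0, 4.0),
}

scores = {
    name: sum_squared_residuals(points, slope, intercept)
    for name, (slope, intercept) in guesses.items()
}
closest = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: sum of squared residuals = {score:.2f}")
print("closest estimate:", closest)
```

No guess can score lower than the least-squares line itself, so the true line of best fit sets the target that students are trying to approach.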

A resource from CSERD, a pathway portal of NSDL.