# Linear Regression and Correlation

### Abstract

This lesson is designed to introduce students to correlation between two variables and the line of best fit.

These activities can be done individually or in groups of as many as four students. Allow 1.5-2 hours of class time for the entire lesson if all portions are done in class.

### Objectives

Upon completion of this lesson, students will:

• have plotted bivariate data onto a scatter plot
• have seen the line of best fit for several different scatter plots
• be able to estimate the lines of best fit for data sets
• be able to estimate the correlation coefficient for data sets

### Student Prerequisites

• Arithmetic: Students must be able to:
  • plot points on the Cartesian coordinate system
• Statistics: Students must:
  • have a very basic understanding of correlation
• Technological: Students must be able to:
  • perform basic mouse manipulations such as point, click, and drag
  • use a browser for experimenting with the activities

### Teacher Preparation

Students will need:

• Scatter Plot Exploration Questions
• Graph paper and pencil

### Key Terms

• correlation: A statistical measure of the relationship between two random variables. The correlation is positive when each variable tends to increase or decrease as the other does, and negative (or inverse) when one tends to increase as the other decreases.
• correlation coefficient: A numerical value (between +1 and -1) that identifies the strength of the linear relationship between variables. A value of +1 indicates an exact positive linear relationship, -1 indicates an exact inverse linear relationship, and 0 indicates no linear relationship between the variables.
• line of best fit: A straight line used as the best approximation of all the points in a scatter plot. The position and slope of the line are determined by the paired values of the two variables that generate the scatter plot. This line can be used to predict the value of one of the paired variables when only the other value in the pair is known.
• linear regression: An attempt to model the relationship between two variables by fitting a linear equation to observed data. One variable is treated as the independent variable, and the other as the dependent variable.
• residual: The observed value minus the predicted value, that is, the difference between the result obtained by observation and the result computed from a formula.
• scatter plot: A graphical representation of the distribution of two random variables as a set of points whose coordinates represent their observed paired values.
• slope of a linear function: The slope of the line y = mx + b is the rate at which y changes per unit of change in x. The units of measurement of the slope are units of y per unit of x (cf. Linear Functions Discussion).
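
To make these terms concrete, the following is a minimal Python sketch that computes a line of best fit, a correlation coefficient, and the residuals for a small data set. It assumes the line of best fit is the ordinary least-squares line, which this lesson does not state explicitly. The function name `fit_line` and the data values are invented for illustration only.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for paired data.

    Assumption: "line of best fit" means the ordinary least-squares line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r


# Hypothetical paired observations (not taken from the lesson).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = fit_line(xs, ys)

# Residual = observed value minus predicted value, one per data point.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

For this data the fitted line is y = 0.60x + 2.20 with r of about 0.775, and the residuals sum to zero, which is a property of the least-squares line.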

### Lesson Outline

1. Focus and Review

Review with the class the concept of correlation. Have the students begin to think about the words and ideas of this lesson:

• What are two variables that have no correlation with one another? Can anyone give me an example of two variables that have some sort of correlation with one another? Is this a positive or a negative correlation?

2. Objectives

Let the students know what it is that they will be doing and learning today. Say something like this:

• Today, class, we are going to learn more about correlation between two variables and be introduced to the line of best fit.

3. Teacher Input

• Lead a discussion on correlation of variables and the purpose of the line of best fit.
• Lead a discussion on the correlation coefficient, r, and how it varies depending on the relationship of the data on the scatter plot.
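
For the discussion of r, it may help to have the usual defining formula at hand. The following assumes r means the Pearson product-moment correlation coefficient, which is the standard meaning but is not spelled out in this lesson; the symbols $x_i$, $y_i$, $\bar{x}$, and $\bar{y}$ (the paired observations and their means) are introduced here for illustration.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when points above the mean in x also tend to be above the mean in y, and negative when they tend to be below it. The denominator scales the result so that it always lies between -1 and +1.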

4. Guided Practice

As a class, complete the Scatter Plot Exploration Questions. Have the students draw a scatter plot of the class data on a sheet of graph paper. Ask the students where they predict the line of best fit will lie and what they think the correlation coefficient is. Together, graph the data using the Regression activity, look at the actual results, and compare them with the class predictions.

5. Independent Practice

Have the students use the Regression activity to estimate the line of best fit for their own data sets and then see where the line of best fit actually lies. Encourage them to experiment with data sets that include outliers. Also, have the students experiment with creating scatter plots that will have a specific correlation coefficient.
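
The effect of an outlier can be previewed numerically before students try it in the activity. The sketch below is a minimal illustration, assuming r is the Pearson correlation coefficient; the helper name `pearson_r` and the data points are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (assumed meaning of r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Add a single outlier at (6, 0), far below the trend.
r_outlier = pearson_r(xs + [6], ys + [0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

With these made-up points, one outlier drops r from 1.000 to about 0.143, which gives students a sense of how sensitive the correlation coefficient can be to a single point.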

6. Closure

You may wish to bring the class back together for a discussion on the findings. Once the students have been allowed to share what they have found, summarize the results of the lesson.

### Alternate Outline

This lesson can be rearranged in several ways:

• Omit the discussion of the correlation coefficient.
• Omit the scatter plot worksheet.
• As a class, before splitting into groups, have the students plot specific points in the Regression activity and have each student draw the line of best fit that they imagine. Then have them display the true line of best fit and see whose estimate was closest.
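
One possible way to decide whose line is closest is to compare the sum of squared residuals for each guessed line, since a smaller sum means the line sits nearer the data. This is only a sketch under that assumption; the lesson does not prescribe a scoring rule, and the student names, guessed lines, and data points below are invented.

```python
def sum_squared_residuals(points, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Hypothetical points plotted by the class.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

# Hypothetical student guesses as (slope, intercept) pairs.
guesses = {
    "Student A": (1.0, 1.0),
    "Student B": (0.5, 2.5),
    "Student C": (0.0, 4.0),
}

scores = {
    name: sum_squared_residuals(points, m, b)
    for name, (m, b) in guesses.items()
}

# The guess with the smallest sum of squared residuals fits the data best.
closest = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
print("closest guess:", closest)
```

Other scoring rules, such as comparing each guessed slope and intercept directly with the true values, would also be reasonable choices.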