This lesson is designed to introduce students to correlation between two variables and the line of
best fit.

These activities can be done individually or in groups of as many as four students. Allow 1.5-2
hours of class time for the entire lesson if all portions are done in class.

Objectives

Upon completion of this lesson, students will:

have plotted bivariate data onto a scatter plot

have seen the line of best fit for several different scatter plots

be able to estimate the lines of best fit for data sets

be able to estimate the correlation coefficient for data sets

Standards Addressed:

Grade 10

Statistics and Probability

The student demonstrates an ability to classify and organize data.

The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Grade 6

Statistics and Probability

The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating; drawing or justifying conclusions).

Grade 7

Statistics and Probability

The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions; drawing or justifying conclusions).

Grade 8

Statistics and Probability

The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Grade 9

Statistics and Probability

The student demonstrates an ability to classify and organize data.

The student demonstrates an ability to analyze data (comparing, explaining, interpreting, evaluating, making predictions, describing trends; drawing, formulating, or justifying conclusions).

Statistics and Probability

Interpreting Categorical and Quantitative Data

Summarize, represent, and interpret data on two categorical and quantitative variables

Interpret linear models

Grades 9-12

Algebra

Understand patterns, relations, and functions

Data Analysis and Probability

Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them

Select and use appropriate statistical methods to analyze data

Algebra I

Data Analysis and Probability

Competency Goal 3: The learner will collect, organize, and interpret data with matrices and linear models to solve problems.

Student Prerequisites

Arithmetic: Students must be able to:

plot points on the Cartesian coordinate system

Statistics: Students must:

have a very basic understanding of correlation

Technological: Students must be able to:

perform basic mouse manipulations such as point, click and drag

use a browser for experimenting with the activities

Teacher Preparation

Students will need:

Access to a browser

Scatter Plot Exploration Questions

Graph paper and pencil

Key Terms

correlation

A statistical measure referring to the relationship between two random variables. It is a positive correlation when each variable tends to increase or decrease as the other does, and a negative or inverse correlation if one tends to increase as the other decreases.

correlation coefficient

A numerical value (between -1 and +1) that measures the strength and direction of the linear relationship between two variables. A value of +1 indicates an exact positive linear relationship, -1 indicates an exact inverse linear relationship, and 0 indicates no linear relationship between the variables (a nonlinear pattern may still be present).
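
For teachers who want to check a value of r by hand or show students where the number comes from, the following is a minimal sketch of the Pearson correlation coefficient in Python. The function name pearson_r and the sample data are illustrative assumptions and are not part of the Regression activity.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data sets chosen to show the three reference values of r.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # exact positive relationship
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # exact inverse relationship
# y = x^2 on symmetric x values: a clear pattern, but not a linear one.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The last data set is a useful discussion point: the points follow a clear curve, yet r comes out as zero because r only measures how well a straight line describes the data.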

line of best fit

A straight line that best summarizes the points in a scatter plot. Its position and slope are chosen so that the line lies as close as possible to the data points overall; the usual method, least squares, minimizes the sum of the squared vertical distances from the points to the line. This line can be used to predict the value of one of the paired variables when only the other value in the pair is known.
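
The sketch below shows one common way a line of best fit is computed, the least-squares method. It is an assumption that this matches what the Regression activity does internally; the function name best_fit_line and the hours/scores data are made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the line would be vertical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # the line passes through (mean_x, mean_y)
    return m, b

# Made-up example data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
m, b = best_fit_line(hours, scores)
print(f"score = {m:.2f} * hours + {b:.2f}")
print(f"predicted score after 6 hours: {m * 6 + b:.1f}")
```

The final line illustrates the prediction use of the line: a known x value (6 hours) is substituted into the equation to estimate the unknown y value.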

linear regression

An attempt to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered as the independent variable, and the other is considered as the dependent variable.

residual

The observed value minus the predicted value; that is, the difference between the result obtained by observation and the result computed from a formula such as the equation of the line of best fit.
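
A short sketch may help make the definition concrete. The data and the candidate line y = 6x + 46 below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def residuals(xs, ys, m, b):
    """Return observed y minus predicted y for each point, using y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical candidate line y = 6x + 46.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
res = residuals(xs, ys, 6, 46)
print(res)
```

A positive residual means the observed point lies above the line, a negative residual means it lies below, and a residual of zero means the point is exactly on the line.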

scatter plot

A graphical representation of the distribution of two random variables as a set of points whose coordinates represent their observed paired values.

slope of a linear function

The slope of the line y = mx + b is the rate at which y is changing per unit of change in x. The units of measurement of the slope are units of y per unit of x (cf. Linear Functions Discussion).
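
As a small illustration of slope as a rate with units, the sketch below computes the slope through two points. The function name slope and the car-trip numbers are assumptions made up for this example.

```python
def slope(p1, p2):
    """Return the slope of the line through points p1 and p2, each an (x, y) pair."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)

# Hypothetical car trip: x is hours driven, y is miles traveled.
m = slope((1, 60), (3, 180))
print(m, "miles per hour")  # units of y per unit of x
```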

Lesson Outline

Focus and Review

Review with the class the concept of correlation. Have the students begin to think about the words
and ideas of this lesson:

What are two variables that have no correlation with one another? Can anyone give me an
example of two variables that have some sort of correlation with one another? Is this a
positive or a negative correlation?

Objectives

Let the students know what it is that they will be doing and learning today. Say something like
this:

Today, class, we are going to learn more about correlation between two variables and be
introduced to the line of best fit.

We are going to use the computers to learn more about correlation, but please do not turn your
computers on until I ask you to. I want to show you a little about this activity first.

Teacher Input

Lead a discussion on correlation of variables and the purpose of the line of best fit.

Lead a discussion on the correlation coefficient, r, and how it varies depending on the relationship of the
data on the scatter plot.

Guided Practice

As a class, complete the Scatter Plot Exploration Questions. Have the students draw a scatter plot of
the class data on a sheet of graph paper. Ask the class where they predict the line of best fit
will lie and what they think the correlation coefficient is. Together, graph this data using the
Regression activity, look at the actual results, and compare these findings with the class's
predictions.

Independent Practice

Have the students use the Regression activity to estimate the line of best fit for their own data
sets and then see where the line of best fit actually lies. Encourage them to experiment with data
sets that include outliers. Also, have the students experiment with creating scatter plots that
will have a specific correlation coefficient.
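
To preview what students are likely to find when they add outliers, the sketch below compares r for a small made-up data set before and after one outlying point is added. The helper pearson_r and all data values are illustrative assumptions, not output from the Regression activity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (no input checking)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up points that follow a fairly clear upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
r_before = pearson_r(xs, ys)

# Add a single outlier far below the trend, at (7, 0).
r_after = pearson_r(xs + [7], ys + [0])

print(f"r without outlier: {r_before:.2f}")
print(f"r with outlier:    {r_after:.2f}")
```

In this example a single point is enough to pull r from a strong positive value down to nearly zero, which can prompt a discussion of why outliers deserve a closer look before a line of best fit is trusted.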

Closure

You may wish to bring the class back together for a discussion on the findings. Once the students
have been allowed to share what they have found, summarize the results of the lesson.

Alternate Outline

This lesson can be rearranged in several ways.

Omit the discussion of the correlation coefficient.

Omit the scatter plot worksheet.

As a class, before splitting into groups, have the students plot specific points on the
Regression activity and have each of them draw the line of best fit that they imagine. Then, have
them select the true line of best fit and see who had the closest estimate.
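
If a numerical way of deciding who had the closest estimate is wanted, one option is to score each guessed line by its sum of squared residuals, where a smaller total means a closer fit. This is only one possible definition of "closest", and the student names, guessed lines, and data below are all hypothetical.

```python
def sum_squared_error(xs, ys, m, b):
    """Sum of squared residuals for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical class data and hypothetical student guesses as (slope, intercept).
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
guesses = {
    "Student A": (5, 50),
    "Student B": (6, 46),
    "Student C": (8, 40),
}

scores = {name: sum_squared_error(xs, ys, m, b) for name, (m, b) in guesses.items()}
closest = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, score)
print("Closest estimate:", closest)
```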