Probability vs. Statistics Discussion

Student: I notice that people sometimes use the words statistics and probability when talking about the same things. Are these two words just different names for the same concept?

Mentor: What do you think?

Student: I want to check a dictionary first and see what it says.

Mentor: Check several dictionaries and, based on what you find, write a definition for each word. A scientific or mathematical dictionary will give you more detailed information.

Probability: 1: being probable 2: something that is probable 3: a ratio expressing the chances that a certain event will occur 4: a branch of mathematics studying chances of random events.

Statistics: 1: facts or data assembled and classified so as to present significant information 2: collection, calculation, description, manipulation, and interpretation of the mathematical attributes of large sets or populations 3: a branch of mathematics dealing with collection, analysis and interpretation of data.

Student: So statistics is all about data, and probability is all about chance.

Mentor: Exactly. Let me talk about probability as the measure of chance. Specialists look at this meaning of probability in two different ways that are called Frequency View and Personal View (or Subjective View, as philosophers call it).

| | Frequency View | Personal View |
| --- | --- | --- |
| Example | To find the chances (probability) of getting 3 on a six-sided die, you roll the die 1,000,000 times. For 166,549 times, the roll is a 3. You find the proportion of 3's by dividing: 166,549 / 1,000,000 = 0.166549. It is approximately 1/6, so you conclude that the probability of getting 3 on this particular die is 1/6. | To find the chances (probability) of getting 3 on a six-sided die, you sit down and think. You reason that all the sides of the die are the same, and that you can believe that the die does not have holes or heavy objects inserted into it. You conclude that each side of the die should have the same chance of landing face up, and therefore, that when you roll the die, you have one chance in six to get a 3. Your answer is that the probability of getting 3 is 1/6. |
| Definition | Probability of an event in an experiment is the proportion (or frequency) of that event when the same exact experiment is repeated many times. | Probability of an event is what a person who studies it believes about the chances of the event. People who define probabilities use their knowledge about the world to make "the best possible guess." |
| Who likes it | Scientists, mathematicians. | Philosophers, economists, mathematicians. |
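
The two examples above can be tried out in a few lines of Python. This is only a minimal sketch: the function names `frequency_estimate` and `symmetry_estimate` are made up for illustration, and the simulated die is fair by construction, so the code shows the Frequency View procedure (count and divide) rather than a measurement of a real die.

```python
import random
from fractions import Fraction


def frequency_estimate(rolls, target=3, sides=6, seed=0):
    """Frequency View: roll a simulated die `rolls` times and return the
    proportion of rolls that show `target`."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    hits = sum(1 for _ in range(rolls) if rng.randint(1, sides) == target)
    return hits / rolls


def symmetry_estimate(sides=6):
    """Personal View: reason that every side is equally likely."""
    return Fraction(1, sides)


if __name__ == "__main__":
    # More rolls should bring the proportion closer to 1/6.
    for rolls in (100, 10_000, 1_000_000):
        print(rolls, frequency_estimate(rolls))
    print("by symmetry:", float(symmetry_estimate()))
```

With only 100 rolls the proportion can be noticeably off from 1/6, which is why the Frequency View definition asks for the experiment to be repeated many times.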

Mentor: Which of these two ways of looking at probability is closer to statistics?

Student: The Frequency View, because it talks about collecting data.

Mentor: A very important part of the Frequency View definition is that you need to repeat the same exact experiment to find the probability. Such exact repetition is almost never possible where humans are concerned, for example, in sports or medicine. I would like to offer you several quotes, and you can find and correct the errors in them.

Student: Sounds like fun. When I learn to do it, I can find quotes in journals or on TV and correct them, too!

| Quote | What is wrong with it | Conclusion that may be true |
| --- | --- | --- |
| "Our team won about 3/4 of the games in every season so far. I tell you, the probability of us winning the next game is 3 out of 4!" | Each game is different from other games. Maybe the opposing team will be much stronger than usual next time. Maybe the weather will be different. Maybe a key player will be sick. And so on. Also, the team may always win against a particular team (the one that is going to play tomorrow), which will affect the chances. | "Our team won about 3/4 of the games in every season so far. If nothing major changes, I believe we are going to win about 3/4 of the games in this season, too." |
| "One out of eight women in the USA develops breast cancer during her lifetime. Therefore, if you are female, the probability of you having this form of cancer is 1/8." | You are unique (just like everybody else). There is no way for a person to know her exact chances in anything that is connected with health. Studies show that body proportions, diet, weight, clothes preferences, number of pregnancies and breastfeeding all affect breast cancer rates in women. Even though "one out of eight" is the average for the USA, it does not tell much about each particular person. | "One out of eight women in the USA develops breast cancer during her lifetime. If we randomly select 1,000,000 women and look at their medical histories, we can expect about 125,000 (not exactly!) of them to develop breast cancer." |
| "On the average, drivers have accidents once every two years. Your last accident was 3 years ago, so you can expect an accident any time now." | Rates of accidents vary greatly with experience, car type, age and health of the driver, driving habits, and so on. The national average says close to nothing about your chances of having an accident. | "On the average, drivers have accidents once every two years. If you randomly choose 1000 drivers, you can expect them all together to have had about 5000 accidents over the previous 10 years." |

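The group-level numbers in the corrected quotes come from simple multiplication, and the "not exactly!" remark can be illustrated with a random sample. The sketch below is an assumption-laden illustration: `expected_count` and `sampled_count` are hypothetical helper names, and the sampling treats each randomly chosen person as a yes/no draw at the population rate.

```python
import random


def expected_count(group_size, rate):
    """Expected number of events in a group, given an average rate per member."""
    return group_size * rate


def sampled_count(group_size, rate, seed=0):
    """Count events in one simulated random sample of `group_size` members."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    return sum(1 for _ in range(group_size) if rng.random() < rate)


# One in eight, out of 1,000,000 randomly selected women.
print(expected_count(1_000_000, 1 / 8))   # 125000.0

# 1000 drivers over 10 years, one accident every two years on average.
print(expected_count(1000 * 10, 1 / 2))   # 5000.0

# A single random sample lands near 125,000 but rarely on it exactly.
print(sampled_count(1_000_000, 1 / 8))
```
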
Student: All these errors are of the same type. They take data about large numbers of people, and try to use it in personal cases.

Mentor: Collecting data about large numbers of people (or other objects), and using this data for studying other large groups of people as you did in the "Conclusion that may be true" column, belongs to statistics. The only time it can be used for probability, that is, for studying chances in individual cases, is when all the experiments are the same (or almost the same). You can use data (statistics) from rolling a six-sided die one million times (in exactly the same manner!) to find the chances (probability) of rolling 5 on your next try. You cannot use data (statistics) from studying driving records of a million people to find the chances (probability) of yourself having an accident today.
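
One way to see why a group average says little about an individual is to imagine a population in which nobody is average. The rates below are invented purely for illustration and do not come from any real driving data; the variable names are hypothetical as well.

```python
# Hypothetical yearly accident rates for 1000 drivers. These numbers are
# invented for illustration only and are not taken from any real data set.
careful_drivers = [0.1] * 500   # assumed: 0.1 accidents per year each
risky_drivers = [0.9] * 500     # assumed: 0.9 accidents per year each
population = careful_drivers + risky_drivers

# Statistics: the group average matches "once every two years".
average_rate = sum(population) / len(population)
print(round(average_rate, 3))  # 0.5

# Probability for an individual: count the drivers whose own rate is
# within 0.05 of the group average.
close_to_average = sum(
    1 for rate in population if abs(rate - average_rate) < 0.05
)
print(close_to_average)  # 0
```

In this made-up population the average correctly predicts the total for the whole group, yet it describes the personal chances of none of the 1000 drivers.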

Student: So statistics deals with data that may or may not be useful for finding probability.

Mentor: Yes. Data can also be useful by itself, without any connection to probability. For example, you need to know, at least approximately, how many voters live in a particular city in order to prepare for elections. You may want to know the average amount of hazardous chemicals each factory discharges into a particular water basin per month in order to find out if there is a serious environmental problem. You might want to know the proportion of people who get the flu during each year in order to compare several years and to try to find out what may cause increases in flu rates.

Student: I am just glad there are computers to help us to deal with all that data!