NCSI Talks

   

Monte Carlo Integrals Learning Scenario (Web)



Learning Scenario - Monte Carlo Integrals (Excel)

Basic Model:

Description

This systems model estimates the value of an integral by finding the average value of a function over a specified domain. At the user's direction, the Excel sheet generates random numbers within the domain of the first quadrant of a unit circle. These numbers are then used to find the average value of the function (the y-value), which, multiplied by four, provides an estimate of pi. From this model, students should learn how to calculate the value of an integral using experimental and computational methods rather than analytic ones.

Background Information

The area of a unit circle is known to be pi. Therefore, four times the area of the first quadrant of the unit circle should be pi as well. This model works by simulating Monte Carlo integration. Random numbers are chosen in the domain of the function and the average value of the function is calculated. The first sheet of this model finds the average value of the function describing one quadrant of the unit circle and multiplies it by four. Because the domain has length 1, this average equals the area of the quadrant, so the value obtained should be an approximation of pi. The second sheet uses a different form of Monte Carlo integration. A series of random points is generated in a square that encloses a unit circle. The ratio of the number of points within the unit circle to the total number of points estimates the fraction of the square covered by the circle, which equals the area of one quadrant of the unit circle. By the Law of Large Numbers, including more random numbers in the calculation tends to give a more accurate estimate of pi. To demonstrate this, the model displays the values of pi calculated from increasing amounts of random numbers.
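As a compact summary of the two approaches described above, the estimates can be written as follows. The symbols N (number of random samples) and x_i (the i-th random x-value) are notation introduced here for illustration and do not appear in the Excel sheet.

```latex
% Sheet 1: average value of the quarter-circle function on [0, 1)
\pi \;=\; 4\int_{0}^{1}\sqrt{1-x^{2}}\,dx
\;\approx\; \frac{4}{N}\sum_{i=1}^{N}\sqrt{1-x_{i}^{2}},
\qquad x_{i}\ \text{uniform on}\ [0,1)

% Sheet 2: fraction of random points in the square that land inside the circle
\frac{\text{area of unit circle}}{\text{area of enclosing square}}
\;=\; \frac{\pi}{4}
\;\approx\; \frac{\text{hits}}{\text{throws}}
\quad\Longrightarrow\quad
\pi \;\approx\; 4\cdot\frac{\text{hits}}{\text{throws}}
```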

Science/Math

The fundamental principle behind this model is HAVE = HAD + CHANGE. For each run of the simulation on the first sheet, the following things are calculated and displayed:

  1. Random numbers on the interval [0,1) are calculated and recorded in the table under the X column
  2. The function value corresponding to each x-value on the unit circle is calculated and recorded in the second column
  3. The function values are averaged to find the average value of the function and multiplied by four to obtain an estimate of pi.
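The three steps above can be sketched outside of Excel as well. The following is a minimal Python illustration, not the spreadsheet's actual formulas; the function name `estimate_pi_average_value`, the seed, and the sample size of 5000 are choices made here for the example.

```python
import math
import random

def estimate_pi_average_value(n, rng):
    """Estimate pi as 4 times the mean of sqrt(1 - x^2) for x uniform on [0, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.random()                  # step 1: random x on [0, 1)
        total += math.sqrt(1.0 - x * x)   # step 2: height of the unit circle at x
    return 4.0 * total / n                # step 3: average height, times four

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(2024)
estimate = estimate_pi_average_value(5000, rng)
print(estimate)
```

Changing the seed plays the same role as pressing the recalculate key in the spreadsheet: a new set of random x-values produces a slightly different estimate.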

The second sheet similarly demonstrates the equation HAVE = HAD + CHANGE, but it does so through the second Monte Carlo integration method with an analogy of darts hitting a dartboard. The following things happen for every run of the second simulation.

  1. Random numbers on the interval (-1,1] are calculated and recorded on the table under the x coord column for each x coordinate of the dart thrown
  2. Random numbers on the interval (-1,1] are calculated and recorded on the table under the y coord column for each y coordinate of the dart thrown
  3. Each dart that has a point (x, y) falling within the unit circle is assigned a number of 1, and every dart that has a point (x, y) falling outside of the unit circle is assigned a number of 0
  4. The total number of darts that fall inside of the unit circle is calculated and recorded under the "hits" column
  5. The ratio of the number of hits to the number of throws is calculated and multiplied by four in order to obtain an estimate of pi
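The dart-throwing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the sheet's own formulas: the function name `estimate_pi_darts` and the seed are hypothetical, and `random.uniform(-1, 1)` is used for convenience even though its treatment of the interval endpoints may differ from the spreadsheet's.

```python
import random

def estimate_pi_darts(throws, rng):
    """Estimate pi from the fraction of random points in a 2x2 square that land in the unit circle."""
    hits = 0
    for _ in range(throws):
        x = rng.uniform(-1.0, 1.0)    # steps 1 and 2: random dart coordinates
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:      # step 3: score 1 if the dart is inside the circle
            hits += 1                 # step 4: running total of hits
    return 4.0 * hits / throws        # step 5: ratio of hits to throws, times four

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(2024)
dart_estimate = estimate_pi_darts(10000, rng)
print(dart_estimate)
```

The factor of four appears because the square has area 4 while the circle has area pi, so the fraction of hits approximates pi/4.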

Teaching Strategies

The best way to introduce this model is by first reviewing integration. After review, the following topics and ideas should be discussed:

  1. What is the equation of a unit circle? How would you find the area of the unit circle by integration?
  2. Are you able to find the area of a unit circle by taking one, simple integral? Why or why not?

The following activity might help introduce the idea of Monte Carlo integration:

  1. Set up a dartboard along a wall. Measure out a square box with sides the same length as the diameter of the dartboard so that it fully encloses the board
  2. Have 30 darts for the students to throw at the dartboard
  3. Have a student throw all thirty darts at the dartboard
  4. Record the number of darts that land on the dartboard (C) and the number that land inside the square but off the board (S)
  5. Attempt to calculate the area of the dartboard. Find the ratio of the number of darts that landed on the dartboard to the number that landed anywhere within the square, board included (C/(C+S)). Multiply this ratio by four to estimate the area of the dartboard in units of its radius squared.

This activity is essentially the method that the second sheet of the simulation uses to estimate the area of a unit circle. While the area of the dartboard is not easily found by integrating the circle's equation directly, this method allows for a rough approximation. Having done the activity, students will be more familiar with the idea of Monte Carlo integration and how it works when they begin working with the model.
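A worked example of the arithmetic may help when running the activity. The counts below are hypothetical, chosen only for illustration: suppose that of 30 darts, C = 22 land on the board, S = 6 land in the square but off the board, and 2 miss the square entirely and are not counted. The calculation assumes the square's side equals the board's diameter 2r, where r is the board's radius.

```latex
\frac{C}{C+S} \;=\; \frac{22}{28} \;\approx\; 0.786

\text{area of dartboard} \;\approx\; \frac{C}{C+S}\cdot(2r)^{2}
\;=\; 4\cdot\frac{22}{28}\,r^{2} \;\approx\; 3.14\,r^{2}
```

With only 30 throws, and with throws that are aimed rather than random, a real classroom result will usually be much rougher than this.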

Implementation:

How to use the Model

While there are no parameters that can be changed, this model allows the user to refresh the random variables and run the calculations again. In order to do both of these, simply press [F9] (on PC) or [Command][=] (on Mac). Immediately, new random numbers will be calculated, run through the equations, and recorded on the table.
A simulation is also available on Sheet 2 with a different form of Monte Carlo integration and new random numbers. To get to the second sheet, click the "Integral as area under curve" tab in the bottom left-hand corner of Excel. The calculations are initiated in the same manner as on the first sheet.
Note: Under Excel -> Preferences -> Calculation, set sheets to calculate "Manually", check the box marked "Limit Iteration", and set "Maximum Iterations" = 1. For more information on Excel, reference the Excel tutorial at:

http://shodor.org/tutorials/excel/IntroToExcel

Learning Objectives:

  1. Learn how to use random numbers and the average value of a function to approximate an integral that is difficult to evaluate analytically
  2. Learn how to use random numbers and a unit square to approximate an integral that is difficult to evaluate analytically
  3. Understand the Law of Large Numbers and how an increased number of trials yields improved accuracy

Objective 1

To accomplish this objective, students should run the model and study the process through which it calculates each data point. After they understand the process, the model should be run again with different random values so that students can judge the accuracy of the model. Ask the following questions to guide the students:

  1. What is the domain of the random x values? How is the restricted domain related to the unit circle?
  2. What does the restricted domain do to solve the problem of integrating a circle?
  3. How are the y values calculated in the second column? Can you work backwards to solve for the x value?
  4. The formula for calculating pi is the average value of the function multiplied by four. Why is the average value multiplied by four? What does the number represent before it is quadrupled?
  5. Write the integral that is effectively calculated by this method. Hint: Think of the average value equation.

Objective 2

This objective uses the second sheet in the Excel file, "Integral as area under curve." Students should run the model and again attempt to understand the process through which it calculates each variable. After running the simulation once, a second run will help to understand the accuracy of the model. Ask the following questions to guide the students:

  1. What is the domain of the random x values? Double click on the "x coord" cell to see the equation used.
  2. What is the range of the y values? Double click on the "y coord" cell to see the equation used. How are the restricted domain and range related to the unit circle and unit square?
  3. How are the points in the circle decided? How are "hits" decided? It may be helpful to plot the x and y values on a unit circle and square.
  4. The equation for calculating the estimate of pi is four times the ratio of "hits" to "throws". What does the ratio of hits to throws represent?
  5. Write the integral that is effectively calculated by this method. Hint: Think of the function found in Objective 1.

Objective 3

Each of the two models uses differing amounts of random numbers in order to calculate pi. The first sheet uses samples of 100, 200, 500, 1000, 2000, and 5000 to calculate pi and the second uses samples of 100, 200, 500, 1000, 2000, 5000, and 10000. The Law of Large Numbers states that with an increased number of trials comes increased accuracy in predictions. Therefore, the predictions with an increased number of samples should tend to lie closer to the actual value of pi (~3.14159). Students should study the patterns in the outputs with differing numbers of samples. Ask the following questions to guide their discovery:

  1. Which estimate of pi is the most accurate (pi~3.14159)? How many samples is the model calculating in this estimate?
  2. Run the simulation several times, noting which estimate is the closest to the actual number of pi. Does the most accurate estimate change? Why?
  3. Is there one number that is more accurate most of the time? If so, is the sample greater than or less than the other estimates?
  4. Switch to the second worksheet, "Integral as area under curve", and answer questions 1-3 in response to the second simulation. Are there any changes? Explain.
  5. The Law of Large Numbers states that with an increased number of trials comes increased accuracy in predictions. Does this make sense? Do you see the Law of Large Numbers at work in this model?
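To explore these questions outside of Excel, the following Python sketch repeats the average-value estimate at the sample sizes the first sheet uses and prints the error of each. It is an illustration only: the function name, the seed, and the use of independent samples for each size are assumptions made here, not a description of how the spreadsheet is built.

```python
import math
import random

def estimate_pi_average_value(n, rng):
    """Estimate pi as 4 times the mean of sqrt(1 - x^2) for x uniform on [0, 1)."""
    total = sum(math.sqrt(1.0 - rng.random() ** 2) for _ in range(n))
    return 4.0 * total / n

# Sample sizes listed for the first sheet in this scenario.
sample_sizes = [100, 200, 500, 1000, 2000, 5000]

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(7)
estimates = {n: estimate_pi_average_value(n, rng) for n in sample_sizes}

for n, value in estimates.items():
    error = abs(value - math.pi)
    print(f"{n:>5} samples: estimate = {value:.5f}, error = {error:.5f}")
```

On any single run a smaller sample can happen to land closer to pi than a larger one. The typical error shrinks roughly in proportion to one over the square root of the number of samples, so the trend only becomes clear over many runs.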

Extensions:

  1. Understand the application of Monte Carlo integration in relation to cystic fibrosis studies
  2. Understand the idea of random numbers in relation to the real world

Extension 1:

Have students research the topic of Monte Carlo integration and its application to cystic fibrosis studies. Cystic fibrosis is a genetic disorder caused by a rare, recessive allele inherited from both parents. One method estimates the age of an allele from data on its frequency and on the extent of variation it displays. Monte Carlo integration is used to find the maximum estimate of the age of the allele. Ask the following questions to guide the students:

  1. If our model were to be used by the cystic fibrosis researchers, what would the x and y coordinates represent?
  2. What is the main goal of using Monte Carlo integration in cystic fibrosis research?

Extension 2

This extension deals with the random number aspect of the Monte Carlo method. Since the numbers produced by a computer's random number generator are typically pseudorandom rather than truly random, the integral could be affected by bias in the program that generates them. Give students the links listed under Supplemental Materials to research this idea, and supply the HotBits Genuine Random Number Generator for comparison. Ask the following questions:

  1. What is a pseudorandom number? Would you say that the x-values calculated in the Monte Carlo Integrals model are pseudorandom numbers?
  2. Why can a computer not output truly random numbers? Explain.
  3. Compare the random number generator below to the HotBits number generator. What is the difference between the two? Are the HotBits numbers truly random?
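A short demonstration can make the idea of a pseudorandom number concrete. The sketch below uses Python's standard `random` module, which is not the generator Excel uses; the seed value 12345 is an arbitrary choice for illustration.

```python
import random

# Two generators started from the same seed (12345 is an arbitrary choice).
first = random.Random(12345)
second = random.Random(12345)

run_a = [first.random() for _ in range(5)]
run_b = [second.random() for _ in range(5)]

print(run_a)
print(run_b)
print(run_a == run_b)  # prints True: the same seed reproduces the same "random" numbers
```

Because the sequence is fully determined by the seed, the numbers only appear random. A hardware source such as HotBits has no seed that would reproduce its output.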

Supplemental Materials:

http://www.shodor.org/refdesk/Resources/Algorithms/RandomNumbers/
http://www.fourmilab.ch/hotbits/
http://www.randomnumbergenerator.com