Building on the previous post, which covered a very simple example using a for loop, this post will take things up a notch. As discussed before, there are many resources available on the internet for learning Python. This blog doesn’t attempt to teach you Python. Instead it provides exposure on using the Python programming language in a scientific context. To this end, I am going to put together a concise Pythonic example which would help you calculate the value of Pi () ( yes the number approximated by the fraction or rounded off to 3.14 ). But as always before that some background (or ranting).
Did you know that there is something called the Pi day? I didn’t. I found out about it from my significant other while writing about this post.
The quantity referred to as can be calculated as the ratio of the circumference of a circle to its diameter. is an irrational number, which means that it cannot be written as , as taught in schools. The fraction is known to exceed i.e. however the rational number works well in many situations because mostly we round of as being 3.14.
Now that we know a little bit about Pi let us perform a computational experiment to determine its value.
Computational Experiment: Ingredients and method
For this experiment you will need the following ingredients.
- A computer.
- A Python environment where you can run some code, see previous post to set this up or read till end.
- and most importantly you will need patience.
The first two ingredients are understandable, but you may ask why did I specifically mention patience.
To answer that question we will have to look at statistics.
Because the experiment that we are about to conduct is based in statistics (well sort of), I will introduce the concept now with an analogy (or rant).
Let us assume that in a room in your university or office, there are 50 people and you are best friends with five of them. You know everything about your best friends. I am just going to use their heights for reference here so no need to get creative. So you know their heights and let us suppose that the heights are 5 feet (ft) 4 inches (in), 5 ft 3 in, 5 ft 9 in, 5 ft 6 in, 5 ft 5 in. What you have here is a sample from a larger population of 50 people in the room. If you were to tell me the average height of people in the room, you can do the following two things.
1) You can be lazy and just use the five heights you already know. Sum them up and divide the number by 5 to get an average (which would be 5 ft 5 something inches) and say that it represents the average of the total population in the room or
2) you can be proactive. Talk to everyone in the room and get their heights. Then sum up the 50 heights and divide them by 50 to get an average. This average will be a true representation of the population of your interest.
Keeping the above example in mind, in almost every case the population is so big that the true height cannot be correctly averaged. Instead, what is usually practised is that a very large number of samples are drawn from the total population. This large number is where the patience comes in. Think about it. The more samples you take, the closer your average will be to the true average. In the example of the best friend case, the more people you know the heights of, the better your average will be, e.g. knowing height of six people is better than five and similarly knowing height of 20 people is better than six. So in short, larger is better and therefore requires patience.
In the context of the experiment we are about to perform we will be doing random sampling of something. The more random samples you will use, the better your results would be. That is the use of patience in this experiments and we will discuss this further in a moment.
So now that you know the ingredients, it’s time to learn the method we will be using. This experiment is based on a Monte Carlo method. I will not go to much into the mathematics of it, but just add here that this is a method which has stochasticity (randomness) at its heart.
Computational Experiment: Theoretical framework
Having covered the basics let’s have a look the theory of the experiment itself and figure out what we can do to get to the value of .
We all know about squares and circles. In case some revision is needed, squares are four sided geometric figures whereas circles have no sides or edges, but all points that lie on the line are equidistant from the centre of the circle. I think I am making a mess of these definitions have a look at Wiki pages for better definitions.
Now if we were to take a circle and a square such that the diameter of the circle is the same as the length of the side of a square we can have a situation like the one shown in Figure 1.
Caution: Some math ahead
The area of the circle is given by . If we set the experiment such that the side of the square is equal to 1 unit, than the circle that fits it will have a diameter of 1 unit and hence a radius of 0.5 unit. In this case the area of the circle would be given by or simplified to:
and the area of the same square which the circle is fitted to is which in this case would be 1 unit (squared).
OK great, so we know the areas of the two figures. If we were to divide the area of the circle with that of the square we will end up with
Now comes a logical jump. If we distribute points uniformly within the square (Figure 1), some of them would be inside the circle and some of them would be outside. Right? If we count the number of points inside the circle and divide them by the number of points in total that should be (approximately) equal to the ratio of the areas. Here is the same thing I just said in maths.
So after that logical jump let us do this experiment and find these points.
Python code for the Monte Carlo experiment to calculate the value of Pi:
Before we write any type of code for any cause it is always good practice to try and write an algorithm for it.
Interesting fact: The word algorithm is based on the name of a Al-Khwarizmi, a notable Persian scientist from the House of wisdom (stopping here. Perhaps Muslim science or Biology decoded can write or have written about this. Do look them up here and here).
Writing an algorithm for any problem helps break down a large problem to smaller ones which we can handle easily. So let’s get to it.
- Generate a point at random in 2D space (i.e. with an abscissa and an ordinate or more simply an x and y coordinate). [ Note: For computational simplicity we will place the circle at origin, such that only one quarter of Figure 1 is the first quadrant. The length of the side of the square will be 2 units such that 1 unit is on the right of the origin. See Figure 2 for an illustration]
- We will measure the distance of the point generated from the origin to see if it’s lesser than or larger than 1. [Note that the arc marks the boundary and all points on it or inside it would have a distance equal to or lesser than 1].
- We will note all distances smaller than 1.
- We will repeat steps 1 till 3, N number of times, where N depends on how patient we are.
- We will print out the answer.
While the above breakdown was intended to be an algorithm, it is not how you will write it in a proper way. See the algorithm page for more on semantics. Ok enough said now. Let me bring out the code.
Unlike the previous post this time we will make use of libraries. See code below. Underneath the code is a line by line (almost) explanation of it.
from __future__ import division import numpy as np import matplotlib.pyplot as pl trials = 100 counts = 0 for i in range(trials): x = np.random.random() y = np.random.random() x2 = x**2 y2 = y**2 x_y = x2 + y2 dxy = np.sqrt(x_y) if dxy <= 1: counts = counts + 1 print "pi was approximated at ::",4*counts/trials
- from __future__ import division
You can ignore this line in Python 3.x. In Python 2.7 it’s helpful for division. Sorry will cover this some other time (or you can just ask me)
- import numpy as np
import matplotlib.pyplot as pl
These two line are telling python to import 2 libraries. Numpy and Matplotlib and after importing them alias them to keywords like np and pl for shorthand access. The first library we will need to generate random points. The second library is for plotting figures which we will as well (or we might not)
- trials = 100
counts = 0
We create two variables. Trials is the N in the pseudo-algorithm above. Counts is a variable we will set to 0 and increment it by 1 every time we find a value inside the circle.
- for i in range(trials): # Run a for loop “trials” time.
x = np.random.random() # Generate a random x-coordinate.
y = np.random.random() # Generate a random x-coordinate.
dxy = np.sqrt(x**2 + y**2) # Distance formulae
if dxy <= 1: # If point is on or inside the circle distance
counts = counts + 1 # will be lesser than 1 so record it.
print “pi was approximated at ::”,4*counts/trials #This will print the value of .
Note that we used
Which is why we cross multiply by 4 to get the value of . [Note that even though we are using information from the first quadrant only, if we had used the whole system it would have solved to the same number. This way just allows us to be smart about the program and write it in lesser number of lines].
So in order to run this code, if you don’t have Python, I will recommend going through the previous post. If you are too lazy to do that click here to use an online resource for Python programming. For instructions on using this online resource refer to the previous post.
If you use the Jupyter notebook that you will be using Python 3 so the code will have to change a bit (only the last line i.e. the print statement). See below for the code if you use Python 3.
from __future__ import division import numpy as np import matplotlib.pyplot as pl trials = 100 counts = 0 for i in range(trials): x = np.random.random() y = np.random.random() x2 = x**2 y2 = y**2 x_y = x2 + y2 dxy = np.sqrt(x_y) if dxy <= 1: counts = counts + 1 print ("pi was approximated at ::",4*counts/trials)
Play around with the value of “trials“. Make them large (don’t go above 1000000, no one has that much patience :D). Notice that as the number of trials increases the value of gets closer to 3.14 .. . If you want to calculate the true value of you will need to run infinite trials (Don’t ask me how, cause that can’t be done).
So in this exercise – we used a Monte Carlo method to approximate the value of . We also did some hacky math to figure out how to do it, and designed a framework to carry out that hacky math. In all of this we also learned about statistical sampling. Remember that when you set the value of trials to a smaller number you are effectively looking at a small number of samples and hence the value is inaccurate. Increasing the number increases the samples and hence approximates a better value. Although we imported a plotting library, we didn’t do any plotting in this exercise. I was too lazy to put that in this post. I will do that next time using this example. Congratulations on running your first simulation (if you haven’t ever simulated before). More will come.
Please like share and subscribe IDRACK’s Facebook page and group. Talk to your friends about us and as said before if you have any questions and problems following all this, please feel free to get in touch, either in comments below or Facebook or through details on our Contact Us page.