The previous post illustrated in a simple way the use of Python for scientific computing purposes. This article extends the previous post by introducing the concept of parallel programming. The previous example will be used, so if you haven’t run that yet, I suggest you do that now. Modifications will be made to the previous code, but before we do that, here is me ranting.

**Background:**

**What is parallel programming?**

There are a number of ways in which you can think about this. Given that this series of posts is aimed at noobs, I will present the concept with very simple examples (some of which may have been brought up before). So on with it.

In very simple terms, there are two general categories in which parallel programming (in our context) can be divided into.

- You have a big problem. You divide it into multiple smaller pieces and simultaneously work on all pieces, thus getting to the results faster.
- You have one process. You can run it a number of times. This may sound stupid but I will explain in a bit, the sort of situations where this may be essential.

Let’s have a look at each of these scenarios.

**One problem divided (Category 1):**

We will not be going into details of this yet. I think there is a lot of ground we need to cover before we can delve into working with these types of problems. Future posts will cover MPI and its usage. Additionally we will look at CPU and GPU architectures, since understanding that is necessary before you can break your problem down. At present I will leave you with one example.

Think of matrix multiplication, types of problems which have these operations at their heart can be parallelized under category 1. Since each operation in matrix multiplication is disconnected from one another, immense speed up can be achieved if problems like these are broken down. The reason why I mentioned this is that numerous libraries exist which can be used without thinking too much about the hardware stuff. Examples of that would be:

to name a few (of the many many) things.

Matrix multiplication (or like) is present in a variety of simulation algorithms that range from predicting weather to biomolecular simulations to basic local alignments in bioinformatics to computer graphics etc etc etc. I will leave this here to be continued in some future post. Moving on to the second category.

**Repeated runs (Category 2):**

So before we delve into this, lets resume the conversation where we left it at earlier. You have one problem, but you want to repeat the program multiple times. You may ask why this is necessary, its seems illogical at first to try and do the same thing over and over again. So let me explain this in more detail.

Assume you toss a coin and can have only two outcomes, Head (H) or a Tail (T). Now let’s consider that you have two bags (Red and Blue) and you have two sets of balls of the same colour. For every coin flip, if you get a head, you put a red ball in the red basket and for every tail you put a blue ball in the blue basket. With this scenario, one can ask, if you continue to do the above when there are 100 balls in the blue basket, how many should there be in the red one? Or vice versa?

The answer to the above question is that if the coin is fair, outcome of each should be 50%. Which roughly means that for every 10 coin flips there should be 5H and 5T. However this is not entirely accurate, because if you conduct a small number of trials, e.g. 10, then the outcomes don’t have to be equally distributed. It is only at very large coin flips that the numbers will start having lesser difference and only become exactly equal when you conduct an “infinite” number of trials. That is correct, an infinite number of trials (you may get away with some instances of lower number of trials but it isn’t always guaranteed), which will take an infinite time to run (or will it??). Infinity is not under discussion here, what is important, in this, is to realise that you can’t spend that much time on this. So alternatively you decide on a tractable number, e.g. 10,000 trials.

In problems like these, when you settle for a number (e.g. 10,000) smaller than the total population (infinite trials) you cannot and shouldn’t rely on one answer. You will have to do multiple attempts, each of 10,000 trials, and look across the answers of each attempt to make a conclusion. Here are some more things you can’t do:

- Predict weather for the whole year looking at the weather of only 3 continuous days.
- Predict the range of heights of all the people on Earth based on heights of your friends.
- Predict the everyday amount of sugar in chai when you get a cup from the local tea stall based on just one sample. Etc.

These are the situations where you would want to do the same thing over and over again. You would want to conduct a random experiment a few times (ideally infinite times) to ensure that the conclusion drawn is not based on a random instance, i.e. the (almost) same answer occurs many times and not just the one time.

Having said this, let’s revisit the example again. Looking at it again, think about what I just said and evaluate how you calculated the value of . We generated random numbers and counted them. While we had to keep our trials limited, we did it only once. Will it happen again i.e. will the value of the way we measure it come out as right again? In the example below we are going to try and redo the problem multiple times to ensure that if it happens again to generate confidence in this method.

The code for this is going to be the same as before, with only minor changes (shown here only for Python2.x).

!/usr/bin/python from __future__ import division import numpy as np import matplotlib.pyplot as pl import multiprocessing def MM(num): np.random.seed(num) trials = 1000000 counts = 0 for i in range(trials): x = np.random.random() y = np.random.random() x2 = x**2 y2 = y**2 x_y = x2 + y2 dxy = np.sqrt(x_y) if dxy <= 1: counts = counts + 1 print "pi was approximated at ::",4*counts/trials, " on " , num return jobs = [] for i in range(4): p = multiprocessing.Process(target=MM, args=(i,)) p.start()

The first thing we are going to do is to wrap the previous code we had into a function MM(). This is for two reasons, 1) because functions are efficient and 2) because we can submit a function numerous times using some libraries.

So let’s look at the code in more detail. Notice the use of the multiprocessing library in line 6. Construction of a function on line 8. Use of a random seed in line 9 (This needs to be inside the function and run every time. Reason: computers use a seed [a number] to start the generation of random numbers [more on this in a later post] and that needs to be done for every calculation requiring random numbers. If we don’t set it to be different every time we calculate the value of , all our answers would be exactly the same [and no that isn’t what we want. We want to repeat an experiment of generating random number, they aren’t random if every trial generates the exact same random numbers]).

The code below this point is the same as in the previous post. The last few line include a for loop which is used to submit the same function 4 times. Why 4? Actually it can be any number. The number actually represents how many runs you want to do. If you want to do 1000 runs, substitute 4 with 1000. But before you do that, remember the the answer will only be printed when 1000 tasks have been completed. Do you have 1000 cores? Probably not, you have something like a quad-core processor which means you have 4 cores. So if you submit anything above 4 tasks as you are doing in the for-loop, you will have to wait for the previous task to complete before the next one is submitted. Notice that while one iteration of the code in the previous post took ~4 seconds, 4 iterations will take slightly more than ~4 seconds and that is related to how tasks are distributed by your OS across your CPU and processing overhead.

So what did we achieve in this exercise:

- We discussed two forms of parallelisms (amongst many).
- We expanded on the type where a task had to be repeated multiple times.
- While the calculation of Pi could simply be put in a for loop and run a 1000 times, that is not good when you have more cores available. Because it will take 1000 x (time for 1 iteration) to complete 1000 iterations. However as demonstrated above you can use the core in your computer and divide these tasks. So if you have 4 cores your 1000 iterations can be sped by more than 3 times (no not four cause the computer has other things to do as well and the world isn’t ideal).

So this concludes our simple attempt at parallelizing a simple simulation. Next time we will focus on generating simple plots with data using this code and the serial code provided in the previous example.

As always if you have questions regarding this post or any of the previous posts, or have your own research problem that you want some help with, please don’t hesitate to get in touch either through comments here or on Facebook or using our detail listed on the Contact Us page. We will try our best to help you out in any way we can.

Please like share and subscribe IDRACK’s Facebook page and group. Talk to your friends about us.