Tutorial 3: Stochastic Simulations

© 2021 Griffin Chure & Manuel Razo-Mejia. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license


In this tutorial, we will cover the basics of writing stochastic simulations and their application to biological phenomena ranging from the diffusion of molecules to genetic drift in populations.

What is a stochastic simulation?

In science, we are often more interested in the distribution of a set of outcomes rather than a single event. This may be the probability distribution of a molecule diffusing a specific distance as a function of time, the distribution of mRNA molecules per cell produced from a constitutively expressing promoter, or the probability distribution of a model parameter given a collection of data. Stochastic simulations allow us to generate a series of simulations of a system in which one step (such as the direction a molecule will diffuse) is governed by random chance. These simulations often boil down to flipping a coin to dictate if said step will occur or not.

Of course, sitting in your office chair flipping a US quarter over and over again is not how one should do a simulation. To get a sense of the probability distribution of some outcome, we often have to simulate the process thousands of times. This means that we need to know how to make our computers do the heavy lifting.

It's often easy to forget just how powerful modern computers can be. What once required a serious computational cluster only twenty years ago can now be done on a 10 mm thick compartment made of rose-gold colored aluminium. In the following exercise, we will demonstate how you can learn about the behavior of biological systems from the comfort of your laptop in only half a screen of code.

Example I: The Random Walk with single point source

Think of a molecule that moves either left or right with equal step probabilities at each subsequent time point. We can decide whether to walk left or right by flipping a coin and seeing if it comes up 'heads' or 'tails'.

Note that this will change every time that we run the code cell. How do we convert this to a 'heads' and 'tails' readout? We can assume that this is a totally fair coin. This means that the probability of getting "heads" to come up $P_H$ is the same as flipping a "tails" $P_T$ such that $P_H + P_T = 1$. This means that for a fair coin, $P_H = P_T = 0.5$. To convert our coin flips above, we simply have to test if the flip is above or below $0.5$. If it is below, we'll say that the coin was flipped "heads", otherwise, it is "tails".

Now imagine that we wanted to flip the coin one thousand times. Obviously, we shouldn't write out a thousand variables and then loop through them. We could go through a loop for one thousand times and flip a coin at each step or flip one thousand coins at once and store them in an array. In the interest of simplicity, we'll go with option one. Let's flip a coin one thousand times and compute the probability of getting "heads".

In the above code cell, we've also introduced a way to format strings using the %s formatter. We can specify that a value should inserted at that position (%) as a string (s) by providing a tuple of the values after the string in the order they should be inserted prefixed by a magic operator %. Note that these strings are inserted in the order in which they appear in the tuple.

We see that our simulated probability is very close to our imposed $P_H$, but not exactly. This is the nature of stochastic simulations. It's based on repeated random draws. If we were to continue to flip a coin more times, our simulated $P_H$ would get closer and closer to $0.5$. This is why doing many repetitions of stochastic simulations is necessary to generate reliable statistics.

So how do we relate this to diffusion? We'll start at position zero and flip a coin at each time step. If it is less than 0.5, we'll take a step left. Otherwise, we'll take a step to the right. At each time point, we'll keep track of our position and then plot our trajectory.

Notice that at the beginning of our for loop, we specified our range to be from 1 to n_steps. This is because the first entry (index 0) of our position vector is our starting position. Since we update our position at timepoint i based off of where we were at time step i - 1, we have to start at index 1.

Now that we've taken the random walk, let's plot it. We'll take a look at where our molecule was at each time point.

Again, since our steps are based on the generation of random numbers. This trajectory will change every time you run the code. As we discussed earlier, the power of stochastic simulation comes from doing them many times over. Let's write our random walk code again one thousand times and plot all of the traces.

You'll notice that this cell took a little bit longer to run than the previous one. This is because we are doing the simulation a thousand times over! To show the random walks, we'll plot all of the trajectories over each other as thin lines.

Pretty cool! We can look at the distribution of positions at various steps in time by making histograms of the positions of each simulation. Let's take a look at the distribution of positions at $t = 200$ steps.

We see that this qualitatively appears to be Gaussian. If we had to guess, we could say that the mean looks like it is right at about zero. Let's take a look at the distribution of positions at the last time point as well.

Again, this distribution looks somewhat Gaussian with a mean of approximately zero. We can actually compute the mean position from our simulation by iterating through each time step and simply computing the mean. Let's plot the mean at each time point as a red line.

As we will learn in a few weeks, this is exactly what we would expect. While the mean position is zero, the mean squared displacement is not quite so trivial. Let's compute this value and plot it as a function of the number of steps.

That certainly looks like it scales linearly with the number of steps, just as we predicted.

Example II: The Random Walk with reflective boundaries

In the first example we look at a series of unbounded random walkers. We even showed that the mean square displacement of the walkers grows linearly as time progresses. If we were to let the simulation run for much longer times, this trend would continue indefinitely. But we know that the world is finite. Furthermore, for cells that range between 1 µm for a bacteria to 1 mm for a Xenopus frog egg to ≈ 1 m for a long neuron axon, there is a limit of how far molecules can diffuse before running into a boundary.

Our simulations can include such boundaries. For our particular purpose we will consider reflective boundaries--as opposed to absorbing boundaries--such that when a molecule hits the wall, it is simply reflected back to continue its random walk within this bounded region. The trick lies in keeping track of the position of the particle with respect to the boundary, and whenever the trajectory exceeds the boundary, for the specific step in which this happens, we simply multiply the displacement by -1, implementing in this way the reflective nature of the boundary.

Let's work through an example. First we will define the size of the boundary (which we will consider to be $\pm$ the defined size.

Now we can use the exact same code as before for the single walker. The difference being that at each step we will check whether or not the particle went pass the boundary, and if so, we will reflect the trajectory.

And with this simple extra if statement we have implemented our reflective boundary! Let's take a look at the trajectory

We can see that indeed the walker is bounded by the limits that we set. Just for fun let's run multiple trajectories.

and now we are ready to look at the multiple trajectories