MATLAB Bootcamp: Cytoskeleton Module Manual

Part I. Spreading the Butter - Integrating the Chemical Master Equation.

In part I, we will write a script that uses the chemical master equation to describe the time-evolution of a growing and shrinking cytoskeletal filament. We will use it to predict a steady-state probability distribution of filament lengths.

Part II. Filament Image Analysis.

In part II, we will write a script to measure filament length from images of actin filaments. (For this module, we will not collect our own data.) We can then compare our measured distribution of filament lengths to our prediction from Part I.

Ready?

I. Spreading the Butter - Integrating the Chemical Master Equation

In this script, we will learn a bit about cytoskeletal filaments, how to write a numerical integrator in MATLAB, and the principle of the master equation.

The Chemical Master Equation

We will oftentimes be interested in how probability distributions of some process (such as size, length, level of gene expression, etc) evolve over time. We can capture our understanding of these systems by writing differential equations, which will more often than not be relatively hairy to solve analytically. One way we can write out these differential equations is by writing what is called a master equation. In general, a master equation will have the form

As an example, let's think of how we can compute the probability of having a filament with a length

. There are a total of four things that can happen:

A filament of length grows by one monomer to become .
A filament of length degrades by one length to become .
A filament of length can degrade by one to become .
A filament of length or grow by one monomer to become .

With this knowledge of how we can build a filament, we can write the complete master equation for a filament of length

Rather than doing this by hand for all possible lengths from

, we can write the general master equation as

With this master equation in place, we can now figure out how the distribution changes with time! To do so, we'll use the method of Forward-Euler integration.

Forward-Euler Integration

Developing simple ways to solve ordinary differential equations has long been an area of intense research. While deriving the analytical solution may be simple in some cases, it is often useful to solve them numerically, especially if slamming out the analytical solution will give you carpal tunnel.

While there are many ways to numerically integrate these equations, in this tutorial we will examine the Forward Euler method. Say we have an ordinary differential equation such as

where τ is some time constant and t is time. Rather than solving this analytically (although it is trivial), we can solve it numerically by starting at some given value of x, evaluating this equation for a given time step

, and updating the new value of x at this new time

. WE can phrase this mathematically as

Say our initial value ( x at

) is

and

. e can take a time step

and find that the change in value of x is

WE can then compute the new value of x at time

We can then take another step forward in time and repeat the process for as long as we would like. As the total time we'd like to integrate over becomes large, it becomes obvious why using a computer is a far more attractive approach than scribbling it by hand.

A major point to be wary of is the instability of this method. The error in this scales with the square of our step size. We must choose a sufficiently small step in time such that at most only one computable event must occur. For example, if we are integrating exponential growth of bacterial cells, we don't want to take time steps larger than a cell division! This requirement is known as the Courant-Friedrichs-Lewy condition and is important for many different time-marching computer simulations.

Integrating the Chemical Master Equation

Let's write an integrator to numerically solve this master equation. We'll start by defining the chemical parameters of the system, r and γ.

% Define the parameters. 
r = 9; % on rate of a monomer to the filament. Dimensions are inverse time
gamma = 10; % off rate of a monomer to the filament. Dimensions are inverse time

While the length of our cytoskeletal filament can be essentially infinite, it's not really possible for us to compute that. To get around this we can set what a "maximum length" should be in our simulation integrate over that range. You should note that this means our solution will not be exact, but is good enough for what we are after. For more exact numerical calculations, you should consider looking into MATLAB's standard ODE solvers.

We will be integrating essentially over two dimensions − filament length and time. This means that we will want to set up a two dimensional matrix where each column will represent a time point and each row will be an integer length of our filament. Let's go a head and define this matrix as well as the other parameters of our system.

% Define the integration parameters. 
maxLength = 100; % The maximum length of the filament that we will integrate over. 
totalTime = 50;  % Total time of the integration in dimensions of time. 
dt = 1/ 50; % Time step for the simulation. Note that this needs to be sufficiently small!
timeSteps = totalTime / dt; % The total number of time steps we will be taking. 
% Set up the matrix in which we will store the values. 
P = zeros(maxLength, timeSteps);  % A two-dimensional matrix full of zeros. 

Before we can begin the integration, we have to set the intial condition (our matrix is full of zeros after all!). We could start our integration in any way, but for fun, let's set the intial condition for there to be zero filaments. This means we will set the first index of our matrix

% Set the initial condition. 
P(1, 1) = 1.0; % Remember  that MATLAB indexing begins at 1. 

Now we are finally ready to begin the integration. However, we have to be cognizant of one issue − the boundary conditions. When we have

, for example, it doesn't make sense to have a degradation term

because 0 can't degrade! Likewise, you can't have a production term

. When we begin our integration, we will have to compute the boundary conditions,

and

maxLength individually. Let's go ahead and perform the integration.

% Loop first through each time point. We'll start at time t=2 since we set the 
% initial condition. 
for t=2:timeSteps
    % We'll compute the initial condition, when ell = 1, which is our first condition. 
    ell = 1;
    P(ell, t) = P(ell, t-1) + gamma * dt * P(ell + 1, t-1) - r * dt * P(ell, t - 1);
    
    % Now we can deal with the other boundary condition, when ell = maxLength. 
    ell = maxLength;
    P(ell, t) = P(ell, t-1) + r * dt * P(ell-1, t-1) - gamma * dt * P(ell, t-1);
    
    % The boundary conditions are satisfied, so we can jump ahead and go through the 
    % rest of the lengths of the filament. 
    for ell=2:(maxLength - 1) % -1 so we don't compute the final length!
        P(ell, t) = P(ell, t-1) + r * dt * P(ell-1, t-1)  + gamma * dt * P(ell+1, t-1) - ...
            dt * P(ell, t-1) * (r + gamma);
    end % This ends the length loop. 
end % This ends the time loop. 

And that's it! In just a few lines of code, we were able to integrate our master equation! Let's take a look at the distribution. Since we have three axes of information (length, time, and probability), we can plot it as a 3-D bar plot, where the height of the bars will be representative of the probability of that length, the x-dimension will be length, and the

dimension will be time steps. Since we have so many time steps, we'll only plot every twentieth time point.

% Plot the probability vector as 3-D bar plot.

bar3(P(1:40, 1:70:timeSteps)); % 1:20:timeSteps plots the first bar

% to the last bar taking steps of 20

% Now we just have to add labels like good scientists.

ylabel('length in monomers')

xlabel('time steps x 20')

zlabel('P(l, t)');

% Set a specific angle for viewing.

view([52.900 35.600])

As we rotate this plot around, we can see that it spreads out over time, with the mode being at 0. When we look at the end of the simulation, the distribution does look qualitatively exponential. Rather than saying just what it looks like, we can actually solve for this distribution analytically.

Above, we wrote out the master equation that describes the length of the distribution. We can analytically solve for the steady-state distribution using

We can rewrite this in the form of an ordinary differential equation which summarizes the change in the probability length distribution over time,

At steady state, the change in probability for a given length ℓ over a time step

is 0. By setting the above equation equal to zero, we can solve for this probability distribution,

Now, that's not very informative of what the distribution should be because the distribution we are interested in

is dependent on two other lengths,

and

. We can solve for this distribution

) as a function of the reference length

using the principle of detailed balance.

A Brief Aside: Detailed Balance

The principle of detailed balance states that there is no such thing as an irreversible process and at equilibrium, each elementary elementary process is equilibrated by its reverse process. For example, let's imagine we have an irreversible process of the form

The principle of detailed balance states that at equilibrium, each elementary transition is in equilibrium,

If the forward transition

occurs at a rate

and

occurs with a rate

, we can write an expression for B in steady state in terms of A,

With this principle in hand, we can solve for our length distribution.

Solving for the distribution using detailed balance

A few lines ago, we wrote an expression for

in terms of

and

. We can solve this generally by realizing that throughthe principle of detailed balance,

As an example, we can look at the case of

and solve it in terms of

We can solve for

Great! Now, let's try solving for

. We can write

Using our expression for

derived above, we can rewrite this as

Solving for

yields

Of course, we probably don't want to do this over and over again. A keen eye will notice the pattern that this should simplify to the general form for

in terms of

Now, we are almost there! Ideally, we would have an expression for

that is independent of any other probability, meaning your function should only be a function of r, γ, and ℓ. We can get there by realizing that this probability distribution

(like every other probability distribution) should be normalized,

This normalization condition can be rewritten by subbing in our expression for

above,

Since

is constant for all lengths ℓ in the sum, we can pull this term infront of the sum resulting in

While this may look like a pain to solve at first, we can recognize that the summation is exactly the geometric series, which converges to

For a refresher on the convergence of this seris, scroll down to the very bottom of this tutorial. We can use this convergence to now solve for our desired distribution

Pluggin this result into our expression for

, we arrive at the analytical solution for the steady state distribution,

which is an exponential distribution! To see if our numerical integration matches our analytical solution for the steady state distribution, we can simply plot them on top of each other.

% Set up a length vector

lengths = 0:1:maxLength-1;

% Set up a more finely spaced array of points to compute the analytical solutions.

L = linspace(0, maxLength, 1000);

% Compute the analytical solution.

soln = (r / gamma).^L * ( 1 - (r / gamma));

% Plot the last time point of our integration and the analytical solution.

bar(lengths, P(:, timeSteps), 'DisplayName', 'Numerical Solution')

hold on

% Plot the analytical solution.

plot(L, soln, 'r-', 'LineWidth', 2, 'DisplayName', 'Analytical Solution');

% Of course, always labels.

xlabel('length in monomers')

ylabel('probability')

legend('-DynamicLegend')

% Set the limits from 0 to maxLength -1

xlim([0, maxLength])

hold off

So, we have shown that our analytical solution matches our numerical solution (nearly) exactly.

In conclusion

In this tutorial, we have learned a bit about how to write chemical master equations for biological processes, how to numerically integrate them, and proved the validity of the approach by matching our solution to the analytical solution

Extra Stuff!

Proving the convergence of the geometric series

The geometric series is given as

We can evaluate the first few terms to arrive at

By pulling out one factor of λ, this can be rewritten as

By realizing that the final three terms in the parenthesis are exactly what we defined as S, the series becomes

Performing some simplification, we come to

Now you have a tool in your toolbox for when you run across the geometric series.

Part II. Filament Image Analysis.

We'll start by opening up a single image and segmenting filaments (i.e. dividing the image into filaments and background). Then, we'll figure out how measure length for all our filaments. Afterwards, we'll program a for loop to do the same for all the images and produce graphs of the filament length distribution.

% Get info about all the images.
images = dir('*.tif'); %makes a list of all the .tif files within our current directory 
% and saves it in "images".

Check out "images" in your MatLab workspace. See how it contains useful info, like the names of our images?

We will use this info later to loop over all our images, analyzing them all. But for now, we'll open just one.

% Read one of the images into MatLab.

im = imread([images(1).name]); %opens the first name in the "images" list.

% View the image.

imshow (im);

% Apply a median filter to reduce noise in the image.

imFilt = medfilt2(im); %sets each pixel to the median intensity value

% of the 3x3 pixel box centered there.

% Scale image intensity.

imNorm = mat2gray(imFilt); %scales intensities so min = 0 and max = 1.

Now, let's figure out how to segment our image.

How can we distinguish filaments from background? For one thing, filaments are bright. We could set a threshold intensity that would separate filaments (brighter than threshold) from background (dimmer than threshold). But how do we know what this threshold value should be?

% Look at the image histogram.

imhist(imNorm, 1000); %histogram of image data. 1000 = number of bins.

% Try a threshold

thresh = 0.1; %replace this value with your best guess.

imThresh = imNorm > thresh; %keep only pixels above threshold

imshow (imThresh);

How does this threshold look? No good? Try a new value for "thresh" and repeat.

Once we're happy with our threshold, let's figure out more ways to remove everything but real filaments from our image.

% Clear the border. We can't correctly measure a filament that extends beyond our image.
imClear = imclearborder(imThresh); %supresses structures lighter than their 
% surroundings and touching a border.

We also know the approximate size (area) of an individual filament, and we know that it should be skinny in shape. Can we check each object in our image to see whether it has these properties? If an object doesn't satisfy these criteria, let's throw it out.

% Identify objects that might be filaments, and give them each a label.

imLab = bwlabel(imClear); %labels connected components in a 2D binary image.

imshow(imLab);

Check each object, and only keep those that are skinny (large "eccentricity") and a reasonable size (area).

We will use MatLab's super-useful regionprops function to measure object properties. Google "matlab regionprops" to learn more!

% Use regionprops to find area and eccentricity of a binary array, imLab.
props = regionprops(imLab, 'Area', 'Eccentricity'); 
approvedObj = zeros(size(im)); %we'll put only objects of certain size/shape here.
for i=1:length(props)
    areas(i) = props(i).Area; %pulls areas into a separate vector, areas.
    ecc(i) = props(i).Eccentricity; %pulls eccentricities into separate vector.
    if props(i).Eccentricity > 0.5 && props(i).Area > 100 && props(i).Area < 1000 
        approvedObj = approvedObj + (imLab==i); %keeps only approved objects from imLab.
    end
end
% Let's update our image of labeled objects so that it only contains approved objects.
imLab = bwlabel(approvedObj); %relabel only these approved objects.

We have segmented our image so that it contains the objects (filaments) we want to measure! Hooray! Pretty awesome.

Measuring filament length.

Now, how can we measure the length of the filaments? We can use some MatLab functions to turn out filament objects into "sticks" just one pixel wide. If we do that, then the area of our sticks will be equal to the filament length.

skel = bwmorph(approvedObj, 'skel', Inf); %uses bwmorph to find one-pixel-wide

% skeleton of all objects.

spur = bwmorph(skel, 'spur'); %remove undesired short spurs

% (default for spur = if one pixel has 0 of 4 possible direct neighbors, remove).

imLab = bwlabel(spur); %relabel these skeletons.

imshow(imLab)

% Finally, measure the filament lengths!

% Use regionprops Area to find the area of each object (= length since it's a skeleton!)

props = regionprops(imLab, 'Area');

for i=1:length(props)

sizes(i) = sum(sum(imLab==i));

end

% Make a histogram to see our measured filament lengths.

hist(sizes, 200); %200 is the number of bins.

Loop over all images to measure all filaments.

Now that we've figured out our analysis pipeline, we can apply it to lots of images.

%% Process all of the images. 
sizes = [];
images = dir('*.tif'); %makes a list of the .tif files within our current directory.
for i=1 %:length(images)
    % Read the images
    im = imread([images(i).name]);
    imFilt = medfilt2(im); %median filter
    imNorm = mat2gray(imFilt); %scales intensity between 0 and 1
    %thresh = graythresh(imNorm); %alternatively: find threshold using Otsu's method
    imThresh = imNorm > thresh; %threshold image
    imClear = imclearborder(imThresh); %get rid of stuff touching border
    
    % Label objects and apply filters based on area and eccentricity.
    imLab = bwlabel(imClear);
    props = regionprops(imLab, 'Area', 'Eccentricity');
    for j=1:length(props)
        if props(j).Area > 100 && props(j).Area < 1000 && props(j).Eccentricity > 0.5
            skel = bwmorph(imLab==j, 'skel', Inf);
            spur = bwmorph(skel, 'spur');
            size = sum(sum(spur));
            sizes = [sizes size]; %this keeps adding on one more filament size 
            % to the end of our 'sizes' array.      
        end
    end
end