Supplemental Information C - Single-Cell Microscopy

(c) 2017 the authors. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

In [72]:
# For scientific computing
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import glob

# Custom written modules
import mwc_induction_utils as mwc

# Image processing functions
import skimage.io
import skimage.filters
import skimage.morphology
import skimage.segmentation
import scipy.ndimage

# For parallelization of arduous median filtering operations
import joblib as jlb

# Set the plotting environment.
%matplotlib inline
sns.set_context('notebook')
sns.set_style('dark')
mwc.set_plotting_style()

Thus far, all data have been collected on a Miltenyi Biotec MACSQuant flow cytometer. While this provides an extremely high-throughput method for making single-cell measurements, much information is lost, as each cell yields only two real values: scattering and integrated intensity. To confirm that these measurements are comparable to previous work from the lab (such as Brewster et al., 2014), we have taken several measurements using a high-sensitivity laser-illumination epifluorescence microscope. In this notebook, we describe the procedure for producing even illumination, segmenting cells, applying area and eccentricity bounds, and extracting fluorescence information.

Generating a flat-field image

Our microscopy setup is a Nikon TI-Eclipse inverted epifluorescence microscope outfitted with YFP (514 nm) and TRITC (650 nm) laser lines. The emission filters for the YFP channel are designed for on-peak emission of YFP from a laser-line source and have a remarkably flat dichroic mirror. The TRITC cube is designed for Hg lamp excitation, but works well for our laser system. Since we are not extracting quantitative information from the TRITC channel and are only using it for segmentation, we are less concerned with the efficiency of detection in this channel.

However, we must correct for uneven illumination in the YFP channel, since this is the channel from which we wish to quantify expression. The 514 nm laser beam is expanded such that the entire camera chip is illuminated. However, this results in an uneven illumination profile across the field of view during YFP excitation, with the shape of a two-dimensional Gaussian. This means that cells near the edge of the field of view will appear dimmer than those in the center, even if their mean fluorescence is identical. To correct for this, we renormalize each individual image so that the illumination is flat across the entire sample. We perform this correction by imaging a homogeneously fluorescent plastic slide (Autofluorescent Plastic Slides, Chroma cat. no. 92001) in the YFP channel approximately 20 times and averaging these exposures to produce an average image of the excitation illumination pattern. We then correct for uneven illumination by renormalizing each image as

$$ \begin{align} I_\mathrm{flat} = \frac{I - I_\mathrm{dark}}{I_\mathrm{YFP} - I_\mathrm{dark}}\langle I_\mathrm{YFP} - I_\mathrm{dark}\rangle, \end{align} $$

in which $I_\mathrm{flat}$ is the flat-field corrected image, $I_\mathrm{YFP}$ is the average image of the illumination field, and $I_\mathrm{dark}$ is an average image taken with no illumination, which represents the camera shot noise. Prior to this correction, each image is median filtered with a 3x3 square structuring element to reduce signal noise. We define the functions used for this correction below.

In [10]:
def average_stack(im, median_filt=True):
    """
    Computes an average image from a provided array of images.

    Parameters
    ----------
    im : list or array of 2d-arrays
        Stack of images to be averaged.
    median_filt : bool
        If True, each image will be median filtered before averaging.
        Median filtering is performed using a 3x3 square structural element.

    Returns
    -------
    im_avg : 2d-array
        Averaged image, returned as a float.
    """
    # Determine if the images should be median filtered.
    if median_filt is True:
        selem = skimage.morphology.square(3)
        im_filt = [scipy.ndimage.median_filter(i, footprint=selem) for i in im]
    else:
        im_filt = im

    # Generate an empty image to store the averaged image. A float
    # accumulator avoids integer overflow when summing many images.
    im_avg = np.zeros_like(im[0]).astype(float)
    for i in im_filt:
        im_avg += i
    im_avg = im_avg / len(im_filt)
    return im_avg

def generate_flatfield(im, im_dark, im_field, median_filt=True):
    """
    Corrects illumination of a given image using a dark image and an image of
    the flat illumination.

    Parameters
    ----------
    im : 2d-array
        Image to be flattened.
    im_dark : 2d-array
        Average image of camera shot noise (no illumination).
    im_field: 2d-array
        Average image of fluorescence illumination.
    median_filt : bool
        If True, the image to be corrected will be median filtered with a
        3x3 square structural element. 
        
    Returns
    -------
    im_flat : 2d-array
        Image corrected for uneven fluorescence illumination. This is performed
        as

        im_flat = ((im - im_dark) / (im_field - im_dark)) *
                   mean(im_field - im_dark)

    Raises
    ------
    RuntimeError
        Thrown if bright image and dark image are approximately equal. This
        will result in a division by zero.
    """

    # Ensure that the same image is not being provided as the bright and dark.
    if np.isclose(im_field, im_dark).all():
        raise RuntimeError('im_bright and im_dark are approximately equal.')
        
    # Compute the mean difference between the bright and dark image.
    mean_diff = np.mean(im_field - im_dark)
    
    if median_filt is True:
        selem = skimage.morphology.square(3) 
        im_filt = scipy.ndimage.median_filter(im, footprint=selem)
    else:
        im_filt = im

    # Compute and return the flattened image.
    im_flat = ((im_filt - im_dark) / (im_field - im_dark)) * mean_diff
    return im_flat

def ome_split(im):
    """Splits an ome.tiff image into individual channels"""
    if len(np.shape(im)) == 2:
        raise RuntimeError('provided image is a single channel')
    ims = []
    for i in range(np.shape(im)[-1]):
        ims.append(im[:, :, i])
    return ims
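
Before applying this correction to experimental data, we can sanity-check generate_flatfield on synthetic images: a perfectly uniform sample imaged under a Gaussian illumination profile should come out approximately flat after correction. The image size, offset, and Gaussian width below are arbitrary values chosen only for illustration.

# Sanity check on synthetic images (illustrative values only).
x, y = np.meshgrid(np.arange(512), np.arange(512))
gauss = np.exp(-((x - 256)**2 + (y - 256)**2) / (2 * 150**2))
im_dark_syn = 100 * np.ones((512, 512))    # uniform camera offset
im_field_syn = im_dark_syn + 1000 * gauss  # Gaussian illumination profile
im_syn = im_dark_syn + 500 * gauss         # uniform sample, uneven excitation

# After correction, the relative variation should be negligible.
im_flat_syn = generate_flatfield(im_syn, im_dark_syn, im_field_syn,
                                 median_filt=False)
print(im_flat_syn.std() / im_flat_syn.mean())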

To demonstrate this correction, we will use a representative experimental run.

In [24]:
# Define the data directory 
data_dir = '../../data/20161018/'

# Load the dark and field images. 
dark_glob = glob.glob(data_dir + '*camera_noise*/*tif')
field_glob = glob.glob(data_dir + '*YFP_profile*/*tif')
dark_ims = skimage.io.ImageCollection(dark_glob, conserve_memory=False)
field_ims = skimage.io.ImageCollection(field_glob, conserve_memory=False)

We can begin by looking at a representative image of the field and dark images.

In [27]:
fig, ax = plt.subplots(1, 2, figsize=(9, 5))
ax[0].imshow(field_ims[0], cmap=plt.cm.viridis)
ax[0].set_title('illumination profile', fontsize=16)
ax[1].imshow(dark_ims[0], cmap=plt.cm.viridis)
ax[1].set_title('camera noise', fontsize=16)
Out[27]:
<matplotlib.text.Text at 0x11b40dc88>

We can see that the pixel values are indeed higher in the middle of the illumination profile, while the camera image shows only shot noise. We will now generate average images of each.

In [28]:
# Generate averages of the fluorescence images.
average_field = average_stack(field_ims)
average_dark = average_stack(dark_ims)

# Show the averages.
fig, ax = plt.subplots(1, 2)
ax[0].imshow(average_field, cmap=plt.cm.viridis)
ax[1].imshow(average_dark, cmap=plt.cm.viridis)
ax[0].set_title('average illumination')
ax[1].set_title('average noise')
/Users/gchure/anaconda/lib/python3.4/site-packages/skimage/external/tifffile/tifffile.py:1106: UserWarning: ome-xml: not an ome-tiff master file
  warnings.warn("ome-xml: not an ome-tiff master file")
Out[28]:
<matplotlib.text.Text at 0x105888d30>

Now that we have generated the average illumination profile and the average noise pattern, we can correct one of our images as an example.

In [29]:
# Load an example image.
im = glob.glob(data_dir + '20161018_wt_O2_auto_0uMIPTG_1/*tif')
ex_im = skimage.io.imread(im[0])
phase, rfp_im, yfp_im = ome_split(ex_im)

# Correct for uneven illumination.
selem = skimage.morphology.square(3)
yfp_filt = scipy.ndimage.median_filter(yfp_im, footprint=selem)
yfp_flat = generate_flatfield(yfp_filt, average_dark, average_field,
                              median_filt=False)


# Show both.
fig, ax = plt.subplots(1, 2, figsize=(9, 5))
ax[0].imshow(yfp_filt, cmap=plt.cm.viridis)
ax[0].set_title('median filtered image')
ax[1].imshow(yfp_flat, cmap=plt.cm.viridis)
ax[1].set_title('flatfield image')
Out[29]:
<matplotlib.text.Text at 0x11ba67f98>

We can verify that this corrects the uneven illumination by taking intensity profiles across each dimension of the image before and after correction.

In [70]:
# Plot the profile and compute the average for the filtered and unfiltered case.
fig, ax = plt.subplots(2, 2, figsize=(9, 9), sharex=True, sharey=True)
im_shape = np.shape(yfp_filt)
x_range = np.arange(0, im_shape[1], 1)
y_range = np.arange(0, im_shape[0], 1)

# Generate averages
filt_avg_y, flat_avg_y = [], []
filt_avg_x, flat_avg_x = [], []
for i in range(im_shape[0]):
    filt_avg_y.append(np.mean(yfp_filt[i, :]))
    flat_avg_y.append(np.mean(yfp_flat[i, :]))
    ax[0, 1].plot(y_range, yfp_filt[i, :], 'k-', linewidth=0.1, alpha=0.2)
    ax[1, 1].plot(y_range, yfp_flat[i, :], 'k-', linewidth=0.1, alpha=0.2)
for i in range(im_shape[1]):
    filt_avg_x.append(np.mean(yfp_filt[:, i]))
    flat_avg_x.append(np.mean(yfp_flat[:, i]))
    if i == 1:
        ax[0, 0].plot(x_range, yfp_filt[:, i], 'k-',
                      linewidth=1, alpha=0.2, label='single profile')
    else:
        ax[0, 0].plot(x_range, yfp_filt[:, i], 'k-', linewidth=0.1, alpha=0.2)
    ax[1, 0].plot(x_range, yfp_flat[:, i], 'k-', linewidth=0.1, alpha=0.2)
    
# Now plot the averages over each.
ax[0, 0].plot(x_range, filt_avg_x, 'r-', label='average')
ax[0, 0].legend(loc='upper right')
ax[0, 1].plot(y_range, filt_avg_y, 'r-')
ax[1, 0].plot(x_range, flat_avg_x, 'r-')
ax[1, 1].plot(y_range, flat_avg_y, 'r-')

# Label things so we know what is what
ax[0, 0].set_ylabel('filtered image')
ax[1, 0].set_ylabel('flatfield image')
ax[1, 0].set_xlabel('x coordinate')
ax[1, 1].set_xlabel('y coordinate')

axes = ax.ravel()
for a in axes:
    a.set_xlim([0, 512])

The effect is subtle, but correcting for it is important when making quantitative fluorescence measurements from images.

Segmentation

To make segmentation of the bacterial cells easier, we performed all experiments with cells containing a pZS4*5 plasmid, which constitutively expresses mCherry. As we knew we would want to compare microscopy and flow-cytometry methods, we used this same construct for all flow measurements.

To segment in the fluorescence channel, we use the Marr-Hildreth algorithm (also known as the Laplacian of Gaussian) for edge detection. We invite the interested reader to see the original paper by Marr and Hildreth for a detailed explanation of the algorithm, but it can be summarized as follows.

  1. Apply a Gaussian blur with a small $\sigma$ to the image.
  2. Compute the spatial second derivatives of the image using the Laplacian operator.
  3. Identify regions in which the values of the second derivative cross zero. Each region is specified by a small structural element. These regions are the object edges.
In [31]:
# #################
def find_zero_crossings(im, selem, thresh):
    """
    This function computes the gradients in pixel values of an image after
    applying a Sobel filter. It is later used in the Laplacian of Gaussian
    cell segmentation function (log_segmentation). The arguments are as
    follows.

    Parameters
    ----------
    im : 2d-array
        Image to be filtered.
    selem : 2d-array, bool
        Structural element used to compute gradients.
    thresh :  float
        Threshold to define gradients.

    Returns
    -------
    zero_cross : 2d-array
        Image with identified zero-crossings.

    Notes
    -----
    This function as well as `log_segmentation` were written by Justin Bois.
    http://bebi103.caltech.edu/
    """

    # apply a maximum and minimum filter to the image.
    im_max = scipy.ndimage.filters.maximum_filter(im, footprint=selem)
    im_min = scipy.ndimage.filters.minimum_filter(im, footprint=selem)

    # Compute the gradients using a sobel filter.
    im_filt = skimage.filters.sobel(im)

    # Find the zero crossings.
    zero_cross = (((im >= 0) & (im_min < 0)) | ((im <= 0) & (im_max > 0)))\
        & (im_filt >= thresh)

    return zero_cross


# #################
def log_segmentation(im, selem='default', thresh=0.0001, radius=2.0,
                     median_filt=True, clear_border=True, label=False):
    """
    This function computes the Laplacian of a Gaussian filtered image and
    detects object edges as regions which cross zero in the derivative.

    Parameters
    ----------
    im : 2d-array
        Image to be processed. Must be a single channel image.
    selem : 2d-array, bool
        Structural element for identifying zero crossings. Default value is
        a 3x3 pixel square.
    thresh : float
        Threshold on the gradient magnitude past which a zero crossing is
        considered an edge.
    radius : float
        Radius for Gaussian filter prior to computation of derivatives.
    median_filt : bool
        If True, the input image will be median filtered with a 3x3 structural
        element prior to segmentation.
    clear_border : bool
        If True, segmented objects touching the border will be removed.
        Default is True.
    label : bool
        If True, segmented objects will be labeled. Default is False.

    Returns
    -------
    im_final : 2d-array
        Final segmentation mask. If label == True, the output will be an
        integer labeled image. If label == False, the output will be a bool.

    Notes
    -----
    We thank Justin Bois for his help in writing this function.
    https://bebi103.caltech.edu
    """

    # Test that the provided image is only 2-d.
    if len(np.shape(im)) > 2:
        raise ValueError('image must be a single channel!')

    # Determine if the image should be median filtered. A separate variable
    # is used for the footprint so the user-provided selem is not clobbered.
    if median_filt is True:
        med_selem = skimage.morphology.square(3)
        im_filt = scipy.ndimage.median_filter(im, footprint=med_selem)
    else:
        im_filt = im

    # Ensure that the provided image is a float.
    if np.max(im) > 1.0:
        im_float = skimage.img_as_float(im_filt)
    else:
        im_float = im_filt

    # Compute the LoG filter of the image.
    im_LoG = scipy.ndimage.filters.gaussian_laplace(im_float, radius)

    # Define the structural element for finding zero crossings.
    if selem == 'default':
        selem = skimage.morphology.square(3)

    # Using find_zero_crossings, identify the edges of objects.
    edges = find_zero_crossings(im_LoG, selem, thresh)

    # Skeletonize the edges to a line with a single pixel width.
    skel_im = skimage.morphology.skeletonize(edges)

    # Fill the holes to generate binary image.
    im_fill = scipy.ndimage.morphology.binary_fill_holes(skel_im)

    # Remove small objects and objects touching border.
    im_final = skimage.morphology.remove_small_objects(im_fill)
    if clear_border is True:
        im_final = skimage.segmentation.clear_border(im_final, buffer_size=5)

    # Determine if the objects should be labeled.
    if label is True:
        im_final = skimage.measure.label(im_final)

    # Return the labeled image.
    return im_final

Both of these functions were adapted from those used in an excellent data analysis course taught by Justin Bois. We will now apply this algorithm to an example mCherry image and show the segmented cells colored in cyan.

In [32]:
# LoG segment the mCherry channel of our example image.
rfp_seg = log_segmentation(rfp_im)

# Make a merge of the segmentation mask.
phase_float = (phase - phase.min()) / (phase.max() - phase.min())
phase_float_copy = np.copy(phase_float)
phase_float_copy[rfp_seg] = 0.75
merge = np.dstack((phase_float, phase_float_copy, phase_float_copy))

# Show the merge of the segmentation mask.
plt.figure(figsize=(9, 9))
plt.imshow(merge, interpolation='nearest')
Out[32]:
<matplotlib.image.AxesImage at 0x1231cc668>

We see that this segmentation procedure performs quite well and faithfully identifies only bacterial objects. However, it fails when cells are very close to one another, merging them into single objects. We can remove these objects by filtering with respect to area and eccentricity.

Extracting cell properties

Rather than spending more time on improving the segmentation algorithm, we can refine our cellular objects after segmentation by extracting area information as well as fluorescence intensity measurements. We define a function to perform this measurement extraction below.

In [44]:
def props_to_df(mask, physical_distance=1, intensity_image=None):
    """
    Converts the output of skimage.measure.regionprops to a nicely
    formatted pandas DataFrame.
    
    Parameters
    ----------
    mask : 2d-array, int
        Segmentation mask containing objects to be measured.
    physical_distance : int or float
        Interpixel distance of the image. This will be used to 
        convert the area measurements to meaningful units.
    intensity_image : 2d-array
        Intensity image for intensity based measurements. If none is 
        provided, only region based measurements will be returned.
        
    Returns
    -------
    df : pandas DataFrame
        Tidy DataFrame containing all measurements.
    
    """
    
    # Ensure that there is at least one object in the image. 
    if np.max(mask) == 0:
        raise ValueError('no objects found in image.')
        
    # Define the values that are to be extracted.
    REGIONPROPS = ('area', 'eccentricity', 'solidity',
                   'mean_intensity')
 
    if intensity_image is None:
        # Drop only the intensity-based measurement ('mean_intensity').
        measurements = REGIONPROPS[:-1]
    else:
        measurements = REGIONPROPS
        
    # Iterate through and extract the props.
    props = skimage.measure.regionprops(mask,
                                        intensity_image=intensity_image)
    for i, p in enumerate(props):
        extracted = []
        for val in measurements:
            extracted.append(p[val])

        if i == 0:
            df = pd.DataFrame(extracted).T
        else:
            df2 = pd.DataFrame(extracted).T
            df = df.append(df2)
    df.columns = measurements
    df['area'] = df['area'] * physical_distance**2
    return df

We can now use this function to extract measurements from each cell and generate a tidy dataframe.

In [49]:
# Define the names of the strains and IPTG concentrations.
strains = ['auto', 'delta', 'RBS1027']
iptg_range = [0, 0.1, 5, 10, 25, 50, 75, 
              100, 250, 500, 1000, 5000]
date = 20161018
username = 'gchure'
operator = 'O2'
binding_energy = -13.9
repressors = [0, 0, 130]
ip_dist = 0.160  # In units of µm per pixel

# Iterate through each strain and concentration to make the dataframes.
dfs = []
for i, st in enumerate(strains):
    for j, iptg in enumerate(iptg_range):
        # Load the images.  
        images = glob.glob(data_dir + '*' + st + '*_' + str(iptg) +
                           'uMIPTG*/*.ome.tif')
        ims = skimage.io.ImageCollection(images)
        for x in ims:
            _, m, y = ome_split(x)
            y_flat = generate_flatfield(y, average_dark, average_field)
            
            # Segment the mCherry channel.
            m_seg = log_segmentation(m, label=True)
            
            # Extract the measurements.
            im_df = props_to_df(m_seg, physical_distance=ip_dist,
                                intensity_image=y_flat)
            
            # Add strain and  IPTG concentration information.
            im_df.insert(0, 'IPTG_uM', iptg)
            im_df.insert(0, 'repressors', repressors[i])
            im_df.insert(0, 'rbs', st)
            im_df.insert(0, 'binding_energy', binding_energy)
            im_df.insert(0, 'operator', operator)
            im_df.insert(0, 'username', username)
            im_df.insert(0, 'date', date)
            
            # Append the dataframe to the global list.
            dfs.append(im_df)
            
# Concatenate the dataframe
final_df = pd.concat(dfs, axis=0)

Before we look at the data, we can make some assumptions about the size and shape of the segmented cells based on our knowledge of E. coli. The standard E. coli cell is approximately $2~\mu$m long by $1~\mu$m wide. Projected onto a two-dimensional surface, this yields a cell with an area of approximately $2~\mu\text{m}^2$. A cluster of three cells will therefore have an area of approximately $6~\mu\text{m}^2$, which will serve as an upper bound on our area threshold. For a lower bound, we can assume that our smallest cell will be one-quarter the size of a 'normal' cell, yielding a value of $0.5~\mu\text{m}^2$.

We can further improve our segmentation by considering the shape of the segmented objects. An E. coli cell can be approximated as an ellipse when projected onto a two-dimensional plane. The eccentricity of an object provides a measure of how circular or elongated it is, and can be defined mathematically as

$$ \text{eccentricity} = \sqrt{1 - \left({b \over a}\right)^2} $$

where $a$ and $b$ are the semimajor and semiminor axes of the ellipse, respectively. Using the dimensions of the hypothetical E. coli cell described above ($a = 1~\mu$m, $b = 0.5~\mu$m), the eccentricity is $\sqrt{1 - (0.5/1)^2} \approx 0.87$, so we take a lower bound of approximately 0.8.
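
As a quick numerical check of these bounds (illustrative only; the idealized cell dimensions are the assumptions here):

# Numerical check of the area and eccentricity bounds derived above.
length, width = 2.0, 1.0      # idealized E. coli dimensions in µm
area_single = length * width  # ≈ 2 µm² for a single projected cell
area_upper = 3 * area_single  # three-cell cluster, ≈ 6 µm²
area_lower = area_single / 4  # quarter-sized cell, ≈ 0.5 µm²
a, b = length / 2, width / 2  # semimajor and semiminor axes
ecc_check = np.sqrt(1 - (b / a)**2)  # ≈ 0.87, so 0.8 is a safe lower bound
print(area_lower, area_upper, ecc_check)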

Now that we've measured the properties of these cells, we can examine the distributions and apply these bounds. Rather than looking at a histogram of each property, we can look at the empirical cumulative distribution function (ECDF) of the data, which avoids the biases imposed by binning. We define a function for this below.

In [50]:
def ecdf(data):
    """
    Computes the empirical cumulative distribution function (ECDF)
    of a given set of 1D data.
    
    Parameters
    ----------
    data : 1d-array
        Data from which the ECDF will be computed.
    
    Returns
    -------
    x, y : 1d-arrays
        The sorted data (x) and the ECDF (y) of the data.
    """
    
    return np.sort(data), np.arange(1, len(data) + 1) / len(data)
In [67]:
# Look at the area and eccentricity of segmented objects. 
area_x, area_cdf = ecdf(final_df['area'])
ecc_x, ecc_cdf = ecdf(final_df['eccentricity'])
fig, ax = plt.subplots(1, 2, figsize=(9, 5))
ax[0].plot(area_x, area_cdf, 'k.', alpha=0.4)
ax[0].set_xlabel('area (µm$^2$)')
ax[0].set_ylabel('ECDF')
ax[1].plot(ecc_x, ecc_cdf, 'k.', alpha=0.4)
ax[1].set_xlabel('eccentricity')

# Show the threshold of the bounds highlighted in cyan.
ax[0].fill_betweenx(np.linspace(0, 1, 500), 0.5, 6, color='b', alpha=0.5)
ax[1].fill_betweenx(np.linspace(0, 1, 500), 0.8, 1, color='b', alpha=0.5)
for a in ax:
    a.margins(0.02)
plt.tight_layout()

In the above plot, we've highlighted the 'approved' objects with blue rectangles. Any object falling within both of these bounds will be counted as a single cell.

In [68]:
# Threshold the cells by area and eccentricity.
area_bounds = (0.5, 6)
ecc_bound = 0.8
cells = final_df[(final_df['area'] > area_bounds[0]) &
                 (final_df['area'] < area_bounds[1]) &
                 (final_df['eccentricity'] >= ecc_bound)]

Examining the intensity measurements

We can examine the fluorescence distributions of the cells as a function of IPTG concentration. We would predict that the distributions of the autofluorescence and unrepressed samples remain constant, while the distribution of the repressed strain broadens and shifts to a higher mean with increasing IPTG.

In [69]:
# Group the data frame by strain and IPTG concentration and compute the ECDF.
grouped = cells.groupby(['rbs', 'IPTG_uM']).mean_intensity.apply(ecdf)
colors = sns.color_palette('PuBu', n_colors=len(iptg_range))
# plot the trajectories.
fig, ax = plt.subplots(3, 1, sharex=True)
for i, st in enumerate(strains):
    samp = grouped[st].values
    if i == 0:
        labels = iptg_range
    else:
        labels = len(iptg_range) * [None]
    for j, _ in enumerate(iptg_range): 
        ax[i].plot(samp[j][0], samp[j][1], '.', color=colors[j],
                   label=labels[j])
    
# Format the plot.
for i, a in enumerate(ax):
    a.margins(0.02)
    a.set_title(strains[i])
    a.set_xscale('log')
ax[1].set_ylabel('ECDF')
ax[2].set_xlabel('mean pixel intensity (A.U.)')
ax[0].legend(title='IPTG (µM)', bbox_to_anchor=(1.18, 1))
plt.tight_layout()

As expected, the autofluorescence and no-repressor samples appear to be constant (with some noise and one apparent outlier), whereas the $R=260$ fluorescence distribution shifts rightward with increasing IPTG concentration. With this information, we can now calculate the fold-change.

Calculating fold-change

With an array of single-cell intensity measurements, we are ready to calculate fold-change. As discussed in the supplementary information, we can experimentally calculate the fold-change in gene expression as $$ \text{fold-change}={\langle I_{R > 0} \rangle - \langle I_\text{auto} \rangle \over \langle I_{R = 0} \rangle - \langle I_\text{auto} \rangle }, $$ where $\langle I_{R > 0} \rangle$ is the mean cell intensity in the presence of repressor, $\langle I_\text{auto} \rangle$ is the mean autofluorescence value, and $\langle I_{R = 0} \rangle$ is the mean cell intensity in the absence of repressor. From our experimental measurements in a tidy data format, this value becomes trivial to compute.

In [54]:
# Group the dataframe by IPTG concentration then strain.
fc_grouped = cells.groupby('IPTG_uM')
fc_exp = []
for group, data in fc_grouped:
    rbs_grp = data.groupby('rbs').mean_intensity.mean()
    fc_exp.append((rbs_grp['RBS1027'] - rbs_grp['auto']) /
                  (rbs_grp['delta'] - rbs_grp['auto']))
    

We can now plot the fold-change as a function of inducer concentration.

In [63]:
# Plot the experimental data. 
plt.plot(np.array(iptg_range)/1E6, fc_exp, 'ro', label='data from microscopy')
plt.legend(loc='upper left')
plt.ylim([0, 1.1])
plt.xscale('log')
plt.tight_layout() 
plt.xlabel('[IPTG] (M)')
plt.ylabel('fold-change')
Out[63]:
<matplotlib.text.Text at 0x1275e1a20>

We can now compare these measurements to the theoretical prediction as a function of IPTG concentration. As we can recall from equation 6 of the manuscript, the fold-change can be written as $$ \begin{align} \mathrm{fold-change}\approx\left(1 + \frac{\left(1 + \frac{c}{K_A}\right)^2}{\left(1 + \frac{c}{K_A}\right)^2 + e^{-\beta\Delta\varepsilon_{AI}}\left(1 + \frac{c}{K_I}\right)^2}\frac{R}{N_{NS}}e^{-\beta\Delta\varepsilon_{RA}}\right)^{-1}, \end{align} $$ where $c$ is the concentration of IPTG, $K_A$ and $K_I$ are the dissociation constants of IPTG from the active and inactive repressor respectively, $R$ is the total number of repressors, $N_{NS}$ is the number of non-specific binding sites in the genome, $\Delta\varepsilon_{AI}$ is the energy difference between the active and inactive repressor, $\Delta\varepsilon_{RA}$ is the operator binding energy of the repressor, and $\beta$ is $(k_BT)^{-1}$ where $k_B$ is the Boltzmann constant and $T$ is the temperature of the system.

We have already determined the most likely parameter values for $K_A$ and $K_I$ from the flow-cytometry experiments, allowing us to plot the prediction. For completeness, these parameters are $$ \begin{align} K_I &= 0.53\times 10^{-6}\,\,M\\ K_A &= 139\times 10^{-6}\,\,M. \end{align} $$
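
For reference, the prediction can also be written directly in NumPy. The sketch below is simply a transcription of the equation above; it is not the implementation of mwc.fold_change_log, which takes the log-transformed dissociation constants (as the next cell shows). The value $N_{NS} = 4.6\times 10^6$ (the number of base pairs in the E. coli genome) is an assumption here, as it is not restated in this notebook.

# A reference sketch transcribing the fold-change equation above; not the
# implementation in mwc_induction_utils. n_ns = 4.6E6 is an assumed value.
def fold_change_theory(c, k_a, k_i, ep_ai, R, ep_ra, n_ns=4.6E6):
    # Probability that the repressor is in the active state.
    p_act = (1 + c / k_a)**2 / ((1 + c / k_a)**2 +
                                np.exp(-ep_ai) * (1 + c / k_i)**2)
    # Fold-change under the weak promoter approximation.
    return 1 / (1 + p_act * (R / n_ns) * np.exp(-ep_ra))

# Evaluate over a range of inducer concentrations in molar units.
c_range = np.logspace(-9, -2, 500)
fc_sketch = fold_change_theory(c_range, 139E-6, 0.53E-6, 4.5, 130, -13.9)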

For dramatic effect, we'll plot the predicted behavior first using the parameters defined above.

In [64]:
# Define the necessary parameters.
epa = -np.log(139E-6)
epi = -np.log(0.53E-6)
epr = -13.9  # In units of kBT
iptg = np.logspace(-9, -2, 1000)
R = np.array([130])  # Number of lac tetramers per cell.

# Generate the theoretical fold-change. The value 4.5 kBT is the energy
# difference between the active and inactive states of the repressor.
fc = mwc.fold_change_log(iptg, epa, epi, 4.5, R, epr)

# Plot the prediction.
plt.figure()
plt.plot(iptg, fc, 'r-', label='prediction')
plt.xlabel('IPTG (M)')
plt.ylabel('fold-change')
plt.xscale('log')
plt.tight_layout()

Now we can compute the fold-change from the microscopy data. We will write a function that can be applied to a grouped dataframe.

In [65]:
def df_fold_change(df):
    """
    Computes the fold-change given a dataframe of intensity measurements.
    
    Parameters
    ----------
    df : pandas Series or DataFrame
        Mean intensity values indexed by strain name ('auto', 'delta',
        and 'RBS1027').
     
    Returns
    -------
    fc : float
        Value of fold change
    """
         
    return (df['RBS1027'] - df['auto']) / (df['delta'] - df['auto'])
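
As a usage sketch, this function can be applied to the mean intensities grouped by strain, which is equivalent to the explicit loop used above:

# Usage sketch: equivalent to the explicit loop above.
fc_series = cells.groupby('IPTG_uM').apply(
    lambda d: df_fold_change(d.groupby('rbs').mean_intensity.mean()))
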
In [66]:
# Plot the prediction.
plt.figure()
plt.plot(iptg, fc, 'r-', label='prediction from flow-cytometry')
plt.xlabel('IPTG (M)')
plt.ylabel('fold-change')
plt.xscale('log')

# Plot the experimental data. 
plt.plot(np.array(iptg_range)/1E6, fc_exp, 'o', markerfacecolor='w',
         markeredgewidth=2, markeredgecolor='r', markersize=8, label='data from microscopy')
plt.legend(loc='upper left')
plt.ylim([0, 1.1])
plt.tight_layout()

We see that the parameters estimated from the flow cytometry data agree quite well with the data collected using a separate (and more sensitive) method. To confirm that these two methods produce quantitatively indistinguishable results, we analyzed a subset of the flow cytometry data alongside the corresponding microscopy data sets. Please see appendix section C for more detailed information.