Code - MWC Mutants

Executing The Code

This code was written to be executed in a particular file structure, diagrammed as follows:

With a GitHub account you can clone a repository with all data and code in place in the correct directory structure. Please follow the installation instructions on the GitHub repository.

Computational Environment

All analysis and data processing was performed with the following software configurations.

# Python Version
CPython 3.6.7
IPython 7.1.1

# Package Versions
bokeh==1.0.4
fcsparser==0.2.0
numpy==1.14.2
matplotlib==3.0.1
scipy==1.1.0
seaborn==0.9.0
pandas==0.23.4
tqdm==4.28.1
pystan==2.18.0.0
python_frontmatter==0.4.5
PyYAML==5.1

# System Information
compiler   : GCC 4.2.1 Compatible Clng 4.0.1 (tagss/RELEASE_401/final)
system     : Darwin
release    : 18.2.0
machine    : x86_64
processor  : i386
CPU cores  : 4
interpreter: 64bit

The `mut` Module

This work required a moderate amount of home-grown code. To ensure reproducibility, we have written it as a Python module that can be installed from the master branch of the GitHub repository. Please see the installation instructions for details. This module is required to execute all of the following scripts.

Figure Generation

All Figures in this work (save for Fig. 2) were generated programmatically.

Fig. 1(C): Razo-Mejia et al. 2018 Data Collapse

Used to generate the data collapse figures shown in Fig. 1C of the main text

Necessary Data Sets
- Data from Razo-Mejia et al. 2018

Fig. 3: DNA Binding Mutants

Used to generate Figure 3, illustrating DNA binding mutant data, induction profiles, data collapse and change in free energy.

Necessary Data Sets
- Summarized fold-change measurements
- DNA binding energy inference statistics
- Free energy inferred from fold-change measurements

Fig. 4: Inducer Binding Domain Mutants

Used to generate Figure 4, illustrating the inducer binding mutant data, induction profiles, data collapse, and change in free energy.

Necessary Data Sets
- Summarized fold-change measurements
- Ka and Ki inference statistics
- Allosteric parameter inference samples
- Free energy inferred from fold-change measurements

Fig. 5: Double Mutants

Used to generate Figure 5, illustrating the double mutant data, induction profiles, and change in free energy.

Necessary Data Sets
- Summarized fold-change measurements
- Allosteric parameter inference samples
- DNA binding energy inference samples
- Free energy inferred from fold-change measurements

Fig. 6: Data Collapse of the Inducible Simple Repression Motif

Used to generate Figure 6, illustrating full data collapse of previous measurements from Garcia and Phillips 2011, Brewster et al. 2014, Razo-Mejia et al. 2018, and this work

Necessary Data Sets
- Summarized fold-change measurements
- DNA binding energy inference statistics
- Allosteric parameter inference statistics
- Data from Garcia and Phillips 2011 and Brewster et al. 2014
- Data from Razo-Mejia et al. 2018

Interactive Figures

The two interactive figures associated with this work (the model explorer and the data collapse figure) were generated using the Bokeh plotting framework using a combination of Python code and Javascript. The Python code (which includes the javascript) can be downloaded below.

Model Explorer
Interactive Data Collapse
- Necessary Data Sets:

Inferential Models

All parameter inference was performed using the Stan probabilistic programming language. The following Stan model files perform the described analysis.

Inference of DNA Binding Energy
Used to estimate the DNA binding energy and homoscedastic error for a single induction profile.
Global Inference of DNA Binding Energy
Used to estimate the DNA binding energy for a single mutant from multiple induction profiles
Inference of Inducer Dissociation Constants
Used to estimate the inducer dissociation constants Ka and Ki for a single induction profile, assuming the allosteric energy difference is unperturbed from the wild-type value.
Inference of Allosteric Parameters
Used to infer the inducer dissociation constants Ka and Ki along with the allosteric energy difference for a single induction profile.
Global Inference of Allosteric Parameters
Used to estimate the allosteric parameters for a single mutant from multiple induction profiles.
Inference of Free Energy from Fold-Change Measurements
Used to infer the mean fold-change, standard deviation, and free energy for a collection of fold-change measurements.

The following scripts were used to execute the inference models.

Execution of parameter inference models
Loads the data, calls the inferential models, and save statistics and samples to tidy .csv files.
- Necessary Data Sets:
  1. Compiled fold-change measurements
Execution of global parameter inference models
Loads the data, estimates parameters for each mutant using all induction profiles, and saves samples and statistics.
- Necessary Data Sets:
  1. Compiled fold-change measurements

Supplementary Materials

The following code was used to generate the supplementary figures as well as any data generated via simulation.

Fig. S2: Prior Predictive Checks for DNA Binding Energy and Homoscedastic Error
Used to generate Figure S2 illustrating the prior distributions for the DNA binding energy and homoscedastic error as well as visualize the percentiles of the prior predictive checks.
- Necessary Data Sets and Auxiliary Scripts:
  1. DNA binding energy prior predictive check data
  2. DNA binding energy prior predictive check generation
Fig. S3: Prior Distribution Recovery for DNA Binding Energy and Homoscedastic Error
Used to generate Figure S3 which shows that the prior distributions of the parameters can be recovered through repeated inferential runs on simulated data.
- Necessary Data Sets and Auxiliary Scripts:
Fig. S3 - Fig. S5: Simulation Based Calibration of DNA Binding Energy Inference
Used to generate three figures (Fig. S3 - S5) which use simulation based calibration to assess the sensitivity and computational tractability of the statistical model for estimating the DNA binding energy.
- Necessary Data Sets and Auxiliary Scripts
Fig. S6: DNA Binding Energy Posterior Predictive Checks
Used to generate Fig. S6 which shows the MCMC sampling output and corresponding posterior predictive checks for DNA binding mutant Q21M.
- Necessary Data Sets
  1. Compiled fold-change measurements
  2. DNA binding energy samples
Fig. S7: Prior Predictive Checks for Free Energy Inference
Used to generate Fig. S7 which plots the percentiles of the prior predictive checks for estimation of the free energy from fold-change measurements.
- Necessary Data Sets and Auxiliary Scripts
  1. Empirical free energy prior predictive checks
  2. Empirical free energy prior predictive check generation
Fig. S8: Simulation Based Calibration for Free Energy Inference
Used to generate Fig. S8 which shows z-scores, shrinkage, and rank distribution for model that infers the free energy from a collection of fold-change measurements.
- Necessary Data Sets and Auxiliary Scripts
Fig. S9: Empirical Free Energy Posterior Predictive Checks
Used to generate Fig. S9 which shows the MCMC samples and the posterior predictive checks for estimating the mean fold-change for Y20I-Q294K with operator O2 at 50 µM IPTG.
- Necessary Data Sets and Auxiliary Scripts
  1. Compiled fold-change measurements
  2. Stan model for inference of empirical free energy from fold-change measurements
Fig. S10: Sensitivity Limits for Inferring Empirical Free Energy
Used to generate Fig S9 which shows the sensitivity limits for inferring the free energy from simulated and real fold-change measurements.
- Necessary Data Sets and Auxiliary Scripts
Fig. S11: Comparison of DNA Binding Energy Inference to Other Repressor Copy Numbers
Used to generate Fig. S11 which shows the influence of the choice of fit strain on predictions of induction profiles for other DNA binding mutants.
- Necessary Data Sets
  1. Summarized fold-change measurements
  2. DNA binding energy inference statistics
Fig. S12: Effect of DNA Binding Energy Fit Strain on ∆F Predictions
Used to generate Fig. S12 which shows how the choice of fitting strain influences the predicted change in free energy for DNA binding mutants.
- Necessary Data Sets
  1. Empirical free energy inference statistics
  2. DNA binding energy inference statistics
Fig. S13: Prior Predictive checks for Inference of Allosteric Parameters
Used to plot percentiles of prior predictive checks for two hypotheses for inducer binding domain mutants.
- Necessary Data Sets and Auxiliary Scripts:
  1. Prior predictive samples for inducer binding mutant models
  2. Generation of prior predictive samples for inducer binding mutant models
Fig. S14: Simulation Based Calibration of Inducer Binding Mutant Statistical Models
Used to plot z-scores, shrinkage, and rank distributions for inducer binding domain models.
- Necessary Data Sets and Auxiliary Scripts:
Fig. S15 - Fig. S16: Posterior Predictive Checks For Inducer Binding Domain Models
Plots the MCMC samples and posterior predictive checks for both inducer binding domain statistical models for the mutant Q294K paired with the O2 operator.
- Necessary Data Sets and Auxiliary Scripts:
  1. Generation of posterior predictive samples for inducer binding domain models
Fig. S17 - Fig. S18: Comparison of Fit Strain Choice on Prediction of Inducer Binding Domain Mutants
Compares the effect of the choice of fit strain under both inducer binding domain statistical models on predicting induction profiles.
- Necessary Data Sets:
Fig. S19: Influence of Choice of Fit Strain on ∆F Predictions for Inducer Binding Domain Mutants
Shows the predicted change in free energy for all choices of fit strain for inference of allosteric parameters.
- Necessary Data Sets:
  1. Empirical free energy inference statistics
  2. MCMC samples from inference of all allosteric parameters
Fig. S20: Global DNA Binding Energy Inference Induction Profiles and ∆F
Shows the fits of the induction profiles for each DNA binding mutant after global estimation of the DNA binding energy. It also shows the predicted ∆F along with the inferred free energies.
- Necessary Data Sets and Auxiliary Scripts:
Fig. S21: Global Allosteric Parameter Inference Induction Profiles and ∆F
Shows the fits of the induction profiles for each inducer binding mutant after global estimation of the allosteric parameters. It also shows the predicted ∆F along with the inferred free energies.
- Necessary Data Sets and Auxiliary Scripts:
  1. Summarized fold-change measurements
  2. Empirical free energy inference statistics
  3. Global allosteric parameter inference samples
  4. [Global inference of allosteric parameters ](/mwc_mutants/code/Chure2019_pooled_parameter_inference.py
  5. Stan model for global inference of allosteric parameters