{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Tutorial: Constructing an energy matrix using linear regression and MCMC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code in this tutorial is released under the [MIT License](https://opensource.org/licenses/MIT). All the content in this notebook is under a [CC-BY 4.0 License](https://creativecommons.org/licenses/by/4.0/). \n", "\n", "Created by Bill Ireland, Suzy Beleer and Manu Flores. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-11-27T22:55:34.400432Z", "start_time": "2019-11-27T22:55:30.721870Z" } }, "outputs": [], "source": [ "# Import numerical, plotting, and data-handling libraries\n", "import numpy as np\n", "import scipy as sp\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import pandas as pd\n", "from sklearn import linear_model\n", "\n", "# Import the custom analysis software\n", "import plot_informationfootprint as pli\n", "\n", "# Render all plots inside the notebook rather than in pop-up windows.\n", "%matplotlib inline\n", "# Render notebook figures as SVG graphics\n", "%config InlineBackend.figure_format = 'svg' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will first load a data set in a format accepted by the analysis software.\n", "\n", "During this experiment, we measure the frequency of each mutant promoter in the library via sequencing, which we label 'ct_0'. We also measure the mRNA counts produced by each mutant promoter via sequencing, which we label 'ct_1'. \n", "\n", "We then put the resulting dataset into a format usable by the data analysis software. The dataset must have the columns 'ct', 'ct_0', 'ct_1', and 'seq', where 'ct' is the total number of reads and 'seq' is the sequence of the mutant promoter.
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-11-27T22:55:34.411339Z", "start_time": "2019-11-27T22:55:34.404634Z" } }, "outputs": [], "source": [ "#We will declare the path where all the data for this notebook is stored. It can be downloaded from the\n", "#website under 'datasets' or from the github repo (in the datasets folder).\n", "path = '../MCMC/'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-11-27T22:55:34.482934Z", "start_time": "2019-11-27T22:55:34.415866Z" }, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
| | ct | ct_0 | ct_1 | seq |
|---|---|---|---|---|
| 0 | 10.0 | 5.0 | 5.0 | AAACAAAAAAACACATGAACGTATCTACTTGGTTCAATATAAGGAT... |
| 1 | 2.0 | 2.0 | 0.0 | AAACAAAAAAACACATGAACGTATCTACTTGGTTCAATATAAGGAT... |
| 2 | 10.0 | 1.0 | 9.0 | AAACAAAAAAAGACAGGAACGTAATGACTGGGTGAAATATAATCAT... |
| 3 | 9.0 | 9.0 | 0.0 | AAACAAAAAAAGACAGGAACGTAATTACTGGGTTAAATATTATCAT... |
| 4 | 1.0 | 1.0 | 0.0 | AAACAAAAAAAGACAGGAACGTAATTACTGGGTTAAATATTATCAT... |