Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time

William T. Ireland, Suzannah M. Beeler, Emanuel Flores-Bautista, Nicholas S. McCarty, Tom Röschinger, Nathan M. Belliveau, Michael J. Sweredoski, Annie Moradian, Justin Kinney, Rob Phillips


The full manuscript can be found here.


Advances in DNA sequencing have revolutionized our ability to read genomes. However, even in the most well-studied of organisms, the bacterium Escherichia coli, we remain completely ignorant of the regulation of ~65% of promoters. Until we have cracked this regulatory Rosetta Stone, efforts to read and write genomes will remain haphazard. We introduce a new method (Reg-Seq) linking a massively parallel reporter assay and mass spectrometry to produce a base-pair-resolution dissection of more than 100 promoters in E. coli in 12 different growth conditions. First, we show that our method recapitulates regulatory information from known sequences. Then, we examine the regulatory architectures for more than 80 promoters in the E. coli genome which previously had no known regulation. In many cases, we also identify which transcription factors mediate their regulation. The method introduced here clears a path for fully characterizing the regulatory genome of model organisms, with the potential of moving on to an array of other microbes of ecological and medical relevance.

Code and Data Availability

An in-depth discussion of all experimental protocols and mathematical analysis used in this study can be found on the GitHub Wiki. All code used for processing data and plotting, as well as the final processed data, plasmid sequences, and primer sequences, can be found on the GitHub repository. Energy matrices were generated using the MPAthic software. All raw sequencing data is available at the Sequence Read Archive (accession nos. PRJNA599253 and PRJNA603368). All inferred information footprints and energy matrices can be found on the CaltechData repository (DOI). All mass spectrometry raw data is available on the CaltechData repository (DOI).


We are grateful to Rachel Banks, Stephanie Barnes, Curt Callan, Griffin Chure, Ana Duarte, Vahe Galstyan, Hernan Garcia, Soichi Hirokawa, Thomas Lecuit, Heun Jin Lee, Madhav Mani, Muir Morrison, Steve Quake, Manuel Razo-Mejia, Gabe Salmon, and Guillaume Urtecho for useful discussion and feedback on the manuscript. Guillaume Urtecho and Sri Kosuri have been instrumental in providing key advice and protocols at various stages in the development of this work. We would like to thank Jost Vielmetter and Nina Budaeva for providing access to their Cell Disruptor. Brett Lomenick provided crucial help and advice with protein preparation. We also thank Igor Antoshechkin for his help with sequencing at the Caltech Genomics Facility.


We are deeply grateful for support from NIH Grants DP1 OD000217 (Director’s Pioneer Award) and 1R35 GM118043-01 (Maximizing Investigators’ Research Award), which made it possible to undertake this multi-year project. N.M.B. was supported by an HHMI International Student Research Fellowship. S.M.B. was supported by the NIH Institutional National Research Service Award (5T32GM007616-38) provided through Caltech.