
Rob Phillips Group - Research Topics

Reduced alphabets for bioinformatics

The total number of protein folds in nature is estimated to be in the hundreds to low thousands, an astonishing fact given the enormous sequence space available to proteins. The wealth of structures and associated sequences now available in the Protein Data Bank (PDB) makes it clear that the same protein fold can be generated by many different amino acid sequences. In some cases the sequences underlying similar structures show almost no sequence identity. This large degeneracy invites us to look for a coarse-grained description that reveals the underlying similarities between these apparently dissimilar sequences. Taking our inspiration from the HP (hydrophobic-polar) model of protein folding first introduced by Ken Dill, we are evaluating the usefulness of reduced amino acid alphabets for identifying remote homologs.
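To make the idea of a reduced alphabet concrete, here is a minimal sketch that maps two pre-aligned sequences onto a two-letter HP alphabet and compares their percent identity before and after the reduction. The hydrophobic set, the function names, and the toy sequences are all assumptions made for illustration; they are not the grouping or the data used in this project, and many other reduced alphabets are possible.

```python
# Hypothetical two-letter (HP) reduction. The hydrophobic set below is one
# common choice and is an assumption, not the grouping used in this project.
HYDROPHOBIC = set("AVLIMFWC")


def reduce_to_hp(seq):
    """Map each residue to 'H' (hydrophobic) or 'P' (polar); keep gaps as '-'."""
    return "".join(
        "-" if aa == "-" else ("H" if aa.upper() in HYDROPHOBIC else "P")
        for aa in seq
    )


def percent_identity(a, b):
    """Percent identical positions over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy pre-aligned fragments, made up for illustration (not real protein data).
seq_a = "MLVKDE-AIS"
seq_b = "IVLRETGVLT"

full_identity = percent_identity(seq_a, seq_b)
hp_identity = percent_identity(reduce_to_hp(seq_a), reduce_to_hp(seq_b))
print(f"20-letter identity: {full_identity:.1f}%")
print(f"HP identity:        {hp_identity:.1f}%")
```

In this contrived pair the full-alphabet identity is zero while the HP patterns agree at every ungapped column. A two-letter alphabet also raises the identity expected by chance, so a reduced-alphabet score would need to be judged against a matching baseline rather than against the usual 20-letter thresholds.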

ParM Alignment

Figure 1. Crystal structures of ParM (1MWM, right), found in prokaryotes, and Ta0583 (2FSJ, left), an archaeal actin homolog. (a) After alignment of the crystal structures, the molecules are colored by the RMSD between the protein backbone carbons: blue indicates low, white moderate, and red large RMSD. (b) The molecules are colored by sequence similarity, with dissimilar residues shown in red and conserved residues in blue; the sequences are almost completely divergent. Many such examples of proteins with highly similar structures and low (<30%) sequence identity have been found.