The previous post touched on protein synthesis and briefly discussed protein sequences and some of the areas in which they are used. Another important area of protein bioinformatics is protein structure. This post introduces the basics of protein structure.
Once a polypeptide chain (protein) has been synthesised at the ribosome, it folds into an intricate three-dimensional (3D) shape. How a protein folds is poorly understood, but a commonly held belief is that physical forces are responsible. I will try to cover this in some detail later in this article. For now, the basics of protein structure.
The amino acid sequence of the polypeptide chain is referred to as the primary structure. It leads to the formation of a second level of organisation, called the secondary structure. The secondary structure is characterised by the presence of two main secondary structure elements (SSEs), i.e. the α-helix and the β-sheet (Figure 1). These SSEs are formed by hydrogen bonds (hbonds) between pairs of backbone atoms. A regular pattern of hydrogen bonds between every fourth amino acid (also referred to as residues), i.e. between the backbone C=O of residue i and the backbone N-H of residue i + 4, results in the formation of an α-helix (an i to i + 3 pattern gives the tighter and much rarer 3-10 helix). A run of consecutive hbonds between backbone atoms of strands that can be far apart in the sequence characterises β-sheets. Regions that do not fall into either of these two well-defined categories are usually called loops or coils, and are referred to as disordered regions when they lack a fixed conformation. Turns are short sections that connect two SSEs. An ordered collapse of the secondary structure leads to the generation of the tertiary structure. While all the residues in a polypeptide chain are linked through peptide bonds, two or more chains may come together to form dimers or other multimers, resulting in the quaternary structure of proteins.
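To make the helix hbond pattern concrete, here is a minimal sketch that marks helical residues from a list of backbone hydrogen bonds. It loosely follows the idea used by DSSP (two consecutive i to i + 4 bonds define a minimal helix), but it is a simplification and not the DSSP program itself. The function name `helical_residues` and the example bond list are made up for illustration; each bond is assumed to be given as a pair (residue with the C=O, residue with the N-H).

```python
def helical_residues(hbonds, n=4):
    """Return residue numbers that look helical, given backbone hbonds.

    hbonds: iterable of (i, j) pairs, C=O of residue i bonded to N-H of residue j.
    n: sequence separation that defines the helix type (4 for an alpha-helix).
    Simplified rule: if residues i-1 and i both start an (i, i+n) bond,
    residues i .. i+n-1 are marked as helix.
    """
    # Residues that start an (i, i+n) hydrogen bond
    turns = {i for (i, j) in set(hbonds) if j - i == n}
    helix = set()
    for i in turns:
        if i - 1 in turns:
            helix.update(range(i, i + n))
    return sorted(helix)


# Made-up example: three consecutive i -> i+4 bonds plus two long-range
# bonds of the kind you would see between beta-strands.
example_hbonds = [(1, 5), (2, 6), (3, 7), (10, 25), (12, 23)]
print(helical_residues(example_hbonds))
```

The long-range pairs are ignored here because their sequence separation is not 4; detecting sheets properly needs a separate rule for bond ladders between strands.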
While the above provides a summary of the various structural levels, how a protein follows this pathway to its final shape is not fully understood. The rest of this section looks at protein folding in a little more detail.
One of the theories trying to explain the folding behaviour is that of hydrophobic collapse. Some amino acids, depending on their chemical makeup, are hydrophobic. The theory states that these residues drive the structure to collapse early on, pushing the water away so that a hydrophobic core is established (I will run an experiment around this in one of my future posts). This makes sense, as burying the hydrophobic residues frees the water molecules that were ordered around them and so favours an increase in the entropy of the solvent. The variation in the energy landscape is very small (not tested but known from this video), so the spontaneous folding of a protein must be driven by entropy. This theory has weaknesses, however, and I am not going to discuss them here. That can perhaps be a topic for another day. For now, have a look at the complete video, or at least from 12 min 40 sec [click link], and other content on protein folding by Dill et al. to see the state of the field (a more recent review is available but unfortunately isn’t open access).
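As a rough illustration of which parts of a sequence might end up in a hydrophobic core, the sketch below averages the Kyte-Doolittle hydropathy scale over a sliding window. This is only a sequence-based heuristic and says nothing about the actual folding pathway. The function names and the toy sequence are made up for this example, and the code assumes upper-case one-letter codes for the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a sequence (raises KeyError on unknown letters)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of every full window along the sequence."""
    return [
        mean_hydropathy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


toy_sequence = "MKKLLVIAGLLE"  # made-up sequence, not a real protein
for start, score in enumerate(hydropathy_profile(toy_sequence), start=1):
    print(f"window starting at residue {start}: {score:+.2f}")
```

Windows with a clearly positive score are candidates for buried regions, those with a negative score for solvent-exposed ones.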
Experimental determination of protein structure:
All known protein structures are deposited in protein structure databases. A number of them exist, but a popular one (and the one I use) is the RCSB Protein Data Bank (PDB). This database accepts experimentally determined structures. Popular methods of determining structure are X-ray crystallography, NMR, cryo-EM, neutron scattering etc. Each of these methods has its own pros and cons. While X-ray and NMR are fairly well established methods, the others are in their infancy but hold a lot of promise to give better results in the future. Again, I am going to leave the details of these for another post.
The structures deposited in RCSB can be downloaded and visualised using molecular structure visualisers like VMD, PyMOL etc. (there is an online version with an academic licence as well; I can’t remember where, so get in touch). I have worked with both VMD and PyMOL and will be adding analysis scripts for both of these in due time.
An important distinction to be made here is between a molecular editor and a visualiser. An editor is a program which allows you to make physical changes to the structure of a molecule, e.g. addition or removal of atoms. A visualiser, on the other hand, only allows you to see things and measure observables like distances and angles. So not all visualisers are editors, but all editors are probably going to be visualisers, as in order to edit something you will probably have to see it. An example of a molecular editor would be Avogadro. All three programs mentioned (VMD, PyMOL and Avogadro) can be used for editing purposes (if you know what you are doing). Windows users can also look at Discovery Studio Visualizer, DSV (for visualisation and editing, despite the name). It has a Linux build as well, but it might not be the most user-friendly of things.
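If you want to look inside a downloaded file yourself rather than through a visualiser, the coordinates in the legacy PDB text format live in fixed-width ATOM and HETATM records. The sketch below reads the main fields by column position. The function name `parse_atom_records` and the sample lines are made up for illustration (they are not taken from a specific entry), and newer mmCIF files need a different parser.

```python
def parse_atom_records(pdb_text):
    """Extract atoms from the ATOM/HETATM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, TER, END and other record types
        atoms.append({
            "serial": int(line[6:11]),         # columns 7-11
            "name": line[12:16].strip(),       # columns 13-16
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain": line[21],                 # column 22
            "res_seq": int(line[22:26]),       # columns 23-26
            "x": float(line[30:38]),           # columns 31-38
            "y": float(line[38:46]),           # columns 39-46
            "z": float(line[46:54]),           # columns 47-54
        })
    return atoms


# Made-up sample records laid out in the fixed PDB columns
sample_lines = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
    "ATOM      3  CA  GLN A   2      26.850  29.021   3.898  1.00  9.07",
    "TER",
    "HETATM    4  O   HOH A 101      45.747  30.081  19.708  1.00 12.43",
    "END",
]

for atom in parse_atom_records("\n".join(sample_lines)):
    print(atom["res_name"], atom["res_seq"], atom["name"], atom["x"], atom["y"], atom["z"])
```

For real work an established parser (for example the one in Biopython) is a safer choice, since it handles alternate locations, multiple models and other corner cases that this sketch ignores.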
Measurables from protein structure:
So you have a protein structure and you can look at it in a visualiser, but that isn’t all you can do with a structure. In my doctoral work I established a new way to use protein structural data in building structural phylogenies (or rather, contributed significantly to the field of structural phylogenetics). However, since that is in its infancy at the moment, I will only be discussing established things.
For instance, structures in apo (unbound) and bound forms are used to compare changes in the binding site of a protein. Apart from the binding site, other conformational changes can also be quantified. A lot of research papers look at this. Outside of conformational changes, other parameters you can look at are the backbone and side chain angles (phi, psi, omega and chi). The backbone angles allow you to judge the quality of a protein structure, for example through a Ramachandran plot of phi against psi. The number of hydrogen bonds, solvent-accessible surface area, inter-atomic distances, volume, radius of gyration etc. are all properties that can be quantified. All these properties also have energetic contributions. OK, so I probably skipped a few things there. You might wonder how I went from measuring distances and other stuff to looking at energetics. I will try to explain this briefly and will try to oversimplify as well.
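Two of the measurables mentioned above are easy to compute once you have coordinates: a dihedral angle from four atom positions (phi, psi, omega and chi are all dihedrals, they just use different atoms) and the radius of gyration. The sketch below uses plain Python with no external libraries. The function names are my own, the coordinates are toy values, and the radius of gyration here treats all atoms as having equal mass, which is an assumption.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    a, b, c = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    b_cross_c = _cross(b, c)
    y = math.sqrt(_dot(b, b)) * _dot(a, b_cross_c)
    x = _dot(_cross(a, b), b_cross_c)
    return math.degrees(math.atan2(y, x))


def radius_of_gyration(coords):
    """Root mean square distance of the atoms from their centroid (equal masses)."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    mean_sq = sum(_dot(_sub(c, centroid), _sub(c, centroid)) for c in coords) / n
    return math.sqrt(mean_sq)


# Toy coordinates: four points arranged to give a 90 degree dihedral
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(dihedral(*points))
print(radius_of_gyration(points))
```

For phi you would pass C of residue i - 1, then N, CA and C of residue i; for psi, N, CA and C of residue i followed by N of residue i + 1.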
Proteins are subject to thermodynamics, which means that the atoms that make up these molecules are in constant motion at body temperature (310.15 Kelvin). So how come a protein folds? If the atoms are all trying to move, an ordered structure is perhaps the last state a protein should be in. Right? It turns out that while the atoms are all trying to move, there are certain other forces at play. These other forces are electrostatic forces, yes, like the ones between opposite charges. Hydrogen bonds are an example of that (oversimplified, as they arise from partial charges). Other examples of attraction between opposite charges would be between positively charged residues like Arginine and Lysine (and Histidine, when protonated) and negatively charged ones like Glutamate and Aspartate. These forces can then be expressed as energies. An alternative way to look at this would be to ask how much potential energy is lost as the protein folds. This is given by the net sum of the attractive and repulsive interactions. (This is extremely oversimplified; I recommend proper reading on this subject, or get in touch if you have questions.)
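To put a number on the attraction between opposite charges, the simplest model is Coulomb's law between two point charges. The sketch below is an illustration only: the function name is my own, the charges, distances and dielectric constants are example values, and treating the environment as a single uniform dielectric is a strong assumption. The constant 332.06 converts elementary charges and ångströms into kcal/mol.

```python
COULOMB_CONSTANT = 332.06  # kcal * angstrom / (mol * e^2)


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Coulomb energy in kcal/mol.

    q1, q2: charges in units of the elementary charge.
    distance: separation in angstrom (must be positive).
    dielectric: relative permittivity of the medium (assumed uniform).
    """
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


# Example: a +1 and a -1 charge 3 angstrom apart, e.g. an idealised salt bridge
print(coulomb_energy(+1.0, -1.0, 3.0))                   # in vacuum
print(coulomb_energy(+1.0, -1.0, 3.0, dielectric=80.0))  # screened as in bulk water
```

A negative value means attraction. The large difference between the two printed values shows why the environment of a charged pair (buried in the protein or exposed to water) matters so much.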
While the electrostatic energies are significant, other forms of energy also contribute to a protein structure. For example, the distances between bonded atoms allow bonded energies to be estimated through the use of harmonic (spring-like) models. I am going to stop here because I can see this going into areas which I will discuss in future posts.
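As a minimal sketch of what a harmonic model means, the bond between two atoms can be treated as a spring with an equilibrium length and a force constant. The function name and the parameter values below are hypothetical and chosen only for illustration; real force fields tabulate these per bond type, and some of them fold the factor of one half into the force constant.

```python
def harmonic_bond_energy(distance, equilibrium=1.53, force_constant=300.0):
    """Harmonic bond energy 0.5 * k * (r - r0)^2 in kcal/mol.

    distance, equilibrium: bond lengths in angstrom.
    force_constant: spring constant in kcal/(mol * angstrom^2).
    The default values are illustrative, not taken from a real force field.
    """
    return 0.5 * force_constant * (distance - equilibrium) ** 2


# Energy penalty for stretching or compressing the bond away from 1.53 angstrom
for r in (1.43, 1.53, 1.63):
    print(f"r = {r:.2f} angstrom -> {harmonic_bond_energy(r):.2f} kcal/mol")
```

The energy is zero at the equilibrium length and rises symmetrically on either side, which is a reasonable approximation only for small deviations.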
In summary, there are programs and mathematical models which use protein structures to work out the energy of a protein. That energy in itself is not very informative unless you compare it to that of another structure. Again, for illustration purposes only, you can compare a bound and an unbound protein to see the change in energy upon ligand binding. There are better methods to do this, e.g. free energy perturbation (FEP) and thermodynamic integration (TI), so resort to those for research purposes.
In this post we looked at protein structures, how they fold (or at least how we think they fold), how structure is experimentally determined, where structures are stored (databases), and what kinds of things you can use protein structures for. Future posts will look at some scripts to measure some or all of the quantities mentioned here.