An introduction to this series:
To be clear, this set of articles does not intend to be superior to any existing body of knowledge on the subject of protein structures. What is intended here is another textbook-style introduction, followed by a very context-specific description of these beauties, so that a researcher who is either new to protein structure analysis or using it to complement other data can find some guidance. OK, so that does sound like this series is going to be superior; maybe it is. We will see.
So here is what is going to follow:
- Some background on the subject of proteins. I will try to define them and look at their building blocks, using typical textbook material.
- Along the way, I will point out unanswered questions so that readers get an idea of the kind of questions they could try to answer in their own research. Disclaimer: my knowledge is not exhaustive, so I will only raise the questions that I know about.
There are several hundred, if not thousands, of definitions of these bio-molecules on the internet. A first search for the word “protein” on Google brings up a mind-boggling number of results: 706 million, to be precise. However, in the context of this work, not all of these results will point you in the right direction. For instance, my first page of results (yours may differ depending on your geographic location, search history, etc. [try generating your search result here]) starts with some nearby locations from Google Maps, followed by some “People also ask” suggestions, 10 web page hits, and searches related to protein. Even though I have been looking at protein structures for well over 8 years, only one hit gives me the result that I actually want. All the other results relate to dietary proteins, not protein proteins (by which I mean the organic chemistry-like results). In all fairness, Google searches are biased towards popular, trending content, and I guess it is true that the majority of the world is looking at balancing the protein content of their diets or increasing it for muscle growth, but that is very different from the way proteins are discussed here. I will admit that I had an ulterior motive hidden in this rant: Google will now show this content to those people as well, getting more exposure for this series. It is useless exposure, but we will come back to that in a couple of months and see how well this article ranks in Google search results.
So after all that rant about what this article is NOT going to be about, we now come to “Proteins”, the beautiful polymeric bio-molecules that, in some shape or form, are responsible for nearly every process that happens within your body and keeps you alive. What happens when they stop doing their work is something we will come to in a while, because first we still need to define what they are.
At the most fundamental level we have an atom. Note that this statement is written in the context of biology: atoms are not the most fundamental particles, but for our “biological” purpose we will assume they are. Some atoms, mainly those of carbon, hydrogen, nitrogen, oxygen and sulfur, come together to form the biochemical units of proteins, called amino acids. Atoms can come together in many arrangements, and amino acids are one family of those arrangements. Details on this family can be found in any organic chemistry textbook [Wikipedia], so I will refrain from repeating them here.
What is important to mention is that, although many amino acids exist, only the 20 standard ones, plus two rarer additions (selenocysteine and pyrrolysine), are built into proteins by the cell’s own machinery. Amongst the many questions in “biology” that remain unanswered, this one makes the list. Not just the limited number of amino acids: their stereochemical nature also makes that list. To explain this quickly, each amino acid (except glycine) has a mirror-image form [Wikipedia]. One form is denoted L (from the Latin laevus, left) and the other D (from dexter, right). The labels describe the spatial arrangement of the molecule, not necessarily the direction in which it rotates polarized light. Proteins made by the ribosome are built almost exclusively from the “L” amino acids, so the question, yet to be satisfactorily answered, is why only “L”?
Coming back to proteins: the amino acids act as units and join end to end to make a polypeptide chain [Wikipedia], and one or more such chains make up a protein.
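
As a small aside, the idea of a chain built from a fixed set of units can be sketched in Python. The one-letter codes below are the conventional ones for the 20 standard amino acids; the function names and the example peptide are hypothetical, chosen only for illustration.

```python
# One-letter codes for the 20 standard amino acids, mapped to their names.
# The two rarer additions (selenocysteine "U" and pyrrolysine "O") are
# deliberately left out, so they are reported as non-standard below.
STANDARD_AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def nonstandard_residues(sequence):
    """Return the set of letters in `sequence` that are not standard codes."""
    return set(sequence.upper()) - set(STANDARD_AMINO_ACIDS)


def spell_out(sequence):
    """Translate a chain of one-letter codes into a list of amino acid names."""
    return [STANDARD_AMINO_ACIDS[residue] for residue in sequence.upper()]


# A made-up five-residue chain, used only as an example.
peptide = "MKGLV"
print(nonstandard_residues(peptide))  # empty set: every letter is standard
print(spell_out(peptide))
```

Writing a protein as a plain string of letters like this is how sequences are usually exchanged between programs, which is why it is a convenient starting point.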
Interestingly, to add to the list of unanswered biological questions, the factory inside the cell that produces these proteins, called the ribosome [Wikipedia], is itself made of proteins (and RNA). This raises a question similar to the chicken and the egg conundrum: which came first? Simply put, how was the first ribosome made, which subsequently started making other proteins? The search for an answer takes us way back to the start of life itself, but I will leave that discussion for another day and another article series.
So what we know so far is that some types of atoms make certain amino acids, which are then used as building blocks to make proteins at biological factories called ribosomes. The exact process can be found in any molecular biology textbook. I will only cover those aspects that bring questions to the front for passionate readers to pursue. Now that we have made these proteins, we need to see how they work and what work they carry out. Before we can address that, however, we must digress a bit.
Proteins, when they are produced at the biological factories, are nothing more than a string of amino acids, not yet capable of their respective functions. Interestingly, they do not stay that way for long. Under the influence of factors that are not fully understood, the string of amino acids making up the protein folds up into a higher-order structure. Biology textbooks describe these higher-order structures through a hierarchy of primary, secondary, tertiary and quaternary structure, Figure-1. Primary structure is what the protein starts with, i.e. the string of amino acids joined to form a protein [Wikipedia]. Secondary structure elements are alpha-helices and beta-strands, where multiple beta-strands come together in a parallel or anti-parallel manner to make up beta-sheets [Wikipedia]. Tertiary structure is when multiple secondary structures and some unstructured regions fold tightly together [Wikipedia], and quaternary structure is when multiple tertiary structures come together [Wikipedia].
To add another question to the long list of unanswered questions, it is not known what the driving factors are as the protein takes up its final shape. Two popular ideas are chaperones (other proteins that specialize in helping proteins fold) and hydrophobic collapse (where those of the 20 amino acids that do not like water cluster away from it and form the protein core, whereas those that like water end up on the surface of the protein).
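
To make the “likes water” versus “does not like water” split a little more concrete, here is a minimal sketch that labels residues using the Kyte-Doolittle hydropathy scale, one commonly used scale among several. The scale is not taken from this article, the cut-off at zero is an assumption made for illustration, and the function names and example sequence are hypothetical. This is not the analysis method planned for later posts.

```python
# Kyte-Doolittle hydropathy values: higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_fraction(sequence):
    """Fraction of residues with a positive hydropathy value.

    Treating "value > 0" as hydrophobic is an illustrative assumption.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    hydrophobic = sum(1 for r in residues if KYTE_DOOLITTLE[r] > 0)
    return hydrophobic / len(residues)


def mean_hydropathy(sequence):
    """Average hydropathy over the whole chain."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# A made-up sequence: two hydrophobic residues (A, I), two hydrophilic (K, D).
example = "AIKD"
print(hydrophobic_fraction(example))
print(mean_hydropathy(example))
```

A sequence-only number like this says nothing about where each residue ends up in the folded protein; checking whether hydrophobic residues really sit in the core needs three-dimensional structure data.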
Another idea is that proteins find their final shape (tertiary structure) by randomly trying out conformations. The bottom line is that we do not have a consensus on how this happens [Further reading].
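
To get a feel for what a purely random search would involve, here is a back-of-the-envelope sketch in the spirit of the argument usually called Levinthal's paradox. The numbers are illustrative assumptions, not measurements and not figures from this article: 3 possible conformations per residue, 10^13 conformations tried per second, and a 100-residue chain. The function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_years_for_random_search(n_residues,
                                  states_per_residue=3,
                                  samples_per_second=1e13):
    """Return log10 of the years needed to try every conformation once.

    Works in logarithms so that long chains do not overflow a float.
    All default values are illustrative assumptions.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


exponent = log10_years_for_random_search(100)
print(f"about 10^{exponent:.0f} years")  # on the order of 10^27 years
```

Under these assumptions an exhaustive random search would take vastly longer than the age of the universe, which is the usual reason a blind, unguided search is considered implausible.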
So far we know what makes up a protein and what the units of its structure are. But before we can work our way to the function carried out by these bio-molecules, it is important to understand the role of their structure, for a protein can only function properly if it takes up the correct shape, which then enables it to play its role. A misfolded protein is a bad protein, as it fails to carry out the function for which the cell produced it [Further reading: 1, 2]. This takes support away from the hypothesis that proteins take up their shape randomly: if that were true, a lot of proteins would get lost in their quest to fold properly and carry out their roles (we will come back to this when we look at the energetics of protein folding), and the cell would waste a lot of resources producing something that might not work.
Natural processes are not known to be wasteful, so it is likely that random folding is NOT the way in which proteins fold. So, back to the importance of structure. Once proteins take up their final shape, they present either cavities or amino acids on their exterior, Figure-2, with properties that allow these proteins to carry out their job. Now you can see the value of folding correctly: a badly folded protein may not have the right cavity, or may have different residues lining the site, and hence be unable to function.
Assuming our protein has folded correctly, it is now ready to do its job. Roles include, but are not limited to, enzymatic function (like the breakdown of carbohydrates in your food by amylase, produced by the salivary glands in your mouth), signalling (like the role insulin plays in telling your cells to take up glucose from the blood, unless you have type-2 diabetes, in which case you have insulin but your body does not respond to it properly) and structural support (like collagen, which your body uses to strengthen bones etc.).
With this introduction, and having to a certain degree emphasised structure, we can now continue with the analysis of protein structures. In this article I brought up:
- Some questions related to phylogenetics, which will be covered in a separate series, such as the use of only 20 amino acids, the preference for L-amino acids and how ribosomes came about.
- Protein folding through chaperones and hydrophobic collapse. While the roles of chaperones can best be explored through biochemical analysis in labs (something which I would never do), the hydrophobic collapse theory can be tested using available data. In this series we will analyse this and draw our own conclusion on whether it explains how proteins fold. I will publish the method and the code after some more groundwork in these articles.
The next post in this series will continue this work. It will address our ability to determine protein structures experimentally and lead into the public protein structure databases, which we will use frequently for our analysis.
Dated: 15th December, 2018