The last post in the Protein Structure series ended with:
“In the next post, I will use Biopython to import a structure and do some fairly introductory calculations”.
In this post I will pick up from there.
While there are multiple ways in which one can make use of Biopython, I will use it through the Jupyter Notebook. So the first order of business is to set the notebook up to use Python. The instructions to do this were posted earlier in “Setting up Python”.
So, assuming you have Python up and running in the Jupyter Notebook, we will start by importing Biopython. The package is imported under the name Bio, so to do this type the command
import Bio
You will write this in the available box (called a cell), and when you have finished writing the command, you will run it by pressing the Ctrl + Enter keys together. The square brackets next to the cell, [ ], will show an asterisk (*) while the command is running. When the instruction in the cell has finished processing, the brackets will show a number.
You can always re-run a cell in the Jupyter Notebook; each time you do, the number in the square brackets is updated.
However, in this case it is highly likely you will get an error message saying that the module you have just tried to load isn’t available (Figure 1).
This is because you have just made a fresh install, so nothing is available beyond what you have installed yourself. Biopython is most likely not bundled with this install, which is why you will probably receive this message. If for some reason you don’t, you can skip the next step, where I show how to install Biopython.
To install Biopython, type in a cell:
!conda install --yes biopython
and press Ctrl + Enter to run the command. A cell can also hold more than one command. The installation may be very quick or take a long time, depending on your internet connection and your computer’s speed.
Make sure the process is successful. You should see an output like the one in Figure 2 if everything goes according to plan. I cannot comment on the types of errors which may occur, so if you have difficulty installing, please get in touch (firstname.lastname@example.org or use the Facebook page to send me a message).
At this point you have successfully installed Biopython and are now ready to use it.
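If you would like to confirm the install from inside the notebook, a minimal sketch is to import the package again and print its version. The variable name biopython_version is my own choice for illustration; Bio.__version__ is the version string the package exposes. The try/except is only there so the cell prints a readable message instead of a traceback if the install did not go through.

```python
# Try the import again now that the install step has been run.
try:
    import Bio
    biopython_version = Bio.__version__  # version string set by the package
except ImportError:
    # The package is still not visible to this Python; the install likely failed.
    biopython_version = None

if biopython_version is None:
    print("Biopython is still not available; check the install output.")
else:
    print("Biopython is installed, version", biopython_version)
```

If the cell reports that Biopython is still not available right after a successful install, restarting the notebook kernel and running the cell again is worth a try.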
Although I have shown how to install Biopython and load it, we will be making use of many other modules as well. Fortunately, many of these modules come installed by default with Python, and we will make use of them regularly. As an example, let’s try and work something out.
Assume a scenario where you want to import a data file which is located on your computer’s hard drive at a certain location. Before you can import it, you must be able to “set the path” to that file. There are different ways of doing this. The way I will do it is that every time we use Python, we will make a “Working Directory” and keep all files required for a project inside that folder. This will limit the need to change paths.
Let’s look at an example to clarify this. To detect paths in Python you will need a module called “os”. You will load this module using:
import os
When you import a module, all functions in that module become available. So when you load the module “os”, the function “getcwd” becomes available for use. To make use of this method, type
os.getcwd()
where “os” is the module and “getcwd()” is a method of that module, which you access through the use of the “dot”. The os.getcwd() command will tell you the current location available to Python. Remember that when you create a file or want to read one, this is the location Python looks in, so you must put your files there. If you don’t want to use this location, you can change it. To change the path, you can make use of another method called “chdir(new_path)”. This method takes as its input the new path which you would like to use (the input is the value you pass within the brackets, i.e. by replacing the string new_path). See Figure 3.
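The steps above can be sketched in a single cell as follows. This is only an illustration under a few assumptions of mine: the folder name protein_project and the file name notes.txt are made up, and the working directory is created inside a temporary folder so that nothing on your own disk is changed. In practice you would pass the path of your real project folder to os.chdir.

```python
import os
import tempfile

# Where is Python looking right now?
start_dir = os.getcwd()
print("Starting location:", start_dir)

# A temporary parent folder stands in for wherever you keep your projects.
with tempfile.TemporaryDirectory() as parent:
    # Hypothetical working directory for one project.
    working_dir = os.path.join(parent, "protein_project")
    os.makedirs(working_dir, exist_ok=True)

    # Point Python at the working directory.
    os.chdir(working_dir)
    print("Working directory:", os.getcwd())

    # A file written with a bare name now lands inside the working directory.
    with open("notes.txt", "w") as handle:
        handle.write("saved in the working directory\n")
    files_in_working_dir = os.listdir(".")
    print("Files found here:", files_in_working_dir)

    # Go back to where we started so the temporary folder can be removed.
    os.chdir(start_dir)
```

Because the file is opened by name only, without a full path, Python reads and writes it in whatever location os.getcwd() reports at that moment. That is the reason for keeping all project files in one working directory.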
This series is not about “Learning Python” but about getting equipped with enough of it to make it do what you want it to do. So I will end this post here. In the next post we will pick up from here and start making use of PDB structures, which we will import from a given location and process to get analytics.
For you to be able to follow the next post smoothly, make sure to follow the instructions in the last and current posts. If you have any difficulty, please reach me through the IDRACK Facebook page.