Installing Biopython and other things pythonic

By | April 12, 2019

The last post in the Protein Structure series ended with:

“In the next post, I will use Biopython to import a structure and do some fairly introductory calculations”.

In this post I will pick up from there.

Installing Biopython

While there are multiple ways in which one can make use of Biopython, I will use it through the Jupyter Notebook. So the first order of business is to set up the Jupyter Notebook to run Python. The instructions for doing this were posted earlier in “Setting up Python”.

So, assuming you have Python up and running in the Jupyter Notebook, we will start by importing Biopython. To do this, type the command:

 import Bio 

Type this into the available box (called a cell) and, when you have finished writing the command, run it by pressing Ctrl + Enter. While the command is running, the square brackets [ ] next to the cell show an asterisk (*). When the instruction in the cell has finished processing, the asterisk is replaced by a number.

In the Jupyter Notebook you can always re-run a cell; each time you do, the number in the square brackets is updated.

However, in this case it is highly likely that you will get an error message saying that the module you have just tried to load isn’t available (Figure 1).

Figure 1: Error message when trying to load Biopython. This is a typical error message when trying to load a module which is either not installed or is installed but not visible to Python.

This is because you have just made a “FRESH” install, so nothing is available except what came with it. Biopython is most likely not included in that install, so you will probably receive this message. If for some reason you don’t, you can skip the next step, where I show how to install Biopython.
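If you would rather check in advance whether Biopython is present (instead of waiting for the error), a minimal sketch like the one below can help. It uses the standard-library `importlib.util.find_spec`, which returns `None` rather than raising an error when a top-level module cannot be found. The helper name `biopython_available` is hypothetical, chosen just for this illustration.

```python
import importlib.util

def biopython_available():
    """Return True if the 'Bio' package can be found by this Python kernel."""
    # find_spec returns None (instead of raising) when a top-level module is missing
    return importlib.util.find_spec("Bio") is not None

if biopython_available():
    print("Biopython is already installed; you can skip the installation step.")
else:
    print("Biopython was not found; install it with conda first.")
```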

To install Biopython, type in a cell:

 !conda install --yes biopython 

and press Ctrl + Enter to run the command. (A cell can contain more than one command as well.) Depending on your internet connection and your computer’s speed, the installation may be very quick or take a long time.

Make sure the process is successful. If everything goes according to plan, you should see output like that in Figure 2. I cannot anticipate every error that may occur, so if you have difficulty installing, please get in touch (contact.idrack@gmail.com or use the Facebook page to send me a message).

Figure 2: Successful installation of Biopython using conda within Jupyter Notebook.

At this point you have successfully installed Biopython and are now ready to use it.
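To confirm the installation from inside the notebook, a small check such as the sketch below can be useful. It prints which Python interpreter the kernel is running (`sys.executable`, `sys.prefix`) and the Biopython version if the package can be imported. If conda reports success but the version still comes back as `None`, one plausible cause (an assumption about your setup, not something shown above) is that `!conda` installed into a different environment from the one the kernel uses; comparing `sys.prefix` with the environment conda reported is a reasonable first check. The function name `biopython_version` is hypothetical.

```python
import sys

# Which interpreter (and environment) this notebook kernel is actually running
print("Kernel Python:     ", sys.executable)
print("Environment prefix:", sys.prefix)

def biopython_version():
    """Return the installed Biopython version string, or None if it cannot be imported."""
    try:
        import Bio  # installed as "biopython", but imported as "Bio"
    except ImportError:
        return None
    return Bio.__version__

print("Biopython version: ", biopython_version())
```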

Other stuff:

Although I have shown how to install and load Biopython, we will be making use of many other modules as well. Fortunately, many of these modules come with Python by default (they are part of its standard library), and we will use them regularly. Let’s work through an example.

For example, suppose you want to import a data file located somewhere on your computer’s hard drive. Before you can import it, you must be able to “set the path” to that file. There are different ways of doing this. The approach I will take is that every time we use Python, we will make a “Working Directory” and keep all the files required for a project inside that folder. This limits the need to change paths.
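One way to create such a working directory from inside the notebook is sketched below, assuming one folder per project. The folder name `protein_structures` and the helper `make_working_directory` are hypothetical, chosen only for this illustration.

```python
import os

# Hypothetical folder name used for illustration; pick any name you like
PROJECT_NAME = "protein_structures"

def make_working_directory(base_dir, name=PROJECT_NAME):
    """Create a project folder inside base_dir (if it does not exist) and return its full path."""
    path = os.path.join(base_dir, name)
    # exist_ok=True lets you re-run the cell without an error when the folder already exists
    os.makedirs(path, exist_ok=True)
    return path
```

In a notebook you might call `work_dir = make_working_directory(os.path.expanduser("~"))`, which places the folder in your home directory on Linux, macOS or Windows, and then `os.chdir(work_dir)` so that plain file names point into it.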

Let’s look at an example to clarify this. To work with paths in Python you will need a module called “os”. You load this module using:

 import os 

This is useful because, once a module is imported, all of its functions become available through the module’s name. After loading the “os” module, for instance, the function “getcwd” is available as os.getcwd. To use it, type:

 print(os.getcwd()) 

where “os” is the module and “getcwd()” is a function of that module, which you access using the “dot” notation. The os.getcwd() command tells you Python’s current working directory. Remember that whenever you create or read a file using only its name (a relative path), this is the location Python looks in, so either put your files in this location or give their full path. If you don’t want to use this location, you can change it with another function, “chdir(new_path)”. This function takes as its input (the value you pass within the brackets, i.e. by replacing new_path) the new path you would like to use. See Figure 3.

Figure 3: The first cell shows how to load the “os” module to make use of the functions available in it. The second cell uses “os.getcwd()” to find the current location visible to Python. Because of the way Jupyter is set up on my Linux computer, it points to my home directory, ‘/home/proteinmechanic/’. On your computer this path will be different; on Windows it will also include a drive letter (e.g. C:\). The next cell uses “os.chdir(path)”, where path is the full path of the new location. Again, I chose a location on my computer; you should choose whatever is convenient for you. After running “os.chdir(path)”, we can check whether the path has indeed changed: the output from “os.getcwd()” shows that it has been updated.
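The getcwd/chdir round trip described above can be sketched as follows. To keep it self-contained, it changes into a temporary folder created with `tempfile.mkdtemp()` rather than a real project folder, shows that a relative name such as the hypothetical `example.pdb` is resolved against the new location, and then changes back and removes the temporary folder.

```python
import os
import tempfile

# Remember where we started so we can return afterwards
original_dir = os.getcwd()
print("Current working directory:", original_dir)

# A throwaway folder stands in for "new_path"; in practice use your own project folder
new_path = tempfile.mkdtemp()
os.chdir(new_path)
print("After chdir:", os.getcwd())

# A relative file name is now resolved against the new working directory
# ("example.pdb" is hypothetical and does not need to exist for this)
print("example.pdb ->", os.path.abspath("example.pdb"))

# Change back to the original location and remove the (empty) temporary folder
os.chdir(original_dir)
os.rmdir(new_path)
```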

This series is not about “learning Python”, but about getting equipped with just enough of it to do what you want it to do. So I will end this post here. In the next post we will pick up from here and start using PDB structures, which we will import from a chosen location and process to get analytics.

For you to be able to follow the next post smoothly, make sure to follow the instructions in the last and current posts. If you have any difficulty, please reach me through the IDRACK Facebook page.
