An Introduction to the AMPAL Framework
ISAMBARD represents proteins with Python objects that are part of the AMPAL module. AMPAL stands for Atom, Monomer, Polymer, Assembly, Ligand, which are the main functional elements of the framework. These objects are designed to represent the hierarchical nature of proteins in an intuitive way.
All protein models produced by ISAMBARD are AMPAL objects, but you can also load in crystal structures.
Q. So why aren’t these objects just called Protein, Chain and Residue? Well, there are Polypeptide and Residue objects, but they are just protein-specific versions of Polymer and Monomer. We wanted to keep the base objects as generic as possible to allow other biomolecules (like DNA or RNA), or even unnatural polymers (like \(\beta\)-amino acids), to be represented using this architecture. While some of these features are not currently implemented in ISAMBARD, this design should lead to more scalable code with reduced duplication in the future.
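To make the design idea concrete, here is a minimal sketch of generic base classes with protein-specific subclasses. The class bodies, the ONE_LETTER table and the letter property are hypothetical and written only for illustration; they are not AMPAL's actual implementation.

```python
class Monomer:
    """Generic building block: a named collection of atoms (illustrative only)."""

    def __init__(self, mol_code, atoms=None):
        self.mol_code = mol_code
        self.atoms = dict(atoms or {})


class Residue(Monomer):
    """Protein-specific monomer: adds amino-acid behaviour on top of Monomer."""

    # Tiny lookup table for the sketch; a real one would cover every residue.
    ONE_LETTER = {"ASP": "D", "ILE": "I", "PRO": "P"}

    @property
    def letter(self):
        return self.ONE_LETTER.get(self.mol_code, "X")


class Polymer:
    """Generic ordered container of monomers."""

    def __init__(self, monomers=None, polymer_id=" "):
        self.id = polymer_id
        self._monomers = list(monomers or [])

    def __len__(self):
        return len(self._monomers)

    def __getitem__(self, index):
        return self._monomers[index]


class Polypeptide(Polymer):
    """Protein-specific polymer: reuses the generic container logic."""

    @property
    def sequence(self):
        return "".join(residue.letter for residue in self._monomers)


chain = Polypeptide([Residue("ASP"), Residue("ILE"), Residue("ASP")], polymer_id="A")
print(len(chain), chain.sequence)
```

A nucleic acid or other polymer type could subclass Polymer and Monomer in the same way without touching the container code, which is the reduced duplication the paragraph above refers to.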
1. Converting PDB files to AMPAL Objects
import ampal
Any PDB file can be parsed into an AMPAL object, which allows you to easily analyse the structure. The only function you need for this is ampal.load_pdb. It takes a file path string as the input argument:
ampal.load_pdb('3qy1.pdb')
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
The object that is returned is an Assembly. We can assign it to a variable and look inside it.
my_protein = ampal.load_pdb('3qy1.pdb')
Remember that if you are working in a Jupyter Notebook, once the object is assigned to a variable you can look at its attributes and methods by typing my_protein. and then pressing tab.
2. Basic Analysis
This Assembly contains two Polypeptides and 449 Ligands. It is easy to check the amino acid sequences of the Polypeptides:
my_protein.sequences
['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']
The .sequences attribute is a list of sequence strings, one for each Polymer. We can determine other basic properties of the Assembly:
my_protein.molecular_weight
48508.931580000004
my_protein.molar_extinction_280
83640
my_protein.isoelectric_point
5.400000000000004
my_protein.id
'3qy1'
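Since .sequences is an ordinary list of strings, the usual Python string and list tools apply to it. The sketch below copies the two strings from the output above into a plain list (so it runs without AMPAL or the PDB file) and compares the two chains.

```python
# Sequences copied verbatim from the my_protein.sequences output above.
sequences = [
    'DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
    'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
]

# One entry per chain: report its position, length and first few residues.
for index, sequence in enumerate(sequences):
    print(index, len(sequence), sequence[:12])

chain_a_seq, chain_b_seq = sequences

# In the output shown, the second string is the first one with a leading 'K'.
chains_match = chain_b_seq[1:] == chain_a_seq
print(chains_match)
```

With a loaded structure you would use my_protein.sequences directly in place of the hard-coded list.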
3. Selecting Chains
Items inside each Assembly object can be accessed in the same way as items in a standard Python list:
my_protein[0] # The first chain
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
You can also select a Polymer using a string of the chain id from the PDB file. In this case there are two chains, ‘A’ and ‘B’.
my_protein['A']
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_protein['B']
<Polypeptide containing 216 Residues. Sequence: KDIDTLISNNAL...>
The Polypeptide object has a lot of the same functionality as the Assembly:
my_chain_a = my_protein['A']
my_chain_a.molecular_weight
24199.38728
my_chain_a.molar_extinction_280
41820
my_chain_a.isoelectric_point
5.400000000000004
my_chain_a.id
'A'
4. Selecting Residues
Each Polypeptide object is made from one or more Residue objects. You can access the Residues using square brackets:
my_chain_a[0]
<Residue containing 8 Atoms. Residue code: ASP>
my_chain_a[4]
<Residue containing 8 Atoms. Residue code: LEU>
my_chain_a[20]
<Residue containing 7 Atoms. Residue code: PRO>
You can use a string of a residue id from the PDB file to select a Residue:
my_chain_a['23']
<Residue containing 7 Atoms. Residue code: PRO>
my_chain_a['40']
<Residue containing 8 Atoms. Residue code: ILE>
If you use a residue number that isn’t defined in the PDB file, a KeyError will be raised:
my_chain_a['2']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-22-52706ed7e5b3> in <module>()
----> 1 my_chain_a['2']
~/anaconda3/lib/python3.6/site-packages/ampal/protein.py in __getitem__(self, item)
190 if isinstance(item, str):
191 id_dict = {str(m.id): m for m in self._monomers}
--> 192 return id_dict[item]
193 elif isinstance(item, int):
194 return self._monomers[item]
KeyError: '2'
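The traceback shows how the lookup is dispatched: a string is matched against residue ids, while an integer is treated as a list position. The toy class below mirrors that logic so the two kinds of key can be compared side by side. Chain and get_or_none are hypothetical names for this sketch and are not part of AMPAL; the ids start at 3 because chain A's numbering in this file appears to start there.

```python
class Chain:
    """Toy container mirroring the lookup in the traceback (not AMPAL itself)."""

    def __init__(self, monomer_ids):
        # Each "monomer" is just its id string, to keep the sketch small.
        self._monomers = list(monomer_ids)

    def __getitem__(self, item):
        if isinstance(item, str):
            # String keys are matched against the residue ids.
            id_dict = {str(m): m for m in self._monomers}
            return id_dict[item]
        if isinstance(item, int):
            # Integer keys are ordinary zero-based positions.
            return self._monomers[item]
        raise TypeError("expected a str id or an int position")


def get_or_none(chain, residue_id):
    """Return the residue with this id, or None if the id is absent."""
    try:
        return chain[residue_id]
    except KeyError:
        return None


toy_chain = Chain(["3", "4", "5"])
first_by_position = toy_chain[0]           # position 0 holds id '3'
missing = get_or_none(toy_chain, "2")      # id '2' is not present
print(first_by_position, missing)
```

Wrapping the string lookup in try/except KeyError, as get_or_none does, is one way to probe for residue ids that may be missing from a structure.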
my_residue_A23 = my_chain_a['23']
Residues contain an OrderedDict (a special type of dictionary that retains the order in which elements are added), which maps atom identifiers to the Atom objects for all the atoms that make up the Residue.
my_residue_A23.atoms
OrderedDict([('N',
<Nitrogen Atom (N). Coordinates: (22.124, -4.140, -35.654)>),
('CA',
<Carbon Atom (CA). Coordinates: (22.664, -3.954, -34.292)>),
('C', <Carbon Atom (C). Coordinates: (21.911, -2.875, -33.515)>),
('O', <Oxygen Atom (O). Coordinates: (21.863, -2.926, -32.283)>),
('CB',
<Carbon Atom (CB). Coordinates: (24.120, -3.555, -34.534)>),
('CG',
<Carbon Atom (CG). Coordinates: (24.124, -2.964, -35.917)>),
('CD',
<Carbon Atom (CD). Coordinates: (23.118, -3.764, -36.681)>)])
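Because .atoms behaves like a dictionary, you can iterate over it or filter it by atom label. The sketch below rebuilds the same mapping with plain coordinate tuples in place of Atom objects (an assumption made so it runs without AMPAL); the labels and coordinates are copied from the output above.

```python
from collections import OrderedDict

# Labels and coordinates copied from the my_residue_A23.atoms output above.
atoms = OrderedDict([
    ("N", (22.124, -4.140, -35.654)),
    ("CA", (22.664, -3.954, -34.292)),
    ("C", (21.911, -2.875, -33.515)),
    ("O", (21.863, -2.926, -32.283)),
    ("CB", (24.120, -3.555, -34.534)),
    ("CG", (24.124, -2.964, -35.917)),
    ("CD", (23.118, -3.764, -36.681)),
])

# Keys come back in the order they were inserted.
labels = list(atoms)
print(labels)

# Keep only the backbone atoms, preserving their original order.
backbone_labels = {"N", "CA", "C", "O"}
backbone = OrderedDict(
    (label, xyz) for label, xyz in atoms.items() if label in backbone_labels
)
print(list(backbone))
```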
5. Selecting Atoms
Atoms can be selected using a string of their PDB atom type, for example the C\(\alpha\) atom of the residue can be selected like this:
my_residue_A23['CA']
<Carbon Atom (CA). Coordinates: (22.664, -3.954, -34.292)>
my_residue_A23['CG']
<Carbon Atom (CG). Coordinates: (24.124, -2.964, -35.917)>
my_residue_A23['N']
<Nitrogen Atom (N). Coordinates: (22.124, -4.140, -35.654)>
my_atom_A23ca = my_residue_A23['CA']
The individual coordinates can be selected using square brackets:
my_atom_A23ca[0]
22.664
my_atom_A23ca[2]
-34.292
Or with the x, y and z properties:
my_atom_A23ca.x
22.664
my_atom_A23ca.y
-3.954
my_atom_A23ca.z
-34.292
The Atom object contains some useful attributes:
my_atom_A23ca.id # The atom number from the PDB file
162
my_atom_A23ca.element # The element of the atom
'C'
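Once you have the coordinates, simple geometry is a few lines of Python. As a sketch, the code below computes the distance between the N and C\(\alpha\) atoms of residue 23 from the coordinates printed above. The distance helper is a hypothetical function written for this example, and the coordinates are plain tuples so that it runs without AMPAL.

```python
import math


def distance(point_a, point_b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))


# Coordinates copied from the Atom output above (PDB coordinates are in angstroms).
n_xyz = (22.124, -4.140, -35.654)
ca_xyz = (22.664, -3.954, -34.292)

n_ca_distance = distance(n_xyz, ca_xyz)
print(round(n_ca_distance, 3))  # approximately 1.477
```

With real Atom objects, the same three numbers are available through the x, y and z properties or by indexing, as shown above.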
6. AMPAL Parents
Hopefully you can see that it’s easy to traverse down the AMPAL framework from the Assembly level to the Atom level, but it’s just as easy to work your way back up. With any AMPAL object you can use the parent attribute to find the AMPAL object that it is contained inside.
my_atom_A23ca.parent
<Residue containing 7 Atoms. Residue code: PRO>
my_residue_A23.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_chain_a.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
This attribute returns the original object itself, meaning you can access all its attributes and methods, including its own parent!
my_atom_A23ca.parent == my_residue_A23
True
my_residue_A23.parent == my_chain_a
True
my_atom_A23ca.parent.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_atom_A23ca.parent.parent.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
my_residue_A23.parent.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
my_atom_A23ca.parent.id
'23'
my_residue_A23.parent.sequence
'DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI'
my_chain_a.parent.sequences
['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']
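Chaining .parent by hand works, but the same walk can be written as a loop. The sketch below uses a hypothetical Node class as a stand-in for AMPAL objects, and it assumes that the top-level object has a parent of None, which this tutorial does not confirm for Assembly.

```python
class Node:
    """Stand-in for an AMPAL object: a name plus a link to its container."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def ancestors(node):
    """Yield each enclosing object, nearest first, until parent is None."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


# Build the same chain of containment as in the examples above.
assembly = Node("Assembly 3qy1")
polypeptide = Node("Polypeptide A", parent=assembly)
residue = Node("Residue 23", parent=polypeptide)
atom = Node("Atom CA", parent=residue)

ancestor_names = [node.name for node in ancestors(atom)]
print(ancestor_names)
```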
7. Ligands
The last AMPAL objects to discuss are Ligand and Ligands. These are intended to store non-protein elements from the PDB file. The ligands can be extracted from the Assembly:
my_protein.get_ligands()
<Ligands chain containing 449 Ligands>
Ligands is a special Polymer object, with none of the protein-specific Polypeptide functionality. It contains one or more Ligand objects, which you can select in exactly the same way as selecting Residues from Polypeptides:
my_ligands = my_protein.get_ligands()
my_ligands[0]
<Ligand containing 1 Atom. Ligand code: ZN>
my_ligands['221']
<Ligand containing 1 Atom. Ligand code: ZN>
The Ligand objects are Monomer objects, without the protein-specific functionality that is present for Residues. Since Ligand and Residue are both examples of Monomer, they have a lot of the same functionality:
my_ligand_zinc = my_ligands[0]
my_ligand_zinc.atoms
OrderedDict([('ZN',
<Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>)])
my_ligand_zinc['ZN']
<Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>
This zinc atom is associated with one of the Polypeptide chains, and this is reflected in its parent attribute.
my_ligand_zinc.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
8. Summary and activities
With these simple methods you can load in a PDB file and select different parts of the protein. Please try playing around with the example code to get a feel for how the selections work.
Try loading in a PDB file of your own and select various parts of the protein and ligands.
Find the other built-in functions by:
Tabbing the object in Jupyter Notebook
Looking at the documentation
Finding the base_ampal code in the ISAMBARD folder and looking through it (tip: you can do this with the IPython file browser)
In the next section we’ll look at how we can perform more complex selections and more detailed analysis on these objects.