An Introduction to the AMPAL Framework

ISAMBARD represents proteins with Python objects that are part of the AMPAL module. AMPAL stands for Atom, Monomer, Polymer, Assembly, Ligand, which are the main functional elements of the framework. These objects are designed to represent the hierarchical nature of proteins in an intuitive way.

All protein models produced by ISAMBARD are AMPAL objects, but you can also load in crystal structures.

Q. So why aren’t these objects just called Protein, Chain and Residue? There are Polypeptide and Residue objects, but they are just protein-specific versions of Polymer and Monomer. We wanted to keep the base objects as generic as possible so that other biomolecules (like DNA or RNA), or even unnatural polymers (such as those built from \(\beta\)-amino acids), can be represented using this architecture. While some of these features are not currently implemented in ISAMBARD, this design should lead to more scalable code with less duplication in the future.

1. Converting PDB files to AMPAL Objects

import ampal

Any PDB file can be parsed into an AMPAL object, which allows you to easily analyse the structure. The only function you need for this is ampal.load_pdb. It takes a file path string as the input argument:

ampal.load_pdb('3qy1.pdb')
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>

The object that is returned is an Assembly. We can assign it to a variable and look inside it.

my_protein = ampal.load_pdb('3qy1.pdb')

Remember that if you are working in a Jupyter Notebook, once the object is assigned to a variable you can explore its attributes and methods by typing my_protein. and then pressing tab.

2. Basic Analysis

This Assembly contains two Polypeptides, and 449 Ligands. It is easy to check the amino acid sequences of the Polypeptides:

my_protein.sequences
['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
 'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']

The .sequences attribute is a list of sequence strings, one for each Polymer. We can determine other basic properties of the Assembly:

my_protein.molecular_weight
48508.931580000004
my_protein.molar_extinction_280
83640
my_protein.isoelectric_point
5.400000000000004
my_protein.id
'3qy1'
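
The properties above can be gathered into a single record, which is handy when comparing several structures. The sketch below is not part of AMPAL: summarise is a hypothetical helper that reads only the attributes shown in this section, and it is exercised with a stand-in object (sequences truncated to the first 12 residues) so that it runs without the PDB file.

```python
from types import SimpleNamespace


def summarise(assembly):
    """Collect the basic properties shown above into a plain dict.

    `assembly` is anything exposing id, sequences, molecular_weight,
    molar_extinction_280 and isoelectric_point. This helper is
    hypothetical; it is not an AMPAL function.
    """
    return {
        'id': assembly.id,
        'chains': len(assembly.sequences),
        'residues': sum(len(seq) for seq in assembly.sequences),
        'molecular_weight': round(assembly.molecular_weight, 1),
        'molar_extinction_280': assembly.molar_extinction_280,
        'isoelectric_point': round(assembly.isoelectric_point, 2),
    }


# Stand-in object: numeric values copied from the outputs above,
# sequences deliberately truncated for brevity.
stand_in = SimpleNamespace(
    id='3qy1',
    sequences=['DIDTLISNNALW', 'KDIDTLISNNAL'],
    molecular_weight=48508.931580000004,
    molar_extinction_280=83640,
    isoelectric_point=5.400000000000004,
)

summary = summarise(stand_in)
print(summary)
```

With a real structure, summarise(my_protein) should behave the same way, assuming the attributes are available as shown above.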

3. Selecting Chains

Items inside each Assembly object can be accessed analogously to accessing items in a standard Python list:

my_protein[0]  # The first chain
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>

You can also select a Polymer using a string of the chain id from the PDB file. In this case there are two chains, ‘A’ and ‘B’.

my_protein['A']
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_protein['B']
<Polypeptide containing 216 Residues. Sequence: KDIDTLISNNAL...>

The Polypeptide object has a lot of the same functionality as the Assembly:

my_chain_a = my_protein['A']
my_chain_a.molecular_weight
24199.38728
my_chain_a.molar_extinction_280
41820
my_chain_a.isoelectric_point
5.400000000000004
my_chain_a.id
'A'

4. Selecting Residues

Each Polypeptide object is made from one or more Residue objects. You can access the Residues using square brackets:

my_chain_a[0]
<Residue containing 8 Atoms. Residue code: ASP>
my_chain_a[4]
<Residue containing 8 Atoms. Residue code: LEU>
my_chain_a[20]
<Residue containing 7 Atoms. Residue code: PRO>

You can use a string of a residue id from the PDB file to select a Residue:

my_chain_a['23']
<Residue containing 7 Atoms. Residue code: PRO>
my_chain_a['40']
<Residue containing 8 Atoms. Residue code: ILE>

If you use a residue number that isn’t defined in the PDB file, a KeyError will be raised:

my_chain_a['2']
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-22-52706ed7e5b3> in <module>()
----> 1 my_chain_a['2']


~/anaconda3/lib/python3.6/site-packages/ampal/protein.py in __getitem__(self, item)
    190         if isinstance(item, str):
    191             id_dict = {str(m.id): m for m in self._monomers}
--> 192             return id_dict[item]
    193         elif isinstance(item, int):
    194             return self._monomers[item]


KeyError: '2'
my_residue_A23 = my_chain_a['23']
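
The traceback shows how the lookup works: a string key is matched against the residue ids, while an integer key is used as a position in the list. The sketch below mirrors that logic in a toy class. ToyMonomer, ToyChain and get_or_none are hypothetical names for illustration and are not AMPAL code. The ids start at '3', which is consistent with the outputs above, where position 20 and id '23' both give a proline.

```python
class ToyMonomer:
    """Stand-in for a Residue: just an id and a residue code."""

    def __init__(self, monomer_id, code):
        self.id = monomer_id
        self.code = code


class ToyChain:
    """Stand-in for a Polypeptide supporting both lookup styles."""

    def __init__(self, monomers):
        self._monomers = list(monomers)

    def __getitem__(self, item):
        if isinstance(item, str):
            # String keys are matched against ids, as in the traceback.
            id_dict = {str(m.id): m for m in self._monomers}
            return id_dict[item]  # raises KeyError for an unknown id
        if isinstance(item, int):
            # Integer keys are plain positions in the list.
            return self._monomers[item]
        raise TypeError('key must be a str id or an int position')


def get_or_none(chain, monomer_id):
    """Return the monomer with this id, or None if the id is absent."""
    try:
        return chain[monomer_id]
    except KeyError:
        return None


# First three residues of chain A (D, I, D), numbered from 3.
chain = ToyChain([
    ToyMonomer('3', 'ASP'),
    ToyMonomer('4', 'ILE'),
    ToyMonomer('5', 'ASP'),
])

print(chain[0] is chain['3'])   # position 0 and id '3' are the same object
print(get_or_none(chain, '2'))  # id '2' does not exist
```

Catching the KeyError, as get_or_none does, is one way to probe for residue numbers that may be missing from a crystal structure without interrupting a longer analysis.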

Residues contain an OrderedDict (a special type of dictionary that retains the order in which you add elements) that maps atom identifiers to Atom objects for all the atoms that make up the Residue.

my_residue_A23.atoms
OrderedDict([('N',
              <Nitrogen Atom (N). Coordinates: (22.124, -4.140, -35.654)>),
             ('CA',
              <Carbon Atom (CA). Coordinates: (22.664, -3.954, -34.292)>),
             ('C', <Carbon Atom (C). Coordinates: (21.911, -2.875, -33.515)>),
             ('O', <Oxygen Atom (O). Coordinates: (21.863, -2.926, -32.283)>),
             ('CB',
              <Carbon Atom (CB). Coordinates: (24.120, -3.555, -34.534)>),
             ('CG',
              <Carbon Atom (CG). Coordinates: (24.124, -2.964, -35.917)>),
             ('CD',
              <Carbon Atom (CD). Coordinates: (23.118, -3.764, -36.681)>)])

5. Selecting Atoms

Atoms can be selected using a string of their PDB atom type. For example, the C\(\alpha\) atom of the residue can be selected like this:

my_residue_A23['CA']
<Carbon Atom (CA). Coordinates: (22.664, -3.954, -34.292)>
my_residue_A23['CG']
<Carbon Atom (CG). Coordinates: (24.124, -2.964, -35.917)>
my_residue_A23['N']
<Nitrogen Atom (N). Coordinates: (22.124, -4.140, -35.654)>
my_atom_A23ca = my_residue_A23['CA']

The individual coordinates can be selected using square brackets:

my_atom_A23ca[0]
22.664
my_atom_A23ca[2]
-34.292

Or with the x, y and z properties:

my_atom_A23ca.x
22.664
my_atom_A23ca.y
-3.954
my_atom_A23ca.z
-34.292
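
Coordinates become more useful once you combine them. As a minimal sketch, the distance between the N and C\(\alpha\) atoms of residue 23 can be computed from the values printed above. Plain tuples stand in for Atom objects so the example runs without the PDB file, and math.dist requires Python 3.8 or later.

```python
import math

# Coordinates copied from the Atom outputs shown above (in angstroms).
n_xyz = (22.124, -4.140, -35.654)
ca_xyz = (22.664, -3.954, -34.292)

# Euclidean distance between the two points.
n_ca_distance = math.dist(n_xyz, ca_xyz)
print(f'N-CA distance: {n_ca_distance:.2f}')
```

The result comes out close to 1.48, in line with what you would expect for an N-C\(\alpha\) bond. With real Atom objects, the same tuples could be built from the x, y and z properties, for example (my_atom_A23ca.x, my_atom_A23ca.y, my_atom_A23ca.z).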

The Atom object contains some useful attributes:

my_atom_A23ca.id  # The atom number from the PDB file
162
my_atom_A23ca.element  # The element of the atom
'C'

6. AMPAL Parents

Hopefully you can see that it’s easy to traverse down the AMPAL framework from Assembly level to the Atom level, but it’s just as easy to work your way back up. With any AMPAL object you can use the parent attribute to find the AMPAL object that it is contained inside.

my_atom_A23ca.parent
<Residue containing 7 Atoms. Residue code: PRO>
my_residue_A23.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_chain_a.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>

This attribute returns the original object itself, meaning you can access all its attributes and methods, including its own parent!

my_atom_A23ca.parent == my_residue_A23
True
my_residue_A23.parent == my_chain_a
True
my_atom_A23ca.parent.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
my_atom_A23ca.parent.parent.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
my_residue_A23.parent.parent
<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>
my_atom_A23ca.parent.id
'23'
my_residue_A23.parent.sequence
'DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI'
my_chain_a.parent.sequences
['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
 'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']
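
Chaining .parent by hand works, but the same walk can be written as a loop. The sketch below uses a hypothetical helper, lineage, that follows parent references until it reaches the top. It assumes the top-level object either has no parent attribute or has it set to None, which is not shown above. Stand-in objects are used so the example runs without the PDB file.

```python
from types import SimpleNamespace


def lineage(obj):
    """Return [obj, obj.parent, obj.parent.parent, ...] up to the top.

    Assumes the top-level object has no `parent` attribute or has it
    set to None (an assumption, not something shown in the text).
    """
    levels = [obj]
    while getattr(levels[-1], 'parent', None) is not None:
        levels.append(levels[-1].parent)
    return levels


# Stand-ins for Assembly > Polypeptide > Residue > Atom, using the
# ids shown in the outputs above.
assembly = SimpleNamespace(id='3qy1', parent=None)
polypeptide = SimpleNamespace(id='A', parent=assembly)
residue = SimpleNamespace(id='23', parent=polypeptide)
atom = SimpleNamespace(id=162, parent=residue)

ids = [level.id for level in lineage(atom)]
print(ids)
```

Starting from the atom, the list runs from the Atom level up to the Assembly level, so the last element is always the outermost container.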

7. Ligands

The last AMPAL objects to discuss are Ligand and Ligands. These are intended to store non-protein elements from the PDB file. The ligands can be extracted from the Assembly:

my_protein.get_ligands()
<Ligands chain containing 449 Ligands>

Ligands is a special Polymer object, with none of the protein-specific Polypeptide functionality. It contains one or more Ligand objects which you can select in exactly the same way as selecting Residues from Polypeptides:

my_ligands = my_protein.get_ligands()
my_ligands[0]
<Ligand containing 1 Atom. Ligand code: ZN>
my_ligands['221']
<Ligand containing 1 Atom. Ligand code: ZN>

The Ligand objects are Monomer objects, without the protein-specific functionality that is present for Residues. Since Ligand and Residue are both examples of Monomer, they have a lot of the same functionality:

my_ligand_zinc = my_ligands[0]
my_ligand_zinc.atoms
OrderedDict([('ZN',
              <Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>)])
my_ligand_zinc['ZN']
<Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>

This zinc atom is associated with one of the Polypeptide chains, and this is reflected in its parent.

my_ligand_zinc.parent
<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>
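
With hundreds of ligands in a structure, a quick tally by ligand code shows what is actually present. The sketch below is an illustration under assumptions: count_codes is a hypothetical helper, it assumes that a Ligands object can be iterated over, and it assumes that each Ligand exposes its code through an attribute named mol_code, which is not shown in this section. Made-up stand-in objects are used so the example runs without the PDB file.

```python
from collections import Counter
from types import SimpleNamespace


def count_codes(ligands):
    """Tally ligands by code.

    Assumes each ligand exposes its code as `mol_code`; that attribute
    name is an assumption and should be checked against the AMPAL docs.
    """
    return Counter(ligand.mol_code for ligand in ligands)


# Made-up stand-ins; a real Ligands object would be passed instead.
toy_ligands = [
    SimpleNamespace(mol_code=code)
    for code in ('ZN', 'ZN', 'HOH', 'HOH', 'HOH')
]

counts = count_codes(toy_ligands)
print(counts.most_common())
```

If those assumptions hold, count_codes(my_protein.get_ligands()) would give the equivalent tally for the loaded structure.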

8. Summary and activities

With these simple methods you can load in a PDB file and select different parts of the protein. Try playing around with the example code to get a feel for how the selections work.

  1. Try loading in a PDB file of your own and select various parts of the protein and ligands.

  2. Find the other built-in attributes and methods by either:

    1. Tabbing the object in Jupyter Notebook

    2. Looking at the documentation

    3. Finding the base_ampal code in the ISAMBARD folder and looking through it (tip: you can do this with the IPython file browser)

In the next section we’ll look at how we can perform more complex selections and more detailed analysis on these objects.