An Introduction to the AMPAL Framework
======================================

ISAMBARD represents proteins with Python objects that are part of the
AMPAL module. AMPAL stands for Atom, Monomer, Polymer, Assembly, Ligand,
which are the main functional elements of the framework. These objects
are designed to represent the hierarchical nature of proteins in an
intuitive way. All protein models produced by ISAMBARD are AMPAL
objects, but you can also load in crystal structures.

**Q. So why aren’t these objects just called Protein, Chain and
Residue?**

Well, there are ``Polypeptide`` and ``Residue`` objects, but they are
just protein-specific versions of ``Polymer`` and ``Monomer``. We wanted
to keep the base objects as generic as possible to allow other
biomolecules (like DNA or RNA), or even unnatural polymers (like
:math:`\beta`-amino acids), to be represented using this architecture.
While some of these features are not currently implemented in ISAMBARD,
this will lead to more scalable code with reduced duplication in the
future.

1. Converting PDB files to AMPAL Objects
----------------------------------------

.. code:: ipython3

    import ampal

Any PDB file can be parsed into an AMPAL object, which allows you to
easily analyse the structure. The only function you need for this is
``ampal.load_pdb``. It takes a file path string as the input argument:

.. code:: ipython3

    ampal.load_pdb('3qy1.pdb')

The object that is returned is an ``Assembly``. We can assign it to a
variable and look inside it.

.. code:: ipython3

    my_protein = ampal.load_pdb('3qy1.pdb')

Remember that if you are working in a Jupyter Notebook, once the object
is assigned to a variable you can look at its attributes and methods by
typing ``my_protein.`` and then pressing tab.

2. Basic Analysis
-----------------

This ``Assembly`` contains two ``Polypeptides`` and 449 ``Ligands``. It
is easy to check the amino acid sequences of the ``Polypeptides``:

.. code:: ipython3

    my_protein.sequences

.. parsed-literal::

    ['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
     'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']

The ``.sequences`` attribute is a list of sequence strings, one for each
``Polymer``. We can determine other basic properties of the
``Assembly``:

.. code:: ipython3

    my_protein.molecular_weight

.. parsed-literal::

    48508.931580000004

.. code:: ipython3

    my_protein.molar_extinction_280

.. parsed-literal::

    83640

.. code:: ipython3

    my_protein.isoelectric_point

.. parsed-literal::

    5.400000000000004

.. code:: ipython3

    my_protein.id

.. parsed-literal::

    '3qy1'

3. Selecting Chains
-------------------

Items inside each ``Assembly`` object can be accessed in the same way as
items in a standard Python list:

.. code:: ipython3

    my_protein[0]  # The first chain

You can also select a ``Polymer`` using a string of the chain id from
the PDB file. In this case there are two chains, ‘A’ and ‘B’.

.. code:: ipython3

    my_protein['A']

.. code:: ipython3

    my_protein['B']

The ``Polypeptide`` object has a lot of the same functionality as the
``Assembly``:

.. code:: ipython3

    my_chain_a = my_protein['A']

.. code:: ipython3

    my_chain_a.molecular_weight

.. parsed-literal::

    24199.38728

.. code:: ipython3

    my_chain_a.molar_extinction_280

.. parsed-literal::

    41820

.. code:: ipython3

    my_chain_a.isoelectric_point

.. parsed-literal::

    5.400000000000004

.. code:: ipython3

    my_chain_a.id

.. parsed-literal::

    'A'

4. Selecting Residues
---------------------

Each ``Polypeptide`` object is made from one or more ``Residue``
objects. You can access the ``Residues`` by position using square
brackets:

.. code:: ipython3

    my_chain_a[0]

.. code:: ipython3

    my_chain_a[4]

.. code:: ipython3

    my_chain_a[20]

You can use a string of a residue id from the PDB file to select a
``Residue``:

.. code:: ipython3

    my_chain_a['23']

.. code:: ipython3

    my_chain_a['40']

If you use a residue number that isn’t defined in the PDB file, a
``KeyError`` will be raised:

.. code:: ipython3

    my_chain_a['2']

::

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    in ()
    ----> 1 my_chain_a['2']

    ~/anaconda3/lib/python3.6/site-packages/ampal/protein.py in __getitem__(self, item)
        190         if isinstance(item, str):
        191             id_dict = {str(m.id): m for m in self._monomers}
    --> 192             return id_dict[item]
        193         elif isinstance(item, int):
        194             return self._monomers[item]

    KeyError: '2'

.. code:: ipython3

    my_residue_A23 = my_chain_a['23']

Each ``Residue`` contains an ``OrderedDict`` (a special type of
dictionary that retains the order in which elements are added) that maps
atom identifiers to ``Atom`` objects for all the atoms that make up the
``Residue``.

.. code:: ipython3

    my_residue_A23.atoms

.. parsed-literal::

    OrderedDict([('N', ), ('CA', ), ('C', ), ('O', ), ('CB', ), ('CG', ), ('CD', )])

5. Selecting Atoms
------------------

Atoms can be selected using a string of their PDB atom type, for example
the C\ :math:`\alpha` atom of the residue can be selected like this:

.. code:: ipython3

    my_residue_A23['CA']

.. code:: ipython3

    my_residue_A23['CG']

.. code:: ipython3

    my_residue_A23['N']

.. code:: ipython3

    my_atom_A23ca = my_residue_A23['CA']

The individual coordinates can be selected using square brackets:

.. code:: ipython3

    my_atom_A23ca[0]

.. parsed-literal::

    22.664

.. code:: ipython3

    my_atom_A23ca[2]

.. parsed-literal::

    -34.292

Or with the ``x``, ``y`` and ``z`` properties:

.. code:: ipython3

    my_atom_A23ca.x

.. parsed-literal::

    22.664

.. code:: ipython3

    my_atom_A23ca.y

.. parsed-literal::

    -3.954

.. code:: ipython3

    my_atom_A23ca.z

.. parsed-literal::

    -34.292

The ``Atom`` object contains some useful attributes:

.. code:: ipython3

    my_atom_A23ca.id  # The atom number from the PDB file

.. parsed-literal::

    162

.. code:: ipython3

    my_atom_A23ca.element  # The element of the atom

.. parsed-literal::

    'C'

6. AMPAL Parents
----------------

Hopefully you can see that it’s easy to traverse down the AMPAL
framework from the ``Assembly`` level to the ``Atom`` level, but it’s
just as easy to work your way back up. With any AMPAL object you can use
the ``parent`` attribute to find the AMPAL object that contains it.

.. code:: ipython3

    my_atom_A23ca.parent

.. code:: ipython3

    my_residue_A23.parent

.. code:: ipython3

    my_chain_a.parent

This attribute returns the containing object itself, not a copy, meaning
you can access all of its attributes and methods, including its own
``parent``!

.. code:: ipython3

    my_atom_A23ca.parent == my_residue_A23

.. parsed-literal::

    True

.. code:: ipython3

    my_residue_A23.parent == my_chain_a

.. parsed-literal::

    True

.. code:: ipython3

    my_atom_A23ca.parent.parent

.. code:: ipython3

    my_atom_A23ca.parent.parent.parent

.. code:: ipython3

    my_residue_A23.parent.parent

.. code:: ipython3

    my_atom_A23ca.parent.id

.. parsed-literal::

    '23'

.. code:: ipython3

    my_residue_A23.parent.sequence

.. parsed-literal::

    'DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI'

.. code:: ipython3

    my_chain_a.parent.sequences

.. parsed-literal::

    ['DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI',
     'KDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI']

7. Ligands
----------

The last AMPAL objects to discuss are ``Ligand`` and ``Ligands``. These
are intended to store non-protein elements from the PDB file. The
ligands can be extracted from the ``Assembly``:

.. code:: ipython3

    my_protein.get_ligands()

``Ligands`` is a special ``Polymer`` object, with none of the
protein-specific ``Polypeptide`` functionality. It contains one or more
``Ligand`` objects, which you can select in exactly the same way as
selecting ``Residues`` from ``Polypeptides``:

.. code:: ipython3

    my_ligands = my_protein.get_ligands()

.. code:: ipython3

    my_ligands[0]

.. code:: ipython3

    my_ligands['221']

The ``Ligand`` objects are ``Monomer`` objects, without the
protein-specific functionality that is present for ``Residues``. Since
``Ligand`` and ``Residue`` are both examples of ``Monomer``, they have a
lot of the same functionality:

.. code:: ipython3

    my_ligand_zinc = my_ligands[0]

.. code:: ipython3

    my_ligand_zinc.atoms

.. parsed-literal::

    OrderedDict([('ZN', )])

.. code:: ipython3

    my_ligand_zinc['ZN']

This zinc atom is associated with one of the ``Polypeptide`` chains, and
this is reflected in its ``parent``.

.. code:: ipython3

    my_ligand_zinc.parent

8. Summary and activities
-------------------------

With these simple methods you can load in a PDB file and select
different parts of the protein. Please try playing around with the
example code and selecting different parts of the protein.

1. Try loading in a PDB file of your own and select various parts of the
   protein and ligands.
2. Find the other built-in functions either by:

   1. Tabbing the object in Jupyter Notebook
   2. Looking at the documentation
   3. Finding the ``base_ampal`` code in the ISAMBARD folder and looking
      through it (tip: you can do this with the IPython file browser)

In the next section we’ll look at how we can perform more complex
selections and more detailed analysis on these objects.
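
If you would like to experiment with the selection and ``parent`` behaviour from sections 3 to 6 without a PDB file to hand, the toy sketch below may help. It is not AMPAL code: ``ToyAtom``, ``ToyMonomer`` and ``ToyPolymer`` are hypothetical stand-ins written only for this illustration, and the real classes do far more. The only parts taken from this tutorial are the integer/string dispatch in ``__getitem__``, which mirrors the lines of ``ampal/protein.py`` visible in the traceback in section 4, and the coordinates, element and atom number of the C\ :math:`\alpha` atom shown in section 5.

```python
from collections import OrderedDict


class ToyAtom:
    """Hypothetical stand-in for an Atom: coordinates plus a parent link."""

    def __init__(self, coordinates, element, atom_id):
        self.coordinates = tuple(coordinates)
        self.element = element
        self.id = atom_id
        self.parent = None  # Set when the atom is added to a monomer

    def __getitem__(self, index):
        # Square brackets select an individual coordinate
        return self.coordinates[index]

    @property
    def x(self):
        return self.coordinates[0]

    @property
    def y(self):
        return self.coordinates[1]

    @property
    def z(self):
        return self.coordinates[2]


class ToyMonomer:
    """Hypothetical stand-in for a Monomer: an ordered mapping of atoms."""

    def __init__(self, monomer_id):
        self.id = monomer_id
        self.parent = None  # Set when the monomer is added to a polymer
        self.atoms = OrderedDict()

    def add_atom(self, label, atom):
        atom.parent = self
        self.atoms[label] = atom

    def __getitem__(self, label):
        return self.atoms[label]


class ToyPolymer:
    """Hypothetical stand-in for a Polymer: an ordered list of monomers."""

    def __init__(self, polymer_id):
        self.id = polymer_id
        self.parent = None
        self._monomers = []

    def append(self, monomer):
        monomer.parent = self
        self._monomers.append(monomer)

    def __getitem__(self, item):
        # Same dispatch as the traceback in section 4:
        # a str selects by id, an int selects by position
        if isinstance(item, str):
            id_dict = {str(m.id): m for m in self._monomers}
            return id_dict[item]
        elif isinstance(item, int):
            return self._monomers[item]
        raise TypeError('Select with an int position or a str id.')


chain = ToyPolymer('A')
residue = ToyMonomer('23')
residue.add_atom('CA', ToyAtom((22.664, -3.954, -34.292), 'C', 162))
chain.append(residue)

ca_atom = chain['23']['CA']      # Walk down: polymer -> monomer -> atom
print(ca_atom.x, ca_atom[2])     # 22.664 -34.292
print(ca_atom.parent is residue)  # True
print(ca_atom.parent.parent.id)  # Walk back up to the chain: A
```

In this sketch ``chain[0]`` and ``chain['23']`` return the same object, because the integer is a position in the list while the string is matched against each monomer's ``id``. Asking for an id that was never added, such as ``chain['2']``, raises a ``KeyError`` for the same reason as in section 4.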