Evaluation Package

This module contains a range of metrics that can be used to evaluate the quality of your models. Some of them are directly implemented by us, others are interfaces to external programs.

Submodules

isambard.evaluation.contact_order module

Module for evaluating the contact order of polypeptides.

isambard.evaluation.contact_order.calculate_contact_order(polypeptide)

Calculates the contact order of a polypeptide.

Contact order is a measure of the number and range of contacts found in a protein, normalised by sequence length [1]. For proteins whose folding pathways can be approximated as two-state, contact order is linearly related to \(\ln{K}\), where K is the folding rate constant [1] [2]. Contact order is calculated as follows:

\[CO = \frac{1}{L N}\sum_{(i,j)}^{N}\Delta{}Z_{i,j}\]

Where N is the total number of contacts, L is the sequence length and \(\Delta{}Z_{i,j}\) is the sequence distance between contacting residues.
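As a minimal sketch of the formula (not ISAMBARD's implementation), the Python below computes CO from a list of contacting residue pairs. The function name `contact_order` is hypothetical, and how contacts are detected is left to the caller.

```python
from typing import Iterable, Tuple


def contact_order(contacts: Iterable[Tuple[int, int]], seq_length: int) -> float:
    """Relative contact order: CO = (1 / (L * N)) * sum of |i - j| over contacts.

    `contacts` holds (i, j) residue indices of contacting pairs (hypothetical
    input format); contact detection itself is not done here.
    """
    pairs = list(contacts)
    n_contacts = len(pairs)
    if n_contacts == 0 or seq_length <= 0:
        raise ValueError("need at least one contact and a positive sequence length")
    # Sum of sequence separations, i.e. the sum of Delta Z_ij.
    total_separation = sum(abs(i - j) for i, j in pairs)
    return total_separation / (seq_length * n_contacts)
```

For example, contacts (1, 5), (2, 10) and (3, 4) in a 10-residue chain give separations 4 + 8 + 1 = 13, so CO = 13 / (10 × 3) ≈ 0.433.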

References

[1] Plaxco KW, Simons KT, Baker D (1998) Contact order, transition state placement and the refolding rates of single domain proteins, J. Mol. Biol., 277, 985-994.
[2] Fersht AR (2000) Transition-state structure as a unifying basis in protein-folding mechanisms: Contact order, chain topology, stability, and the extended nucleus mechanism, Proc. Natl. Acad. Sci., 97, 1525-1529.

Notes

I’ve used an 18 Å Cα cut-off distance to be very cautious about throwing away interactions. The distance between the side-chain amine and the Cα of a fully extended lysine is around 6.5 Å, so if two lysines were interacting, their Cα atoms could be up to 2 × 6.5 Å plus two van der Waals radii apart, i.e. around 17 Å.
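The reasoning in this note amounts to using the Cα distance as a cheap pre-filter before any atom-level contact test. The sketch below shows that two-stage idea. The residue representation (a dict of atom name to coordinates), the function name `find_contacts` and the 6 Å heavy-atom cut-off are all assumptions for illustration, not values taken from ISAMBARD.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_contacts(residues: Sequence[Dict[str, Coord]],
                  ca_cutoff: float = 18.0,
                  atom_cutoff: float = 6.0) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs of residues with any atoms within atom_cutoff.

    Each residue is a hypothetical dict mapping atom names (including "CA")
    to coordinates. atom_cutoff is an assumed contact definition.
    """
    contacts = []
    for (i, res_i), (j, res_j) in combinations(enumerate(residues), 2):
        # Cheap pre-filter: pairs whose CA atoms are further apart than
        # ca_cutoff are assumed unable to interact and are skipped.
        if math.dist(res_i["CA"], res_j["CA"]) > ca_cutoff:
            continue
        # Full check: any pair of atoms within atom_cutoff counts as a contact.
        if any(math.dist(a, b) <= atom_cutoff
               for a in res_i.values() for b in res_j.values()):
            contacts.append((i, j))
    return contacts
```

The pre-filter only saves work if it never discards a real contact, which is why a generous value such as 18 Å is preferable to a tight one.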

isambard.evaluation.dssp module

This module provides an interface to the program DSSP.

For more information on DSSP see [4].

References

[4] Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 22, 2577-2637.

isambard.evaluation.dssp.dssp_available()

Returns True if the mkdssp executable is available on the system PATH.
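A check like this can be written with the standard library alone. The sketch below is an assumption about how such a check might work, not ISAMBARD's code; it relies on `shutil.which`, which returns None when an executable cannot be found on PATH. The name `mkdssp_on_path` is hypothetical.

```python
import shutil


def mkdssp_on_path(executable: str = "mkdssp") -> bool:
    """Return True if `executable` can be found on the system PATH."""
    return shutil.which(executable) is not None
```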

isambard.evaluation.dssp.extract_all_ss_dssp(in_dssp, path=True)

Parses DSSP output to extract secondary structure information for every residue.

Parameters:
  • in_dssp (str) – Path to a DSSP output file, or the DSSP output itself.
  • path (bool, optional) – Indicates whether in_dssp is a path or a string.
Returns:

dssp_residues – One tuple per residue. Each tuple contains:

  • [0] (int) – Residue number
  • [1] (str) – Secondary structure type
  • [2] (str) – Chain identifier
  • [3] (str) – Residue type
  • [4] (float) – Phi torsion angle
  • [5] (float) – Psi torsion angle
  • [6] (int) – DSSP solvent accessibility

Return type:

[tuple]
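The sketch below shows one way to read the residue table of a DSSP file into tuples in the order listed above. The fixed column offsets are an assumption based on the classic DSSP output layout and should be checked against the mkdssp version in use; `parse_dssp` is a hypothetical name, not ISAMBARD's function.

```python
from typing import List, Tuple

DsspRecord = Tuple[int, str, str, str, float, float, int]


def parse_dssp(in_dssp: str, path: bool = True) -> List[DsspRecord]:
    """Return (res_num, ss, chain, res_type, phi, psi, acc) for each residue."""
    if path:
        with open(in_dssp) as handle:
            text = handle.read()
    else:
        text = in_dssp
    records = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_table = True  # the residue table starts after this header line
            continue
        if not in_table or len(line) < 115 or line[13] == "!":
            continue  # skip the preamble, short lines and chain-break markers
        records.append((
            int(line[5:10]),       # residue number (assumed columns)
            line[16],              # secondary structure code (' ' = none)
            line[11],              # chain identifier
            line[13],              # one-letter residue type
            float(line[103:109]),  # phi
            float(line[109:115]),  # psi
            int(line[34:38]),      # solvent accessibility
        ))
    return records
```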

isambard.evaluation.dssp.find_ss_regions(dssp_residues)

Separates parsed DSSP data into groups of secondary structure.

Notes

Example: all residues in a single helix/loop/strand will be gathered into a list, then the next secondary structure element will be gathered into a separate list, and so on.

Parameters:
  • dssp_residues ([tuple]) – Parsed DSSP data, one tuple per residue. Each tuple contains: [0] int residue number, [1] str secondary structure type, [2] str chain identifier, [3] str residue type, [4] float phi torsion angle, [5] float psi torsion angle, [6] int DSSP solvent accessibility.
Returns:

fragments – Residues grouped into lists of continuous secondary structure. Each innermost element has the same format as above.

Return type:

[[list]]

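Grouping contiguous runs is a natural fit for `itertools.groupby`, which only merges adjacent items. In this sketch (hypothetical name `group_ss_regions`) a run also ends when the chain identifier changes; that extra condition is an assumption, not something the description above states.

```python
from itertools import groupby
from typing import List, Sequence


def group_ss_regions(dssp_residues: Sequence[tuple]) -> List[List[tuple]]:
    """Split residues into runs sharing a secondary structure code and chain."""
    # groupby only merges *adjacent* items with equal keys, so each run is
    # one contiguous secondary structure element.
    return [list(run) for _, run in groupby(dssp_residues, key=lambda r: (r[1], r[2]))]
```

A helix followed by a loop and then a strand therefore yields three separate lists, even if another helix appears later in the chain.
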
isambard.evaluation.dssp.run_dssp(pdb, path=True)

Runs DSSP on a PDB file or PDB-format string and returns the raw DSSP output.

Parameters:
  • pdb (str) – Path to a PDB file, or the contents of a PDB file as a string.
  • path (bool, optional) – Indicates whether pdb is a path or a string.
Returns:

dssp_out – Standard output from DSSP.

Return type:

str

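The sketch below shows one way to call DSSP from Python with `subprocess`, writing string input to a temporary file first. The `-i` input flag matches the DSSP 2.x `mkdssp` command line; newer DSSP releases changed their arguments, so the command built by the hypothetical `build_dssp_command` helper is an assumption to adapt. `run_dssp_sketch` is likewise a hypothetical name.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def build_dssp_command(pdb_path: str, executable: str = "mkdssp") -> List[str]:
    # '-i <file>' is the DSSP 2.x input syntax (assumption for newer versions).
    return [executable, "-i", pdb_path]


def run_dssp_sketch(pdb: str, path: bool = True, executable: str = "mkdssp") -> str:
    """Run DSSP on a PDB file (path=True) or PDB-format string; return stdout."""
    tmp_path: Optional[str] = None
    if not path:
        # DSSP reads files, so write the string to a temporary file first.
        with tempfile.NamedTemporaryFile("w", suffix=".pdb", delete=False) as handle:
            handle.write(pdb)
            tmp_path = handle.name
    try:
        result = subprocess.run(build_dssp_command(tmp_path or pdb, executable),
                                capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        if tmp_path is not None:
            os.remove(tmp_path)  # always clean up the temporary input file
```

With `check=True`, a non-zero exit status from DSSP raises `subprocess.CalledProcessError` rather than silently returning empty output.
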
isambard.evaluation.dssp.tag_dssp_data(assembly)

Adds output data from DSSP to each residue in an Assembly.

A dictionary called dssp_data will be added to each residue’s tags. It contains the secondary structure definition, solvent accessibility, and phi and psi values from DSSP.

The tags are added in place, so nothing is returned from this function.

Parameters: assembly (ampal.Assembly) – An Assembly containing some protein.
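In-place tagging can be sketched with a stand-in residue class. The `Residue` dataclass, the `tag_residues` function and the keys inside dssp_data are hypothetical placeholders, not AMPAL’s or ISAMBARD’s actual names. Records are matched to residues on (chain, residue number) and use the tuple layout described for extract_all_ss_dssp.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Residue:  # hypothetical stand-in for an AMPAL residue
    chain: str
    number: int
    tags: Dict[str, object] = field(default_factory=dict)


def tag_residues(residues: Iterable[Residue], dssp_records: List[tuple]) -> None:
    """Attach DSSP values to residues in place; nothing is returned."""
    # Index records by (chain identifier, residue number).
    lookup = {(rec[2], rec[0]): rec for rec in dssp_records}
    for residue in residues:
        rec = lookup.get((residue.chain, residue.number))
        if rec is None:
            continue  # residue absent from the DSSP output
        residue.tags["dssp_data"] = {
            "ss_definition": rec[1],
            "solvent_accessibility": rec[6],
            "phi": rec[4],
            "psi": rec[5],
        }
```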

isambard.evaluation.hydrophobic_fitness module

Module for calculating the hydrophobic fitness of a protein.

isambard.evaluation.hydrophobic_fitness.calculate_hydrophobic_fitness(assembly)

Calculates the hydrophobic fitness of a protein.

Hydrophobic fitness is an efficient centroid-based method for calculating the packing quality of your structure [3]. For this method C, F, I, L, M, V, W and Y are considered hydrophobic. The algorithm has two terms:

\[Hydrophobic\ term = \frac{\sum\limits_{i}^{}(H_{i}-H_{i}^{\circ})}{n}\]

where \(H_{i}\) is the total number of hydrophobic contacts of residue i, \(H_{i}^{\circ}\) is the number of hydrophobic contacts expected by chance (see equation (1)) and n is the total number of residues.

\[Burial\ term = \frac{\sum\limits_{i}B_{i}}{n}\]

where \(B_{i}\) is the number of side-chain centroids within 10 Å of the centroid of residue i. The number of hydrophobic contacts expected by chance is calculated as follows:

\[H_{i}^{\circ} = C_{i}\left(\frac{h_{i}}{N_{i}}\right) \tag{1}\]

where \(C_{i}\) is the number of all side-chain centroids that contact residue i, \(h_{i}\) is the total number of hydrophobic residues in the sequence excluding residue i and its neighbours, and \(N_{i}\) is the total number of residues excluding residue i and its neighbours. The hydrophobic fitness score is the combination of these terms:

\[HF = -\frac{\left(\sum\limits_{i}B_{i}\right)\left(\sum\limits_{ i}^{}(H_{i}-H_{i}^{\circ})\right)}{n^{2}}\]
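Given per-residue counts, the equations above combine as in the following sketch. It assumes the caller has already computed \(B_{i}\), \(H_{i}\) and the inputs to equation (1) from side-chain centroids, which is the part not shown here; the function names are hypothetical.

```python
from typing import Sequence


def expected_contacts(c_i: int, h_i: int, n_i: int) -> float:
    """Equation (1): H_i0 = C_i * (h_i / N_i)."""
    return c_i * h_i / n_i


def hydrophobic_fitness(burial: Sequence[int], contacts: Sequence[int],
                        expected: Sequence[float], n: int) -> float:
    """HF = -(sum B_i) * (sum (H_i - H_i0)) / n**2.

    `contacts` and `expected` must be aligned per residue (same order/length).
    """
    burial_sum = sum(burial)
    excess = sum(h - h0 for h, h0 in zip(contacts, expected))
    return -(burial_sum * excess) / n ** 2
```

With B = [4, 6], H = [3, 1], C = [5, 2], \(h_{i}\) = 4 and \(N_{i}\) = 10 for both residues, and n = 10, the expected contacts are 2.0 and 0.8, the excess is 1.2, and HF = -(10 × 1.2) / 100 = -0.12.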

Notes

WARNING: The scores produced by this implementation do not exactly match those reported in publications from the Levitt group. They generally differ by up to around 1 unit, so the implementation should still be useful.

References

[3] Huang ES, Subbiah S, Levitt M (1995) Recognizing Native Folds by the Arrangement of Hydrophobic and Polar Residues, J. Mol. Biol., 252, 709-720.

isambard.evaluation.hydrophobic_fitness.get_number_within(reference, target_points)

Gets the number of target centroids within 7.3 Å and within 10 Å of the reference centroid, and the number of its sequence neighbours.

Parameters:
  • reference ((str, int, (float, float, float))) – Reference centroid.
  • target_points ([(str, int, (float, float, float))]) – A list of centroids.
Returns:

within_and_neighbours – The number of target centroids within 7.3 Å, the number within 10.0 Å, and the number of neighbours (based on chain ID and residue number).

Return type:

(int, int, int)
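A minimal version of this counting is sketched below, assuming centroids in the (chain ID, residue number, (x, y, z)) form used above. Treating a neighbour as a residue on the same chain whose number differs by one, skipping the reference itself, and still counting neighbours in the distance tallies are all assumptions; `count_within` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

# (chain ID, residue number, centroid coordinates)
Centroid = Tuple[str, int, Tuple[float, float, float]]


def count_within(reference: Centroid, targets: Sequence[Centroid],
                 contact_cutoff: float = 7.3,
                 burial_cutoff: float = 10.0) -> Tuple[int, int, int]:
    """Return (n within contact_cutoff, n within burial_cutoff, n neighbours)."""
    ref_chain, ref_num, ref_xyz = reference
    n_contact = n_burial = n_neighbours = 0
    for chain, num, xyz in targets:
        if (chain, num) == (ref_chain, ref_num):
            continue  # do not count the reference residue itself
        if chain == ref_chain and abs(num - ref_num) == 1:
            n_neighbours += 1  # assumed neighbour definition
        d = math.dist(ref_xyz, xyz)
        if d <= contact_cutoff:
            n_contact += 1
        if d <= burial_cutoff:
            n_burial += 1
    return n_contact, n_burial, n_neighbours
```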

isambard.evaluation.hydrophobic_fitness.run_hf_loop(hydrophobic_centroids, tyrosine_centroids, polar_centroids)

Runs the hydrophobic fitness algorithm.

Parameters:
  • hydrophobic_centroids ([(str, int, (float, float, float))]) – A list containing the chain ID, residue number and the centroid coordinate position for all the hydrophobic residues excluding tyrosine.
  • tyrosine_centroids ([(str, int, (float, float, float))]) – A list containing the chain ID, residue number and the centroid coordinate position for all the tyrosine residues.
  • polar_centroids ([(str, int, (float, float, float))]) – A list containing the chain ID, residue number and the centroid coordinate position for all the polar residues.
Returns:

hydrophobic_fitness – The hydrophobic fitness score.

Return type:

float