Mutational BH steps

From Docswiki
Jump to navigation Jump to search

This wiki describes mutational BH steps as they are implemented for AMBER12 within GMIN. The tutorial contains two parts - a description of how to use the mutational steps and an overview of the module, so future changes of the code may be implemented.

Setting up

To run AMBER12 with mutational steps, all usual input files are needed, i.e. coords.inpcrd, coords.prmtop, min.in, start and perm.allow.

There are five keywords for mutations at the moment:

  • AMBERMUTATION nmut ntest - activates mutational steps, a mutation is attempted every nmut steps and after ntest further BH steps the mutation is accpeted or rejected
  • AMBERMUTENERGY mode option - sets the choice of scoring for the mutations
  • AMBERMUTFF ff - sets the force field for creating new topology files, ff is either 14 (ff14SB) or 99 (ff99SB)
  • AMBERMUTIGB igb - sets the choice of solvent model, igb is either 2 (igb=2) or 8 (igb=8)
  • AMBERMUTRIGID - use rigid bodies (do not set RIGIDINIT).

The current scoring functions are:

  • mode=1 - using a random number, for test purposes only
  • mode=2 - use a part of the decomposed AMBER energy, option specifies which one
  • mode=3 - use interaction between two components, option specifies the last residue in the first molecule, the energy difference used is the bound energy versus the energy where the second component is translated
  • mode=4 - RMSD from a reference for the CA atoms (for now only helices are programmed, option specifies 1 - alpha helix, 2 - 3-10 helix, and 3 - pi helix)
  • mode=5 - specifies geometric constraints, option is the number of groups of constraints

A number of additional number of files is required to set up various parts of the interface.

Mutations

The file amber_mutations is needed to specify the allowed mutations. The format of the file is as follows:

3                       %number of residues that can be mutated
4                       %number of terminal residues (N or C for peptides)
1 81 82 90              %ordered list of ID of terminal residues, with indication whether they are NA or AA
83    3   TYR  0        %residue to mutate: ID | number of allowed mutations (must include the current state) | current state | number of mutations done (restart) | amino (AA) or nucleic (NA) acid
TYR  HID  HIE           %list of allowed mutations
 3    1    1            %relative probability of choosing mutation
84   2    ILE  0        %second residue, same format
PHE  ILE
 1    1
89    3   LEU  0        %third residue, same format
ARG  LEU  ILE
 3    2    1

Extra files

Depending on the exact setup, there are more files that may be required.

If the scoring function chosen is option 5, geometrical constraints, then the file amber_mut_constr is needed. There are three possible constraints - distance (D), angle (A) and torsion (T). They can be clustered into groups to be more accurate, for example a G-C base pair would require 3 distance constraints (the Hydrogen bonds) to be formed. An example is given here:

GROUP 1                                    %first group, 1 constraint
D  2.4 0.2 0.3 11 37                       %distance constraint: structural element exists if the distance between atoms 11 and 37 is between 2.4+0.2 and 2.4-0.3
GROUP 2                                    %second group, 2 constraints
A 27.5 1.5 1.5 45 34 90                    %angle constraint
A 56.8 2.6 3.0 45 34 112
GROUP 2
T -174.0 4.5 3.0 33 23 22 11               %torsional constraint
D 2.5 0.1 0.1 33 35

As some proteins exhibit disulfide bonds and these will alter the potential, they need to be specified in the topology file. To specify these bonds an extra file amber_mut_cyx is required, stating the number of bonds in the first line, and then pairs of residues that are bonded in the following lines, e.g.:

4
10 54
13 27
21 44
28 34

Running jobs

The jobs can run normally, but certain options are not available, namely BHPT and FEBH. It is important to make the overall number of steps large and allow for exploration of the mutational conformations before accepting or rejecting them. The setup requires a local python and a local AMBER installation.

The AMBER12_MUTATIONS Module

The module is set up to deal with the fortran aspect of the code. That is to get information from the AMBER interface, reinitialise the interface, update the various global variables and take care of the step-taking. One aspect of the step-taking is handled by a set of python scripts, namely the creation of most input files. As these require an awful amount of book keeping and correctly formatted output, pythons flexible data structures take care of this nicely.

The interface uses 5 global variables: AMBERMUTATIONT (logical, are mutational steps used?), MUTATIONFREQ (integer, every how many steps are we mutating?), MUTTESTSTEPS (integer, how many steps are taken before we reject or accept), NMUTATION (integer, number of mutational steps taken), and MUTUNIT (integer, I/O unit for writing specific log file for mutational steps).

AMBER12 and RIGIDINIT are automatically enabled with the correct AMBERMUT keywords. The programme sets all variables, initialises AMBER12 and chirality checks including the coordinates, and then calls the mutational setup.

Within the module of variables are used throughout and will need to be maintained in any future additions/changes in the code. MUTATION_INFO and PREVIOUS_MUTATION are of custom-made type RESIDUE_MUTATION. Here all information originating from the amber_mutation file are stored and amended as the run proceeds. THE PREVIOUS_MUTATION stores the information needed to revert if a mutation is rejected.

MUTCONSTRAINT and the types CONSTRGROUP and CONSTRAINT are used for mode=5 for the scoring function and store the information from amber_mut_constr.

Setup

The first action within the setup is to analyse the topology file and save atom and residue information. Then the amber_mutation file is opened and processed, including all allocations of arrays used in the module. The size of these should not change again, as we mutate but not delete or add residues. Many of the variables here are for book keeping.

Next a check is done to detect disulfide bond information. If amber_mut_cyx exists, the information is read in, and the logical flags are set to use this information. Finally, the group rotation are initialised. The rigid body setup is handled within the normal routines.

Step-taking

The first thing done in the step taking is storing the current details, that is copying MUTATION_INFO to PREVIOUS_MUTATION, and saving the current lowest minima, so we can access the lowest conformations found for every mutation. Then the next mutation is selected.

Mutation selection

The selection of the mutation occurs in the SELECT_MUTATION subroutine.

For every possible mutation site, we determine the number of previous mutations plus one, and select a site with relative probability of 1 over this number. This means the more mutations we have done in one site, the less likely it is to have another one occurring there.

We then select a mutation from the allowed set with the given probabilities. The new and old three letter codes are saved in NEWRES and OLDRES (for nucleic acids it is the two letter code from the AMBER force field specifications). For amino acids we then add potential terminal identifiers to the names, for nucleic acids this is done according to the name in the python scripts.

Creation of new input files

From the selected residue, we create a new file that contains the coordinates for this residue before the mutation. The old input files are then saved and the mutation scripts mutate_aa.py or mutate_na.py are then called, creating the new coordinates for the mutated residue preserving as much of the geometry as possible (this way we reduce the likelihood of atom clashes in the mutated conformation). A new perm.allow is also created.

After this process, the new topology is created, and at this stage the shift in the number of atoms is entered into the module. Within the same subroutine, the coordinate input files are constructed. Some tricks are necessary as we cannot use the AMBER12 interface routines due to a mismatch in the number of atoms. At this stage, all variable still work with the original number of atoms. Next the group rotation and rigid body input is created.

Reinitialisation and deallocations

Once this has occurred, the REINITIALISE_AMBER() subroutine is called. This routine is rather cruel and information that has not been saved at this stage will be lost. Firstt, all open AMBER files are closed, all internal arrays deallocated , and traces from the previous initialisation are removed. This effectively stops the AMBER part of GMIN and we initialise it from scratch with the new input files. All group rotation information also has to be reset, and the same applies to the genrigid data. All global variables allocated with the original number of atoms are then deallocated and reallocated, and initialised with dummy values to avoid reading from uninitialised memory.

Last but not least, we delete all information about chirality and cis-trans isomers and reinitialise these as well.

Reversing mutations

Broadly we follow the mutational step taking, i.e. we still need all the reallocation and initialisation. However, as the data from the previous sequence are stored (i.e. we have the correct input files), they are just copied into the right place with system calls, removing the need to recreate files fro scratch.

Mutation scoring

The mutations are scored within MUTATION_E, which is called from mc.F. The two parameters used differentiate between the mode of scoring used and the options within the calculation.

Energy based functions

There are currently two energy based scoring functions. The first is a component of the AMBER energy and any of the calculated components may be used. The actual energies are saved in MUTUNIT, and therefore can be used as diagnostics for bugs as the values should go off the chart the moment an error is introduced into the structures or potential files.

The second one is an estimate of binding interactions based on the potential interactions. The system is split into two components. The geometric centres for the CA atoms for each component is determined and the unit vector between these two points computed. The second component is then translated by 15.0 nm along this unit vector. This will move the components outside of all cutoffs. The energy difference between the displaced and the original structure measure the binding energy.

Geometry based functions

These constraints are thought to target specific frameworks or identify bonding patterns. The aim of the BH for such a scoring for example could be: What is the best sequence to support a tetraloop as lowest energy structures.

Accept/Reject criteria

The accept reject criteria are tuned to reflect the different scoring ideas. For energy based measure the usual Metropolis criterion is employed. For the other scoring functions, if the score after mutating is closer to the ideal score, the mutation is accepted. Otherwise a random number is drawn between 0 and 1 and the mutation is accepted if the ratio of new to old is larger.