CHECKSPMUTATE

From Docswiki
Jump to navigation Jump to search

Purpose

It can take an awfully long time to create a large database or to fully optimise a specific feature of it such as a fully connected pathway showing complex protein folding. This is particularly acute when considering large proteins/protein+ligand systems.

What if we are interested in examining how a Wild Type protein behaves with respect to some carefully selected mutants? Or in comparing one protein against a close homologue? It would seem like a colossal waste of time to create large databases for each of these similar cases, completely independently from each other. This is where CHECKSPMUTATE comes in.

CHECKSPMUTATE uses the CHECKSPODATA routine, which reoptimises the minima/transition states of a database. CHECKSPMUTATE extends this by allowing for user-selected sections of the coordinates of the stationary points comprising the database to be mutated before the reoptimisation takes place. Thus a database can be transformed so that it describes the behaviour of a mutated protein as opposed to the Wild Type. Though this new database will need to be tidied up through the use of, eg, SHORTCUT, SHORTCUT 2 BARRIER and UNTRAP, this process should be far quicker than starting a whole new database from scratch. In the example below, I shall show how a pathway describing the approach of the cofactor, NADH, towards another cofactor, haem, within the pocket of HemS (a pathway which took months to find and fully connect) could be quickly replicated in a system where the wt HemS has been replaced by a mutated form (or even by another protein entirely).

Preparation

Before running the reoptimisations, we need to prepare a series of auxiliary files. In all, we should have the following files in our directory. Note that the python scripts can be found in /svn/SCRIPTS/AMBER/BHmutation_steps.

  • aa_ringdata.pyc list of parameters/definition of planes for residues with rings. Only required if we are mutating to a residue with a ring.
  • amino_acids.pyc list of parameters for all residues.
  • atomnumberlog list of indices of the first atom of the residues to be mutated.
  • coordinates_mut.pyc script which mutates the selected residue.
  • coords.inpcrd ensure this is for the system we are CHANGING TO.
  • coords.mdcrd for use with min.in, may not be required.
  • coords.prmtop ensure this is for the system we are CHANGING TO.
  • min.A needs to be present, although the index listed in it is unimportant
  • min.B needs to be present, although the index listed in it is unimportant
  • min.data ensure this is for the system we are CHANGING FROM. Could be an entire database, or a section of it (such as a pathway) found using DIJKSTRA.
  • min.in for use with AMBER. Defines certain aspects of the model, such as solvent being used.
  • mutate_aa.py organises the residues to be mutated, and does the mutations in conjunction with coordinates_mut.pyc
  • newreslog list of codes for residues which we are CHANGING TO.
  • nresidueslog list of the total number of mutations to be made to the system.
  • odata.checksp list of conditions for optimisation carried out on each stationary point by OPTIM.
  • oldreslog list of codes for residues which we are CHANGING FROM.
  • original_protein.pdb pdb file for the system we are CHANGING FROM. All lines which do not dscribe an ATOM (e.g. TITLE, TER and END) are removed, so that the number of lines of the file should correspond to the number of atoms in the system.
  • pathdata organises OPTIM jobs. Certain keywords are required, described below.
  • perm.allow full description of the groups of permutable atoms in the system we are CHANGING TO.
  • points.min ensure this is for the system we are CHANGING FROM. Could be an entire database, or a section of it (such as a pathway) found using DIJKSTRA.
  • points.ts ensure this is for the system we are CHANGING FROM. Could be an entire database, or a section of it (such as a pathway) found using DIJKSTRA.
  • resnumberlog list of indices of the residues to be mutated.
  • ts.data ensure this is for the system we are CHANGING FROM. Could be an entire database, or a section of it (such as a pathway) found using DIJKSTRA.
  • submission_script script for executing your binary.

Example of Mutation

[Wild Type HemS + haem + NADH] to [F101A HemS + haem + NADH]

To use CHECKSPMUTATE, we first need a database of interest, or a subset of it. In my example, I had extracted the minima and transition states comprising the pathway I was interested in using DIJKSTRA and moved the new min.data, points.min, points.ts and ts.data files to a new directory. Therefore, each of the stationary points in my database described a stage along this pathway. I wanted to see how this pathway, describing the approach of NADH to haem within the wt HemS pocket, changed when certain mutations were made to the HemS structure. One such residue of interest was a phe-gate (which appeared to regulate the approach of NADH) and so a mutation from phenylalanine to alanine (F101A) was made. I made input files for the new mutated system using tleap, elsewhere described in Preparing an AMBER topology file for a protein plus ligand system and Symmetrising AMBER topology files. This gave me coords.inpcrd, coords.prmtop and perm.allow files for my new system. These were moved to the same directory where I had the min.data, points.min, points.ts and ts.data files for the original system.

Making auxiliary files

We need to prepare atomnumberlog, newreslog, nresidueslog, oldreslog and resnumberlog to define the mutation we are making, and where in the chain it takes place. In this example, we are only making one mutation - from phenylalanine to alanine at position 101.

  • nresidueslog as we are only making one mutation, this is simply:
1
  • resnumberlog this is the index of the residue we are mutating. In cases (unlike this one) where the number of residues changes, this index is always with respect to the original system.
101
  • atomnumberlog this is the index of the first atom in the residue we are mutating. In cases (unlike this one) where the number of residues changes, this index is always with respect to the original system.
1569
  • oldreslog this is the residue code for the residue we are mutating from. If only three letters (i.e. not terminal and so preceded by N--- or C---), ensure that a space is put after it so that the code is four characters overall.
PHE 
  • newreslog this is the residue code for the residue we are mutating to. If only three letters (i.e. not terminal and so preceded by N--- or C---), ensure that a space is put after it so that the code is four characters overall.
ALA 

pathdata and odata.checksp files

To invoke the reoptimisation process, either CHECKMIN (if reoptimising the minima of the database) or CHECKTS (if reoptimising the transition states) is required. The arguments should correspond to the number of minima or transition states present in the database.

When mutating, we also need to include CHECKSP_MUT, which invokes the process whereby our system is mutated before being reoptimised. We should also include NATOMS_NEW which is the number of atoms in the system to which we are mutating. Also, if our system includes cofactors (which are conventionally listed after the protein chain) then we should include NATOMS_CHAIN, the number of atoms in the protein chain.

An example pathdata file (for minima, as opposed to TSs) would therefore look something like this:

EXEC           /home/adk44/bin/CUDAOPTIM_ppt_final_210918
CPUS           1
NATOMS         5501
NATOMS_CHAIN   5357
SEED           1
DIRECTION      AB
CONNECTIONS    1
TEMPERATURE    0.592
PLANCK         9.536D-14

PERMDIST
ETOL           8D-4
GEOMDIFFTOL    0.2D0
ITOL           0.1D0
NOINVERSION
NOFRQS

CHECKMIN 1 1235
CYCLES 1235
CHECKSP_MUT
NATOMS_NEW 5491

AMBER12

The odata.checksp files should be no different from what you originally used to find your original database. Make sure though that you use one suitable for finding minima with CHECKMIN, and one suitable for finding TSs with CHECKTS!

Before and After Mutation

Here is a rendering of [wt HemS + haem + NADH], alongside a rendering of the same minimum reoptimised after the F101A mutation was made.

Example of Transformation to a Homologue

[Wild Type HemS + haem + NADH] to [Wild Type ChuS + haem + NADH]

This is a more complex problem than the single-point mutation described above. HemS and Chus are ~70% similar, and so ~30% of the residues of HemS need to be 'mutated' to transform the system to ChuS. In addition to that, ChuS is two residues shorter than HemS, with one such deletion occurring approximately midway through the chain, and the other at the end. As with the mutant example above, I made input files for the new system using tleap, elsewhere described in Preparing an AMBER topology file for a protein plus ligand system and Symmetrising AMBER topology files. This gave me coords.inpcrd, coords.prmtop and perm.allow files for my new system. These were moved to the same directory where I had the min.data, points.min, points.ts and ts.data files for the original system.

Making auxiliary files

First, we need to align and compare the two systems.

wt HemS + haem + NADH

NSERILE TYR GLU GLN TYR LEU GLN ALA LYS ALA ASP ASN PRO GLY LYS TYR ALA ARG ASP
LEU ALA THR LEU MET GLY ILE SER GLU ALA GLU LEU THR HIE SER ARG VAL SER HIE ASP
ALA LYS ARG LEU LYS GLY ASP ALA ARG ALA LEU LEU ALA ALA LEU GLU ALA VAL GLY GLU
VAL LYS ALA ILE THR ARG ASN THR TYR ALA VAL HIE GLU GLN MET GLY ARG TYR GLU ASN
GLN HIE LEU ASN GLY HIE ALA GLY LEU ILE LEU ASN PRO ARG ASN LEU ASP LEU ARG LEU
PHE LEU ASN GLN TRP ALA SER ALA PHE THR LEU THR GLU GLU THR ARG HIE GLY VAL ARG
HIE SER ILE GLN PHE PHE ASP HIE GLN GLY ASP ALA LEU HIE LYS VAL TYR VAL THR GLU
GLN THR ASP MET PRO ALA TRP GLU ALA LEU LEU ALA GLN PHE ILE THR THR GLU ASN PRO
GLU LEU GLN LEU GLU PRO LEU SER ALA PRO GLU VAL THR GLU PRO THR ALA THR ASP GLU
ALA VAL ASP ALA GLU TRP ARG ALA MET THR ASP VAL HID GLU PHE PHE GLN LEU LEU LYS
ARG ASN ASN LEU THR ARG GLN GLN ALA PHE ARG ALA VAL GLY ASN ASP LEU ALA TYR GLN
VAL ASP ASN SER SER LEU THR GLN LEU LEU ASN ILE ALA GLN GLN GLU GLN ASN GLU ILE
MET ILE PHE VAL GLY ASN ARG GLY CYS VAL GLN ILE PHE THR GLY MET ILE GLU LYS VAL
THR PRO HIE GLN ASP TRP ILE ASN VAL PHE ASN GLN ARG PHE THR LEU HIE LEU ILE GLU
THR THR ILE ALA GLU SER TRP ILE THR ARG LYS PRO THR LYS ASP GLY PHE VAL THR SER
LEU GLU LEU PHE ALA ALA ASP GLY THR GLN ILE ALA GLN LEU TYR GLY GLN ARG THR GLU
GLY GLN PRO GLU GLN THR GLN TRP ARG ASP GLN ILE ALA ARG LEU ASN ASN CLYSHEM NAD

wt ChuS + haem + NADH

NASNHIE TYR THR ARG TRP LEU GLU LEU LYS GLU GLN ASN PRO GLY LYS TYR ALA ARG ASP
ILE ALA GLY LEU MET ASN ILE ARG GLU ALA GLU LEU ALA PHE ALA ARG VAL THR HIE ASP
ALA TRP ARG MET HIE GLY ASP ILE ARG GLU ILE LEU ALA ALA LEU GLU SER VAL GLY GLU
THR LYS CYS ILE CYS ARG ASN GLU TYR ALA VAL HIE GLU GLN VAL GLY THR PHE THR ASN
GLN HIE LEU ASN GLY HIE ALA GLY LEU ILE LEU ASN PRO ARG ALA LEU ASP LEU ARG LEU
PHE LEU ASN GLN TRP ALA SER VAL PHE HIE ILE LYS GLU ASN THR ALA ARG GLY GLU ARG
GLN SER ILE GLN PHE PHE ASP HIE GLN GLY ASP ALA LEU LEU LYS VAL TYR ALA THR ASP
ASN THR ASP MET ALA ALA TRP SER GLU LEU LEU ALA ARG PHE ILE THR ASP GLU ASN THR
PRO LEU GLU LEU LYS ALA VAL ASP ALA PRO VAL VAL GLN THR XXX ARG ALA ASP ALA THR
VAL VAL GLU GLN GLU TRP ARG ALA MET THR ASP VAL HID GLN PHE PHE THR LEU LEU LYS
ARG HIE ASN LEU THR ARG GLN GLN ALA PHE ASN LEU VAL ALA ASP ASP LEU ALA CYS LYS
VAL SER ASN SER ALA LEU ALA GLN ILE LEU GLU SER ALA GLN GLN ASP GLY ASN GLU ILE
MET VAL PHE VAL GLY ASN ARG GLY CYS VAL GLN ILE PHE THR GLY VAL VAL GLU LYS VAL
VAL PRO MET LYS GLY TRP LEU ASN ILE PHE ASN PRO THR PHE THR LEU HIE LEU LEU GLU
GLU SER ILE ALA GLU ALA TRP VAL THR ARG LYS PRO THR SER ASP GLY TYR VAL THR SER
LEU GLU LEU PHE ALA HIE ASP GLY THR GLN ILE ALA GLN LEU TYR GLY GLN ARG THR GLU
GLY GLU GLN GLU GLN ALA GLN TRP ARG LYS GLN ILE ALA SER LEU ILE CPROXXX HEM NAD

Here, red font signifies a residue which has been mutated, green represents a residue which has been deleted (given the code name XXX), and purple represents the two cofactors.

We now need to prepare atomnumberlog, newreslog, nresidueslog, oldreslog and resnumberlog to define all the mutations/insertions/deletions we are making, and where in the chain they take place. In this example, we are making 113 mutations/insertions/deletions (to be precise, 111 mutations and 2 deletions) in total:

  • nresidueslog for 113 mutations, this is:
113
  • resnumberlog this is a list of the indices of the residues we are mutating. In cases where the number of residues changes, this index is always with respect to the original system.
1
2
4
.
.
336
337
338
  • atomnumberlog this is a list of the indices of the first atoms in the residues we are mutating. In cases where the number of residues changes, this index is always with respect to the original system.
1
14
54
.
.
5307
5321
5335
  • oldreslog this is the residue code for the residues we are mutating from. If only three letters (i.e. not terminal and so preceded by N--- or C---), ensure that a space is put after it so that the code is four characters overall. If we are inserting a residue then the residue code is XXX or NXXX or CXXX.
NSER
ILE 
GLU 
.
.
ASN 
ASN 
CLYS
  • newreslog this is the residue code for the residues we are mutating to. If only three letters (i.e. not terminal and so preceded by N--- or C---), ensure that a space is put after it so that the code is four characters overall. If we are inserting a residue then the residue code is XXX or NXXX or CXXX.
NASN
HIE 
THR 
.
.
ILE 
CPRO
XXX  

pathdata and odata.checksp files

Same as for the mutation example above. I would expect, for my example here with ChuS, a pathdata file like this:

EXEC           /home/adk44/bin/CUDAOPTIM_ppt_final_210918
CPUS           1
NATOMS         5501
NATOMS_CHAIN   5357
SEED           1
DIRECTION      AB
CONNECTIONS    1
TEMPERATURE    0.592
PLANCK         9.536D-14

PERMDIST
ETOL           8D-4
GEOMDIFFTOL    0.2D0
ITOL           0.1D0
NOINVERSION
NOFRQS

CHECKMIN 1 1235
CYCLES 1235
CHECKSP_MUT
NATOMS_NEW 5464

AMBER12

Final Notes

We therefore now have a method in which to transform the stationary points of a database of interest into a mutant or close homologue. This should allow a method in which to directly compare a wild type against mutants of interest or two protein homologues. In the examples above, I have been interested in comparing the pathway of a cofactor as it enters a pocket against other pathways in which key residues have been removed, and against a homologue of suspected similar reactivity.

The advantage of this method is that time and computational resources can be significantly saved in transforming one database to a system of similar properties.

It must be noted, however, that following the use of CHECKSPMUTATE, the new pathway might have gaps as not all of the stationary points necessarily will have converged. These gaps will need filled with the CONNECTPAIRS keyword. The full procedure that I used to fill such gaps is detailed in Pathway Gap Filling Post-CHECKSPMUTATE

IT MUST ALSO BE NOTED that the pathway for the new system will not necessarily (and is indeed unlikely to be) the optimal one. Therefore, post CHECKSPMUTATE, an appropriate number of rounds of SHORTCUT/SHORTCUT 2 BARRIER/UNTRAP etc will need to be performed. It is not advised that CHECKSPMUTATE be used on proteins which are significantly different.

Care must especially be taken when inserting residues within the chain. If possible to avoid, please do so. I have written the code so that a maximum of two such residues can be inserted at the same point along the chain. This is already likely to cause problematic steric clashes, and so adding more would not likely be feasible. Care must also be taken that these new residues, once added and optimised, have the correct cis-trans isomerism and chirality.

Potential Uses

CHECKSPMUTATE could potentially be used to shed light on some of the following problems:

  • Comparing a wt protein against some mutations to determine how such mutations affect protein folding. Care will need to be taken that sufficient sampling for alternative pathways is performed.
  • Comparing a wt protein against some mutations to determine how such mutations affect protein-cofactor interactions.
  • Comparing a protein against a close homologue to see whether it engages in the same manner of protein folding. Care will need to be taken that sufficient sampling for alternative pathways is performed.
  • Comparing a protein against a close homologue to see whether it engages in similar protein-cofactor interactions.
  • Transformation of a cofactor (for example from NADH to NADPH) to see whether this affects protein-cofactor interactions.
  • Comparing a DNA strand against an RNA/XNA strand. Note that this would require some different python scripts, also found in /svn/SCRIPTS/AMBER/BHmutation_steps, such as nucleic_acids.py.

Acknowledgements

Though adk44 wrote the code for CHECKSPMUTATE, he interfaced this code with kr366's pre-existent python scripts (with minor alterations) which had originally been written to allow mutations as part of kr366's Mutational BH steps routine.

--adk44 16.45, 2 June 2020 (BST)