Pathway Gap Filling Post-CHECKSPMUTATE

From Docswiki

Revision as of 15:05, 3 June 2020

Introduction

This is a recommended procedure to follow after running CHECKSPMUTATE when the stationary points being reoptimised make up a pathway.

CHECKSPMUTATE mutates a selected set of residues in a protein or protein+ligand system, and reoptimises all of the stationary points from the original system. Thus mutated forms or a close homologue can be directly compared against a wild type protein. This is particularly useful when comparing a particular protein fold or protein+cofactor interaction. In these instances, we are interested in reoptimising only the stationary points comprising a particular pathway, and the database before mutation is set up accordingly.

It is almost inevitable (particularly if we are introducing bulky mutations) that not all of the stationary points will reoptimise after mutation (there could be steric clashes, etc.). Thus, there will be gaps in our new, mutated pathway, and post-processing is needed to fill them.

Please note that the method described below is highly idiosyncratic, and as such is only meant as a loose guide. It uses very simple bash scripts, which can be easily edited. Please feel free to adapt the procedure to your own needs/preferences.

Method

The directories used for CHECKSPMUTATE and its post-processing. The bash scripts are set up to move between these directories, so they will need to be adapted if the directories are named/organised differently.

Checkmin/Checkts

Rationale

Ordinarily, I will have run the CHECKSPMUTATE calculations in separate checkmin and checkts directories. PATHSAMPLE labels each OPTIM job it launches with a random number, so two or more OPTIM jobs within the same PATHSAMPLE batch can be assigned the same number. When that happens, the output of the earlier job is overwritten by that of the later one. This seems to be a fairly significant bug within PATHSAMPLE, but nobody else appears to have reported it (I can only assume that nobody else has run into it, or that they have come up with their own workarounds). I didn't want to tamper with the cycle2.f90 routine, so my fix is to reoptimise the stationary points whose output was overwritten. Typically, the number of overwritten files is small compared to the total number of reoptimisations first carried out by CHECKSPMUTATE. For example, with my [wt ChuS + haem + NADH] system (please see CHECKSPMUTATE for details), 14 of the 1235 minima that were reoptimised had been overwritten.

Files Required

To find out which files had been overwritten in the first place, a sub-directory (called all_launched_simult) was created within checkmin. The following files from checkmin were copied into this new folder:

  • aa_ringdata.pyc, amino_acids.pyc, atomnumberlog, coordinates_mut.pyc, coords.inpcrd, coords.mdcrd, coords.prmtop, min.A, min.B, min.data, min.in, mutate_aa.py, newreslog, nresidueslog, odata.checksp, odata (exactly the same as odata.checksp), original_protein.pdb, pathsample_checkmin.out, perm.allow, points.min, points.ts, resnumberlog, ts.data
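The copy step above can be sketched as a small bash function. This is only an illustration: the function name stage_files is hypothetical, and the sketch simply warns about any listed file that is absent rather than stopping.

```shell
#!/bin/bash
# Sketch only: copy the files listed above from checkmin into a new sub-directory.
# stage_files is a hypothetical helper name, not one of the scripts in the repository.
stage_files() {
    local src=$1 dest=$2 f
    mkdir -p "$dest"
    for f in aa_ringdata.pyc amino_acids.pyc atomnumberlog coordinates_mut.pyc \
             coords.inpcrd coords.mdcrd coords.prmtop min.A min.B min.data min.in \
             mutate_aa.py newreslog nresidueslog odata.checksp odata \
             original_protein.pdb pathsample_checkmin.out perm.allow \
             points.min points.ts resnumberlog ts.data; do
        if [ -e "$src/$f" ]; then
            cp "$src/$f" "$dest/"
        else
            # Report missing files on stderr but carry on with the rest
            echo "warning: $src/$f not found" >&2
        fi
    done
}
```

From the directory containing checkmin, this could be called as: stage_files checkmin checkmin/all_launched_simult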

Additionally, pre_pathdata and pre_sub_script_CUDAOPTIM files of the form:

EXEC           /home/adk44/bin/CUDAOPTIM_ppt_final_210918
CPUS           1
NATOMS         5501
NATOMS_CHAIN   5357
NATOMS_NEW     5464
CHECKSP_MUT
SEED           1
DIRECTION      AB
CONNECTIONS    1
TEMPERATURE    0.592
PLANCK         9.536D-14
DUMMYRUN
PERMDIST
ETOL           8D-4
GEOMDIFFTOL    0.2D0
ITOL           0.1D0
NOINVERSION
NOFRQS
CYCLES 1

AMBER12

and

#!/bin/bash

# Request 1 TITAN Black GPU - use '--constraint=teslak20' for a Tesla or '--constraint=maxwell' to request a Maxwell GPU for single precision runs
#SBATCH --constraint=titanblack
#SBATCH --job-name=test_top
#SBATCH --gres=gpu:2
#SBATCH --mail-type=FAIL

hostname
echo "Time: `date`"
source /etc/profile.d/modules.sh

# Load the appropriate compiler modules on the node - should be the same as those used to compile the executable on pat
module add cuda/6.5
module add icc/64/2013_sp1/4/211
module add anaconda/python2/2.2.0 # Needed for python networkx module - must be python 2, not 3

# Set the GPU to exclusive process mode
sudo nvidia-smi -i $CUDA_VISIBLE_DEVICES -c 3

# Run the executable in the local node scratch directory

echo Finished at `date`

were included.

Before proceeding, we also need duplicates.sh, duplicates.py, duplicates2.py and reoptimise.sh, all of which can be found in /svn/SCRIPTS/CHECKSPMUTATE/all_launched_simult/minima.

Execution

First, execute duplicates.sh. This generates checkminfile, a list of all of the minima which were overwritten during the original CHECKSPMUTATE run. It identifies such minima by reading pathsample_checkmin.out (i.e. the output from the CHECKSPMUTATE calculation), which gives a log of all of the random numbers each respective OPTIM job was assigned.
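The idea behind this detection step can be illustrated with a short sketch. It is not the actual duplicates.sh. The format of pathsample_checkmin.out is not reproduced on this page, so the sketch assumes the log has already been reduced to a hypothetical two-column file (stationary point index, then assigned job number), one line per launched job, in launch order. The function name find_overwritten is also hypothetical.

```shell
#!/bin/bash
# Sketch only: list the stationary points whose OPTIM output was overwritten.
# ASSUMPTION: $1 is a two-column file "index job_number" in launch order;
# the real pathsample_checkmin.out would need parsing into this form first.
find_overwritten() {
    awk '
        # Pass 1: remember the last line on which each job number appears
        NR == FNR { last[$2] = FNR; next }
        # Pass 2: any earlier line sharing that job number was overwritten
        last[$2] != FNR { print $1 }
    ' "$1" "$1"
}
```

The file is read twice so that every job except the last one to use a given number is reported, matching the description above in which the earlier job is overwritten by the later one. The output could be redirected to a file such as checkminfile.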

The script reoptimise.sh is then used to reoptimise these overwritten minima. pre_pathdata and sub_script_CUDAOPTIM are first manipulated to ensure the correct minima are reoptimised. Each reoptimisation is carried out in a sub-directory named after the index of the minimum being reoptimised.
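The directory layout described above can be sketched as follows. This is not the actual reoptimise.sh: the function name setup_reopt_dirs is hypothetical, the list file is assumed to hold one index per line, and the set of input files copied into each sub-directory is a guess. The edits that make each job target the right minimum are specific to the real script and are deliberately left out.

```shell
#!/bin/bash
# Sketch only: create one sub-directory per overwritten stationary point.
# ASSUMPTION: the list file (e.g. checkminfile) holds one index per line.
# ASSUMPTION: the files named in "inputs" are what each rerun needs.
setup_reopt_dirs() {
    local list=$1 idx f
    local inputs="coords.inpcrd coords.prmtop min.in perm.allow pre_pathdata pre_sub_script_CUDAOPTIM"
    while read -r idx || [ -n "$idx" ]; do
        [ -n "$idx" ] || continue      # skip blank lines
        mkdir -p "$idx"                # directory named after the index
        for f in $inputs; do
            if [ -e "$f" ]; then
                cp "$f" "$idx/"
            fi
        done
        # The real script would now adapt pre_pathdata and the submission
        # script for this index; that step is not reproduced here.
    done < "$list"
}
```

Run from all_launched_simult, a call such as setup_reopt_dirs checkminfile would leave one directory per overwritten minimum, each ready for its own OPTIM job.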

Note on checkts

Because of slightly different requirements, make sure that the auxiliary files from /svn/SCRIPTS/CHECKSPMUTATE/all_launched_simult/TSs are used instead. Also, the file to be read in by duplicates.sh should be called pathsample_checkts.out rather than pathsample_checkmin.out.

Readmin/Readts

Rationale

Now that we've reoptimised all of the stationary points for our new mutated system/homologue (bearing in mind that not all will have converged), and reoptimised any overwritten ones, we need to create points.min, min.data, points.ts and ts.data files for the new system. The READMIN keyword can do this by reading in a file containing the coordinates of all of the reoptimised minima/TSs. Before doing that, we need to create such a file by concatenating all of the min.data.info.**** files into two large min.data.info.total files (one for minima, one for TSs; it is a quirk of the CHECKTS keyword that it also logs its optimised structures in min.data.info.**** files rather than ts.data.info.**** files). I like to keep my minima/TSs in the same order as their equivalents in the original, non-mutated pathway, so I concatenate these files in a specific order, whilst ensuring that those from the overwritten reoptimisations are also included.

Files Required

From checkmin/all_launched_simult, copy the following files to the readmin directory:

  • checkminfile, coords.inpcrd, coords.mdcrd, coords.prmtop, min.in, pathsample_checkmin.out, perm.allow

From checkmin, copy all of the min.data.info.**** files to the readmin directory.

Also required:

A pathdata file of the form:

EXEC           /home/adk44/bin/CUDAOPTIM_ppt_final_210918
CPUS           1
NATOMS         5464
SEED           1
DIRECTION      AB
CONNECTIONS    1
TEMPERATURE    0.592
PLANCK         9.536D-14

PERMDIST
ETOL           8D-4
GEOMDIFFTOL    0.2D0
ITOL           0.1D0
NOINVERSION
NOFRQS

READMIN min.data.info.total

AMBER12

And the bash script organise_mindatainfo_min.sh, to be found in /svn/SCRIPTS/CHECKSPMUTATE/readmin.

Execution

First, execute organise_mindatainfo_min.sh. This produces the min.data.info.total file.
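The ordered concatenation can be sketched as below. This is not the actual organise_mindatainfo_min.sh, and several things are assumed for illustration: a hypothetical order file with one "index suffix" pair per line in the original pathway order, a list of overwritten indices with one per line (as checkminfile is assumed to be), and reruns stored in sub-directories named after the index. The function name build_total is hypothetical too.

```shell
#!/bin/bash
# Sketch only: concatenate min.data.info.* files in the original pathway order.
# ASSUMPTION: $1 holds "index suffix" pairs, one per line, in the desired order.
# ASSUMPTION: $2 lists overwritten indices; $3 holds one rerun directory per index.
build_total() {
    local order=$1 overwritten=$2 reopt_dir=$3 out=$4 idx suffix src
    : > "$out"                                     # start from an empty output file
    while read -r idx suffix || [ -n "$idx" ]; do
        [ -n "$idx" ] || continue
        if grep -qxF "$idx" "$overwritten"; then
            # Overwritten in the batch run: use the file from its own rerun
            src=$(ls "$reopt_dir/$idx"/min.data.info.* 2>/dev/null | head -n 1)
        else
            src="min.data.info.$suffix"
        fi
        if [ -n "$src" ] && [ -f "$src" ]; then
            cat "$src" >> "$out"
        else
            # No file means the stationary point did not converge: leave a gap
            echo "skipping $idx: no reoptimised structure found" >&2
        fi
    done < "$order"
}
```

An overwritten point shares its suffix with the job that overwrote it, which is why the lookup goes by index first and only falls back to the suffix for points that were not overwritten.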

Then, execute a PATHSAMPLE binary to run READMIN. This gives you the min.data and points.min files for your new, mutated system.

Notes on readts

This is the same as for the readmin procedure above but with a few distinctions. First, carry out the calculations in the folder readts.

Rather than checkminfile and pathsample_checkmin.out, we need the files checktsfile and pathsample_checkts.out.

Rather than organise_mindatainfo_min.sh, use the script organise_mindatainfo_ts.sh (to be found in /svn/SCRIPTS/CHECKSPMUTATE/readts).

In pathdata, stick with the READMIN keyword (there isn't actually a READTS keyword available). This is not a big problem: all that we need to do after running the READMIN calculation is to rename points.min to points.ts and min.data to ts.data.
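The renaming step can be done by hand with two mv commands. A slightly more defensive sketch is shown here. The function name rename_to_ts is hypothetical, and the checks for missing or pre-existing files are an addition for safety rather than part of the original procedure.

```shell
#!/bin/bash
# Sketch only: rename the READMIN output so it can serve as the TS database.
# Run inside the readts directory after the READMIN calculation has finished.
rename_to_ts() {
    local pair src dest
    # Check both files before moving anything, so a failure changes nothing
    for pair in points.min:points.ts min.data:ts.data; do
        src=${pair%%:*}
        dest=${pair##*:}
        if [ ! -f "$src" ]; then
            echo "error: $src not found" >&2
            return 1
        fi
        if [ -e "$dest" ]; then
            echo "error: refusing to overwrite existing $dest" >&2
            return 1
        fi
    done
    mv points.min points.ts
    mv min.data ts.data
}
```

Calling rename_to_ts with no arguments performs both renames, or returns a non-zero status without touching either file if something is out of place.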

all_opt_TSs

Rationale

Files Required

Execution