REMD with AMBER

From CUC3
Revision as of 12:40, 24 October 2008 by import>Csw34 (→‎Clustering structures)

Setting up AMBER REX

To run REMD with AMBER you need sander.MPI. You can either copy it (clust:~bs360/bin/sander.MPI) or build your own. To do so, copy the directory clust:~bs360/MPIAMBER/ and follow the instructions on http://wwmm.ch.cam.ac.uk/wikis/ucc/index.php/Notes_on_dove#64-bit_AMBER9 to create the config.h (or take the existing one in the MPIAMBER/src/ directory and change the path names accordingly). After producing the config.h file, and with the pgi64 and mpi/lam/64/pgi/7.1.1 modules loaded, you can compile the MPI version with 'make parallel'. This version has some modifications to the Sander source code in the files mdfill.f and multisander.f; they are tagged with 'bs360' to make them easy to find. It is not known whether they are all necessary, but the code works fine with them!

To run REMD with AMBER you need the following input files:

-> numreplicas

  (the number of replicas)

-> groupfile

   -A -i mdin.rep1 -o mdout.rep1 -c inpcrd.rep1 -p ./ala2.prmtop -r restrt.rep1 -x mdcrd.rep1 -inf mdinfo.rep1
   -A -i mdin.rep2 -o mdout.rep2 -c inpcrd.rep2 -p ./ala2.prmtop -r restrt.rep2 -x mdcrd.rep2 -inf mdinfo.rep2
   ...

The names 'numreplicas' and 'groupfile' cannot be changed! All other files can be named as you like.
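For larger numbers of replicas it can be convenient to generate these two files with a short script. A minimal sketch (the topology name ala2.prmtop and the replica count are illustrative, taken from the example above):

```shell
#!/bin/bash
# Sketch: write 'numreplicas' and one groupfile line per replica.
# The topology name (ala2.prmtop) and replica count N are illustrative.
N=8
echo $N > numreplicas
: > groupfile
for i in $(seq 1 $N); do
  printf '%s\n' "-A -i mdin.rep$i -o mdout.rep$i -c inpcrd.rep$i -p ./ala2.prmtop -r restrt.rep$i -x mdcrd.rep$i -inf mdinfo.rep$i" >> groupfile
done
```

Each replica then only needs its own mdin.rep* and inpcrd.rep* files to exist.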

mdin.rep*, inpcrd.rep* and ala2.prmtop are the remaining input files; all other files named in the groupfile are output.

The inpcrd.rep* and ala2.prmtop files are the usual coordinate and topology files, and mdin.rep* defines the MD input for each replica:

REMD run 1

&cntrl
       imin = 0, nstlim = 500, dt = 0.002,
       tempi = 0, temp0 = 100.0,
       ntt = 3, tol = 0.000001, gamma_ln = 5.0,
       ntc = 2, ntf = 1, ntb = 0, ntx = 5,
       ntwx = 500, ntwe = 0, ntwr =500, ntpr = 500,
       cut = 99.0, igb = 0, saltcon=0.0,
       nscm = 500, irest=1,
       ntave = 0, numexchg=100000
&end

An exchange is attempted after each cycle. The cycle length is determined by (nstlim)x(dt), i.e. (number of MD steps)x(step length). ntwx determines after how many MD steps the coordinates are written to mdcrd.rep*, ntpr after how many MD steps the energy is written to mdinfo.rep*, and ntwr after how many MD steps the restart file restrt.rep* is updated. The temperatures for each replica you have to determine yourself, i.e. they are not automatically exponentially distributed between <math>T_{min}</math> and <math>T_{max}</math> as in the MMTSB protocol. Note that with irest=1, each MD cycle restarts the trajectory from the relevant restart file. This is why you can have tempi=0: AMBER reads the temperature from that file and ignores the MD input.
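With the example input above, the timings work out as follows (a quick arithmetic check using awk):

```shell
# Worked example for the input above: 500 MD steps of 0.002 ps per cycle,
# with 100000 exchange attempts (numexchg).
awk 'BEGIN {
  nstlim = 500; dt = 0.002; numexchg = 100000
  cycle = nstlim * dt                 # ps of MD between exchange attempts
  total = cycle * numexchg / 1000.0   # total simulated time per replica, ns
  printf "cycle length: %.1f ps, total: %.0f ns per replica\n", cycle, total
}'
```

This prints a cycle length of 1.0 ps and 100 ns of total simulation per replica.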

Determining the replica temperatures

As we're not using the MMTSB toolset, you will need to calculate the replica temperatures yourself. This is done as follows, with <math>n_{tot}=</math> the TOTAL number of replicas, <math>n=</math> (current replica number - 1), <math>T_{min}=</math> the lowest temperature used and <math>T_{max}=</math> the highest temperature used.

<math>A</math> is a constant defined as: <math>A=\frac{\ln(T_{max}/T_{min})}{n_{tot}-1}</math>

The temperature of replica <math>n+1</math> is then defined as:

<math>T_{n+1}=T_{min}\exp{(An)}</math>
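The formulas above can be evaluated with a few lines of awk; a sketch, where the values of tmin, tmax and ntot are purely illustrative and not a recommendation for any particular system:

```shell
# Generate an exponential temperature ladder T_{n+1} = Tmin * exp(A*n)
# with A = ln(Tmax/Tmin)/(ntot-1). tmin, tmax and ntot are illustrative.
awk -v tmin=270 -v tmax=600 -v ntot=8 'BEGIN {
  A = log(tmax / tmin) / (ntot - 1)
  for (n = 0; n < ntot; n++)
    printf "replica %d: T = %.2f K\n", n + 1, tmin * exp(A * n)
}'
```

By construction the first replica sits exactly at <math>T_{min}</math> and the last exactly at <math>T_{max}</math>.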

The input.crd files

As you have irest=1 and ntx=5, you need velocity information in your inpcrd files to start the REMD simulation. This is obtained by first running a 1-step CTMD run with the original input, and then copying the velocities from the restart file it creates. Input for this is below:

 Velocity randomisation
 &cntrl
        imin = 0, nstlim = 1, dt = 0.0015,
        tempi = 0, temp0 = 250.0,
        ntt = 3, gamma_ln = 5.0, ntx = 1,
        ntc = 1, ntf = 1, ntb = 0,
        ntwx = 100, ntwe = 0, ntwr =100, ntpr = 100,
        cut = 99.0, igb = 5, saltcon=0.1,
 &end

In this case you want to run Sander with a command like this:

$AMBERHOME/exe/sander -O -i md.in -o md.out -c 1LE1.inpcrd -p ./1LE1_ff03.prmtop -r restrt -x mdcrd -inf mdinfo

Comparing restrt to the original inpcrd file, you'll notice that it is almost twice as long as before! This is because the velocities for each atom are now appended to the bottom of the file. Simply copy them into your inpcrd file and you're good to start the simulation.
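The copying step can be scripted. A sketch, assuming a non-periodic system (no box line at the end of restrt) and the standard ASCII restart layout of title line, atom-count line, coordinates, then velocities, six numbers per line; the two-atom file generated below is a synthetic example so the snippet is self-contained:

```shell
#!/bin/bash
# Sketch: locate and print the velocity block of an AMBER ASCII restart
# file. Assumes no box line (non-periodic system). The restart file below
# is a synthetic two-atom example standing in for a real restrt.
cat > restrt.example <<'EOF'
example system
     2  0.0010000
   1.0000000   2.0000000   3.0000000   4.0000000   5.0000000   6.0000000
   0.1000000   0.2000000   0.3000000   0.4000000   0.5000000   0.6000000
EOF
# the atom count is the first field of line 2
natoms=$(awk 'NR==2 {print $1; exit}' restrt.example)
# the coordinates occupy ceil(3*natoms/6) lines after the two header lines
nlines=$(( (3 * natoms + 5) / 6 ))
# everything after header + coordinates is the velocity block
tail -n +$(( nlines + 3 )) restrt.example | head -n $nlines
```

Appending that block to the matching inpcrd file (or simply passing restrt itself as the -c argument) gives the velocity information that ntx=5 expects.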

You can find an example for REMD with AMBER in clust:/sharedscratch/bs360/amberrex_test/

Analysing the results

AMBER provides a multitude of tools to analyse MD trajectories, all of which are well documented in the AMBER manual under ptraj. However, before you can use ptraj to analyse any of the output mdcrd.rep* files, you need to make a slight modification. AMBER differentiates between replica exchange and CTMD output by NOT putting in a blank line at the start of REX trajectory files. This means that if you try to load them into ptraj for analysis, you will in fact get complete garbage out. Thankfully, this is easily solved by the addition of a blank line using, for example, the following script:

#!/bin/bash
# Script to add a blank line at the top of each file. This is important as when doing REX
# AMBER uses a different output format (without the blank line) for trajectories. 
# Sadly this means you can't use ptraj, vmd, or anything to read them in until you add it.
for i in 1 2 3 4 5 6 7 8
do
  perl -pi -e 'print "\n" if $. == 1' mdcrd.rep$i
done

Note that if you have large files, this may take a while - but it is less boring than opening each in vi and adding a line at the top!

Clustering structures

If you also have the MMTSB Toolset installed, you can cluster the output of your REX runs quite easily. Clustering provides information on the most populated conformational groups, and makes it easy to pick out common structures from the run without manually trawling through every structure one at a time. Here is an example script that you can use to perform such a clustering separately for each replica in a run. You will need to modify it before use, both for your file names and for the window over which you are sampling, since the MMTSB algorithm can only cope with fewer than 50000 structures.

#!/bin/bash
# Script to extract PDBs from an AMBER9 REX run and perform
# a simple clustering analysis for each replica
# You need to edit the numbers and file names 
# when using more replicas or different systems.
# The clustering uses the MMTSB script kclust and as a result,
# MMTSB must be installed and working.

# NOTE: this script is designed to work in a directory one level
# below that containing the mdcrd files from your REX run e.g.
# if the REX was done in /home/csw34/REX then this script should
# reside in /home/csw34/REX/clustering for example

# WARNING! The output files from AMBER REX are NOT ok to use directly
# They lack the blank line at the top which you see in normal AMBER
# trajectories and so cannot be safely read into ptraj, vmd or well
# anything. I suggest you add a line using for example this perl:
# perl -pi -e 'print "\n" if $. == 1' mdcrd.rep1 

# Echo the awk script used later to evaluate the centroids. Credit
# goes to Ross Walker for this one.
echo 'BEGIN{b0=2;}' > analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '{centind=index($1,"#Centroid");' >> analyse_centroid.awk
echo '   c=$2;' >> analyse_centroid.awk
echo '   getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
echo '   FIL0 = sprintf("centroid%2.2d.member.dat",c)' >> analyse_centroid.awk
echo '   while(centind != 1){' >> analyse_centroid.awk
echo '     print $1,$3 > FIL0 ;' >> analyse_centroid.awk
echo '     getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
echo '   }' >> analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '   numcent=0;    print $2,$1,NR-b0; ' >> analyse_centroid.awk
echo '   c=$2;' >> analyse_centroid.awk
echo '   getline; endrec = index($0,"#End");' >> analyse_centroid.awk
echo '   while( endrec != 1 ){' >> analyse_centroid.awk
echo '     FIL = sprintf("centroid%2.2d.pdb",c);' >> analyse_centroid.awk
echo '     print > FIL;' >> analyse_centroid.awk
echo '     getline; endrec = index($0,"#End");' >> analyse_centroid.awk
echo '   }' >> analyse_centroid.awk
echo '   b0=NR+2;' >> analyse_centroid.awk
echo ' } ' >> analyse_centroid.awk

# Loop over each replica 
# **EDIT TO CHANGE REPLICA NUMBERS**
for i in 1 2 3 4 5 6 7 8
do
  mkdir rep$i
  mkdir rep$i/pdbs
# Echo the ptraj input file for the replica. Notice that I'm
# reading in the mdcrd file from frame 24540. This is to allow
# the replica to equilibrate before we start looking at structures
# and also - kclust can only deal with up to 49999 structures :(
# **EDIT TO CHANGE FILE NAMES OR STRUCTURE SAMPLING WINDOW**
  echo 'trajin ../mdcrd.rep'$i' 24540 74530' > extractrep$i.trajin
  echo 'trajout rep'$i'/pdbs/rep'$i'.pdb pdb'>> extractrep$i.trajin
  echo 'reference ../1LE1.inpcrd' >> extractrep$i.trajin
  echo 'rms reference mass :1-12@CA' >> extractrep$i.trajin
# Run ptraj to produce the output PDBs
  echo 'Extracting PDBs for replica '$i
# **EDIT TO CHANGE TOPOLOGY FILE NAME OR LOCATION**
  ptraj ../1LE1_ff99SB.prmtop < extractrep$i.trajin
# Produce a file containing all the PDB names for clustering
  ls -1 rep$i/pdbs > rep$i/clustfils
# Perform the clustering analysis using the MMTSB toolset kclust script
# **CHECK THE OPTIONS OF KCLUST SO YOU KNOW WHAT IT IS DOING**
  cd rep$i/pdbs
  echo 'Clustering structures using kclust...'
  kclust -mode rmsd -centroid -cdist -heavy -lsqfit -radius 6 -maxerr 1 -iterate ../clustfils > ../centroid_rep$i
# Extract info on each cluster centroid using awk script
  cd ..
  awk -f ../analyse_centroid.awk centroid_rep$i | tee centroid_rep$i.stats
# How many clusters are there?
  max=`wc centroid_rep$i.stats | awk '{print $1}'`
# Now we want to extract the structure closest to the centroid of each
  echo 'Extracting PDB closest to each centroid'
  for ((n=1; n <= $max ; n++))
  do
# Handle the zero-padded file names kclust uses for centroids below 10
    if [[ "$n" -lt "10" ]]
      then closest=`sort -n -k2 centroid0$n.member.dat | head -1 | awk '{print $1}'`
    else
      closest=`sort -n -k2 centroid$n.member.dat | head -1 | awk '{print $1}'`
    fi
# As we didn't start at the first structure, this must be incremented
# by the appropriate offset. In this case 24539 (24540-1)
# **EDIT IN ACCORDANCE WITH THE START OF THE SAMPLING WINDOW**
    closest=$(($closest + 24539))
# Now, extract the PDB
    cp pdbs/rep$i.pdb.$closest ./bestmember_centroid$n.pdb
  done
# Remove the thousands of PDBs that are taking up disk space
  rm -r pdbs
# Return to the correct directory to process the next replica
  cd ..
  echo 'Done with replica '$i
  echo ''
done
#Cleaning up
rm *.trajin *.awk