Difference between revisions of "REMD with AMBER"

From CUC3
Jump to navigation Jump to search
import>Csw34
import>Csw34
 
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
===Setting up AMBER REX===
  +
 
To run REMD with AMBER you need '''sander.MPI'''. You can either copy it
 
To run REMD with AMBER you need '''sander.MPI'''. You can either copy it
 
(clust:~bs360/bin/sander.MPI) or create your own. To do so you have to copy the directory
 
(clust:~bs360/bin/sander.MPI) or create your own. To do so you have to copy the directory
Line 6: Line 8:
 
the path names accordingly).
 
the path names accordingly).
 
After producing the config.h file you can compile the MPI version by '''make parallel'''.
 
After producing the config.h file you can compile the MPI version by '''make parallel'''.
You should have loaded the pgi64 and mpi/lam/64/pgi/7.1.1 modules.
+
You should have loaded the pgi64 and mpi/lam/64/pgi/7.1.1 modules. This version has some modifications made to the Sander source code in the files mdfill.f and multisander.f. They are tagged with 'bs360' to make them easy to find. It is not known if they are all necessary, but it works fine with them!
   
 
To run REMD with AMBER you need following '''input files''':
 
To run REMD with AMBER you need following '''input files''':
Line 44: Line 46:
 
'''ntwr''' determines after now many MD steps the restart files mdinfo.rep* is updated.
 
'''ntwr''' determines after now many MD steps the restart files mdinfo.rep* is updated.
 
The '''temperatures''' for each replica you have to determine yourself, i.e, they are not automatically
 
The '''temperatures''' for each replica you have to determine yourself, i.e, they are not automatically
exponentially distributed between T_min and T_max as in the MMTSB protocol. Note that with '''irest=1''', each MD cycle will restart the trajectory from the relevant restart file. This is why you can have '''tempi=0''', as AMBER reads the temperature from that file and ignores the MD input.
+
exponentially distributed between <math>T_{min}</math> and <math>T_{max}</math> as in the MMTSB protocol. Note that with '''irest=1''', each MD cycle will restart the trajectory from the relevant restart file. This is why you can have '''tempi=0''', as AMBER reads the temperature from that file and ignores the MD input.
  +
  +
===Determining the replica temperatures===
  +
As we're not using the MMTSB toolset, you will need to calculate the replica temperatures yourself. This is done as follows with <math>n_{tot}=</math>TOTAL number of replicas, <math>n_{rep}=</math>(current replica number - 1), <math>T_{min}=</math>the lowest temperature used and <math>T_{max}=</math>the highest temperature used.
  +
  +
<math>A</math> is an constant defined as:
  +
<math>A=\frac{ln(T_{max}/T_{min})}{n_{tot}-1}</math>
  +
  +
The temperature of replica <math>n+1</math> is then defined as:
  +
  +
<math>T_{n+1}=T_{min}\exp{(An)}</math>
   
 
===The input.crd files===
 
===The input.crd files===
Line 65: Line 77:
 
</pre>
 
</pre>
 
Comparing restrt to the original inpcrd file, you'll notice that it is almost twice as long as before! This is because the velocities for each atom are now appended to the bottom of the file. Simply copy them into your inpcrd file and you're good to start the simulation.
 
Comparing restrt to the original inpcrd file, you'll notice that it is almost twice as long as before! This is because the velocities for each atom are now appended to the bottom of the file. Simply copy them into your inpcrd file and you're good to start the simulation.
 
Comparing the restart file produced with
 
   
 
You can find an example for REMD with AMBER in
 
You can find an example for REMD with AMBER in
 
clust:/sharedscratch/bs360/amberrex_test/
 
clust:/sharedscratch/bs360/amberrex_test/
  +
  +
===Analysing the results===
  +
  +
[[AMBER]] provides a multitude of tools to analyse MD trajectories, all of which are well documented in the AMBER manual under '''ptraj'''. However, before you can use ptraj to analyse any of the output mdcrd.rep* files, you need to make a slight modification. [[AMBER]] differentiates between replica exchange and CTMD output by NOT putting in a blank line at the start of REX trajectory files. This means that if you try to load them into '''ptraj''' for analysis, you will in fact get complete garbage out. Thankfully, this is easily solved by the addition of a blank line using, for example, the following script:
  +
  +
<pre>
  +
#!/bin/bash
  +
# Script to add a blank line at the top of each file. This is important as when doing REX
  +
# AMBER uses a different output format (without the blank line) for trajectories.
  +
# Sadly this means you can't use ptraj, vmd, or anything to read them in until you add it.
  +
for i in 1 2 3 4 5 6 7 8
  +
do
  +
perl -pi -e 'print "\n" if $. == 1' mdcrd.rep$i
  +
done
  +
</pre>
  +
  +
Note that if you have large files, this may take a while - but it is less boring than opening each in vi and adding a line at the top! Once you have done that, you can use '''ptraj''' to your hearts content. For example, here is a simple script that takes the trajectory files generated during the REX run, and produces input for a WHAM analysis - where the order parameters are the C-alpha RMSD from the crystal structure (1LE1.inpcrd) and the radius of gyration. Note that the trajectory is read in from exchange 11+1(offset)=12 to allow each replica to have equilibrated.
  +
  +
<pre>
  +
#!/bin/bash
  +
# Script to extract info from an AMBER9 REX run.
  +
# You need to edit the numbers and file names
  +
# when using more replicas or different systems.
  +
  +
# Remove any files from previous, abortive attempts
  +
rm ener.[1-8] rmsd_full.[1-8] rgyr_full.[1-8] rgyr.[1-8] rmsd.[1-8]
  +
rm rmsd.data rgyr.data energies.data
  +
# Loop over each replica
  +
for i in 1 2 3 4 5 6 7 8
  +
do
  +
# Echo the ptraj input file for the replica. Notice that I'm
  +
# reading in the mdcrd file from frame 11. This is to allow
  +
# the replica to equilibrate before we start to take readings.
  +
echo 'trajin mdcrd.rep'$i 11 99910 > rep$i.trajin
  +
echo 'reference 1LE1.inpcrd' >> rep$i.trajin
  +
echo 'radgyr out rgyr_full.'$i' time 1.5 @C,N,CA,O' >> rep$i.trajin
  +
echo 'rms reference out rmsd_full.'$i' time 1.5 @CA' >> rep$i.trajin
  +
# Run ptraj to produce the output
  +
ptraj 1LE1_ff96.prmtop < rep$i.trajin
  +
# Remove the time column and and fluff at the top of the rgyr
  +
awk '{print $2}' rmsd_full.$i > rmsd.$i
  +
awk '{print $2}' rgyr_full.$i | sed -n '2,$p' > rgyr.$i
  +
# Extract the energy from the mdinfo files
  +
grep "EPtot" mdinfo.rep$i |awk '{print $9}' | sed -n '12,$p' >>ener.$i
  +
done
  +
# Paste the data into columns to be read by WHAM
  +
paste ener.1 ener.2 ener.3 ener.4 ener.5 ener.6 ener.7 ener.8 > energies.data
  +
paste rgyr.1 rgyr.2 rgyr.3 rgyr.4 rgyr.5 rgyr.6 rgyr.7 rgyr.8 > rgyr.data
  +
paste rmsd.1 rmsd.2 rmsd.3 rmsd.4 rmsd.5 rmsd.6 rmsd.7 rmsd.8 > rmsd.data
  +
# Cleaning up the intermediate files
  +
rm rmsd_full.[1-8] rgyr_full.[1-8] rep[1-8].trajin
  +
rm ener.[1-8] rgyr.[1-8] rmsd.[1-8]
  +
</pre>
  +
  +
===Clustering structures===
  +
  +
If you also have the MMTSB Toolset installed - you can cluster the output of your REX runs quite easily. Clustering provides information on the most populated conformational groups, and makes it easy to pick out common structures from the run without manually trawling through every structure one at a time. Here is an example script that you can use to perform such a clustering separately for each replica in a run. You will need to modify it before you use it due to differing file names and windows over which you are sampling as the MMTSB algorithm can only cope with <50000 structures.
  +
  +
<pre>
  +
#!/bin/bash
  +
# Script to extract PDBs from an AMBER9 REX run and perform
  +
# a simple clustering analysis for each replica
  +
# You need to edit the numbers and file names
  +
# when using more replicas or different systems.
  +
# The clustering uses the MMTSB script kclust and as a result,
  +
# MMTSB must be installed and working.
  +
  +
# NOTE: this script is designed to work in a directory one level
  +
# below that containing the mdcrd files from your REX run e.g.
  +
# if the REX was done in /home/csw34/REX then this script should
  +
# reside in /home/csw34/REX/clustering for example
  +
  +
# WARNING! The output files from AMBER REX are NOT ok to use directly
  +
# They lack the blank line at the top which you see in normal AMBER
  +
# trajectories and so cannot be safely read into ptraj, vmd or well
  +
# anything. I suggest you add a line using for example this perl:
  +
# perl -pi -e 'print "\n" if $. == 1' mdcrd.rep1
  +
  +
# Echo the awk script used later to evaluate the centroids. Credit
  +
# goes to Ross Walker for this one.
  +
echo 'BEGIN{b0=2;}' > analyse_centroid.awk
  +
echo '' >> analyse_centroid.awk
  +
echo '{centind=index($1,"#Centroid");' >> analyse_centroid.awk
  +
echo ' c=$2;' >> analyse_centroid.awk
  +
echo ' getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
  +
echo ' FIL0 = sprintf("centroid%2.2d.member.dat",c)' >> analyse_centroid.awk
  +
echo ' while(centind != 1){' >> analyse_centroid.awk
  +
echo ' print $1,$3 > FIL0 ;' >> analyse_centroid.awk
  +
echo ' getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
  +
echo ' }' >> analyse_centroid.awk
  +
echo '' >> analyse_centroid.awk
  +
echo '' >> analyse_centroid.awk
  +
echo ' numcent=0; print $2,$1,NR-b0; ' >> analyse_centroid.awk
  +
echo ' c=$2;' >> analyse_centroid.awk
  +
echo ' getline; endrec = index("End",$0);' >> analyse_centroid.awk
  +
echo ' while( endrec != 1 ){' >> analyse_centroid.awk
  +
echo ' FIL = sprintf("centroid%2.2d.pdb",c);' >> analyse_centroid.awk
  +
echo ' print > FIL;' >> analyse_centroid.awk
  +
echo ' getline; endrec = index($0,"#End");' >> analyse_centroid.awk
  +
echo ' }' >> analyse_centroid.awk
  +
echo ' b0=NR+2;' >> analyse_centroid.awk
  +
echo ' } ' >> analyse_centroid.awk
  +
  +
# Loop over each replica
  +
# **EDIT TO CHANGE REPLICA NUMBERS**
  +
for i in 1 2 3 4 5 6 7 8
  +
do
  +
mkdir rep$i
  +
mkdir rep$i/pdbs
  +
# Echo the ptraj input file for the replica. Notice that I'm
  +
# reading in the mdcrd file from frame 24540. This is to allow
  +
# the replica to equilibrate before we start looking at structures
  +
# and also - kclust can only deal with up to 49999 structures :(
  +
# **EDIT TO CHANGE FILE NAMES OR STRUCTURE SAMPLING WINDOW**
  +
echo 'trajin ../mdcrd.rep'$i' 24540 74530' > extractrep$i.trajin
  +
echo 'trajout rep'$i'/pdbs/rep'$i'.pdb pdb'>> extractrep$i.trajin
  +
echo 'reference ../1LE1.inpcrd' >> extractrep$i.trajin
  +
echo 'rms reference mass :1-12@CA' >> extractrep$i.trajin
  +
# Run ptraj to produce the output PDBs
  +
echo 'Extracting PDBs for replica '$i
  +
# **EDIT TO CHANGE TOPOLOGY FILE NAME OR LOCATION**
  +
ptraj ../1LE1_ff99SB.prmtop < extractrep$i.trajin
  +
# Produce a file containing all the PDB names for clustering
  +
ls -1 rep$i/pdbs > rep$i/clustfils
  +
# Perform the clustering analysis using the MMTSB toolset kclust script
  +
# **CHECK THE OPTIONS OF KCLUST SO YOU KNOW WHAT IT IS DOING**
  +
cd rep$i/pdbs
  +
echo 'Clustering structures using kclust...'
  +
kclust -mode rmsd -centroid -cdist -heavy -lsqfit -radius 6 -maxerr 1 -iterate ../clustfils > ../centroid_rep$i
  +
# Extract info on each cluster centroid using awk script
  +
cd ..
  +
awk -f ../analyse_centroid.awk centroid_rep$i | tee centroid_rep$i.stats
  +
# How many clusters are there?
  +
max=`wc centroid_rep$i.stats | awk '{print $1}'`
  +
# Now we want to extract the structure closest to the centroid of each
  +
echo 'Extracting PDB closest to each centroid'
  +
for ((n=1; n <= $max ; n++))
  +
do
  +
# Loop to get around a file naming convention thing
  +
if [[ "$n" -lt "10" ]]
  +
then closest=`sort -n -k2 centroid0$n.member.dat | head -1 | awk '{print $1}'`
  +
else
  +
closest=`sort -n -k2 centroid$n.member.dat | head -1 | awk '{print $1}'`
  +
fi
  +
# As we didn't start at the first structure, this must be incremented
  +
# by the appropriate offset. In this case 24539 (24540-1)
  +
# **EDIT IN ACCORDANCE WITH THE START OF THE SAMPLING WINDOW**
  +
closest=$(($closest + 24539))
  +
# Now, extract the PDB
  +
cp pdbs/rep$i.pdb.$closest ./bestmember_centroid$n.pdb
  +
done
  +
# Remove the thousands of PDBs that are taking up disk space
  +
rm -r pdbs
  +
# Return to the correct directory to process the next replica
  +
cd ..
  +
echo 'Done with replica '$i
  +
echo ''
  +
done
  +
#Cleaning up
  +
rm *.trajin *.awk
  +
</pre>

Latest revision as of 12:48, 24 October 2008

Setting up AMBER REX

To run REMD with AMBER you need sander.MPI. You can either copy it (clust:~bs360/bin/sander.MPI) or create your own. To do so you have to copy the directory clust:~bs360/MPIAMBER/ and follow the instructions on http://wwmm.ch.cam.ac.uk/wikis/ucc/index.php/Notes_on_dove#64-bit_AMBER9 to create the config.h (or take the existing one in the MPIAMBER/src/ directory and change the path names accordingly). After producing the config.h file you can compile the MPI version by make parallel. You should have loaded the pgi64 and mpi/lam/64/pgi/7.1.1 modules. This version has some modifications made to the Sander source code in the files mdfill.f and multisander.f. They are tagged with 'bs360' to make them easy to find. It is not known if they are all necessary, but it works fine with them!

To run REMD with AMBER you need following input files:

-> numreplicas

  (number of repilcas)

-> groupfile

   -A -i mdin.rep1 -o mdout.rep1 -c inpcrd.rep1 -p ./ala2.prmtop -r restrt.rep1 -x mdcrd.rep1 -inf mdinfo.rep1
   -A -i mdin.rep2 -o mdout.rep2 -c inpcrd.rep2 -p ./ala2.prmtop -r restrt.rep2 -x mdcrd.rep2 -inf mdinfo.rep2
   ...

The names of 'numreplicas' and 'groupfile' can not be changed! All other files you can name as you like.

mdin.rep*, inpcrd.rep* and ala2.prmtop define the other input files; all other files are for output.

The inpcrd.rep* and ala2.prmtop are the usual coordinate and topology files, and mdin.rep* define the input for the MD for each replica:

REMD run 1

&cntrl
       imin = 0, nstlim = 500, dt = 0.002,
       tempi = 0, temp0 = 100.0,
       ntt = 3, tol = 0.000001, gamma_ln = 5.0,
       ntc = 2, ntf = 1, ntb = 0, ntx = 5,
       ntwx = 500, ntwe = 0, ntwr =500, ntpr = 500,
       cut = 99.0, igb = 0, saltcon=0.0,
       nscm = 500, irest=1,
       ntave = 0, numexchg=100000
&end

An exchange is attempted after each cycle. The cycle lenth is determined by (ntslim)x(dt), i.e. (MD steps)x(step) length. ntwx determines after how many MD steps the coordinats are written to mdcrd.rep*. ntpr determines after how many MD steps the energy is written to mdinfo.rep*. ntwr determines after now many MD steps the restart files mdinfo.rep* is updated. The temperatures for each replica you have to determine yourself, i.e, they are not automatically exponentially distributed between <math>T_{min}</math> and <math>T_{max}</math> as in the MMTSB protocol. Note that with irest=1, each MD cycle will restart the trajectory from the relevant restart file. This is why you can have tempi=0, as AMBER reads the temperature from that file and ignores the MD input.

Determining the replica temperatures

As we're not using the MMTSB toolset, you will need to calculate the replica temperatures yourself. This is done as follows with <math>n_{tot}=</math>TOTAL number of replicas, <math>n_{rep}=</math>(current replica number - 1), <math>T_{min}=</math>the lowest temperature used and <math>T_{max}=</math>the highest temperature used.

<math>A</math> is an constant defined as: <math>A=\frac{ln(T_{max}/T_{min})}{n_{tot}-1}</math>

The temperature of replica <math>n+1</math> is then defined as:

<math>T_{n+1}=T_{min}\exp{(An)}</math>

The input.crd files

As you have irest=1 and ntx=5, you need to have velocity information in your inpcrd file to start the REMD simulation. This is done by first running a 1 step CTMD run with the original input, and then copying the velocities from the restart file created. Input for this is below:

 Velocity randomisation
 &cntrl
        imin = 0, nstlim = 1, dt = 0.0015,
        tempi = 0, temp0 = 250.0,
        ntt = 3, gamma_ln = 5.0, ntx = 1,
        ntc = 1, ntf = 1, ntb = 0,
        ntwx = 100, ntwe = 0, ntwr =100, ntpr = 100,
        cut = 99.0, igb = 5, saltcon=0.1,
 &end

In this case you want to run Sander with a command like this:

$AMBERHOME/exe/sander -O -i md.in -o md.out -c 1LE1.inpcrd -p ./1LE1_ff03.prmtop -r restrt -x mdcrd -inf mdinfo

Comparing restrt to the original inpcrd file, you'll notice that it is almost twice as long as before! This is because the velocities for each atom are now appended to the bottom of the file. Simply copy them into your inpcrd file and you're good to start the simulation.

You can find an example for REMD with AMBER in clust:/sharedscratch/bs360/amberrex_test/

Analysing the results

AMBER provides a multitude of tools to analyse MD trajectories, all of which are well documented in the AMBER manual under ptraj. However, before you can use ptraj to analyse any of the output mdcrd.rep* files, you need to make a slight modification. AMBER differentiates between replica exchange and CTMD output by NOT putting in a blank line at the start of REX trajectory files. This means that if you try to load them into ptraj for analysis, you will in fact get complete garbage out. Thankfully, this is easily solved by the addition of a blank line using, for example, the following script:

#!/bin/bash
# Script to add a blank line at the top of each file. This is important as when doing REX
# AMBER uses a different output format (without the blank line) for trajectories. 
# Sadly this means you can't use ptraj, vmd, or anything to read them in until you add it.
for i in 1 2 3 4 5 6 7 8
do
  perl -pi -e 'print "\n" if $. == 1' mdcrd.rep$i
done

Note that if you have large files, this may take a while - but it is less boring than opening each in vi and adding a line at the top! Once you have done that, you can use ptraj to your hearts content. For example, here is a simple script that takes the trajectory files generated during the REX run, and produces input for a WHAM analysis - where the order parameters are the C-alpha RMSD from the crystal structure (1LE1.inpcrd) and the radius of gyration. Note that the trajectory is read in from exchange 11+1(offset)=12 to allow each replica to have equilibrated.

#!/bin/bash
# Script to extract info from an AMBER9 REX run. 
# You need to edit the numbers and file names 
# when using more replicas or different systems.

# Remove any files from previous, abortive attempts
rm ener.[1-8] rmsd_full.[1-8] rgyr_full.[1-8] rgyr.[1-8] rmsd.[1-8]
rm rmsd.data rgyr.data energies.data
# Loop over each replica
for i in 1 2 3 4 5 6 7 8
do
# Echo the ptraj input file for the replica. Notice that I'm
# reading in the mdcrd file from frame 11. This is to allow
# the replica to equilibrate before we start to take readings.
  echo 'trajin mdcrd.rep'$i 11 99910 > rep$i.trajin
  echo 'reference 1LE1.inpcrd' >> rep$i.trajin
  echo 'radgyr out rgyr_full.'$i' time 1.5 @C,N,CA,O' >> rep$i.trajin
  echo 'rms reference out rmsd_full.'$i' time 1.5 @CA' >> rep$i.trajin
# Run ptraj to produce the output
  ptraj 1LE1_ff96.prmtop < rep$i.trajin
# Remove the time column and and fluff at the top of the rgyr
  awk '{print $2}' rmsd_full.$i > rmsd.$i
  awk '{print $2}' rgyr_full.$i  | sed -n '2,$p' > rgyr.$i
# Extract the energy from the mdinfo files
  grep "EPtot" mdinfo.rep$i |awk '{print $9}' | sed -n '12,$p' >>ener.$i
done
# Paste the data into columns to be read by WHAM
paste ener.1 ener.2 ener.3 ener.4 ener.5 ener.6 ener.7 ener.8  > energies.data
paste rgyr.1 rgyr.2 rgyr.3 rgyr.4 rgyr.5 rgyr.6 rgyr.7 rgyr.8 > rgyr.data
paste rmsd.1 rmsd.2 rmsd.3 rmsd.4 rmsd.5 rmsd.6 rmsd.7 rmsd.8 > rmsd.data
# Cleaning up the intermediate files
rm rmsd_full.[1-8] rgyr_full.[1-8] rep[1-8].trajin
rm ener.[1-8] rgyr.[1-8] rmsd.[1-8]

Clustering structures

If you also have the MMTSB Toolset installed - you can cluster the output of your REX runs quite easily. Clustering provides information on the most populated conformational groups, and makes it easy to pick out common structures from the run without manually trawling through every structure one at a time. Here is an example script that you can use to perform such a clustering separately for each replica in a run. You will need to modify it before you use it due to differing file names and windows over which you are sampling as the MMTSB algorithm can only cope with <50000 structures.

#!/bin/bash
# Script to extract PDBs from an AMBER9 REX run and perform
# a simple clustering analysis for each replica
# You need to edit the numbers and file names 
# when using more replicas or different systems.
# The clustering uses the MMTSB script kclust and as a result,
# MMTSB must be installed and working.

# NOTE: this script is designed to work in a directory one level
# below that containing the mdcrd files from your REX run e.g.
# if the REX was done in /home/csw34/REX then this script should
# reside in /home/csw34/REX/clustering for example

# WARNING! The output files from AMBER REX are NOT ok to use directly
# They lack the blank line at the top which you see in normal AMBER
# trajectories and so cannot be safely read into ptraj, vmd or well
# anything. I suggest you add a line using for example this perl:
# perl -pi -e 'print "\n" if $. == 1' mdcrd.rep1 

# Echo the awk script used later to evaluate the centroids. Credit
# goes to Ross Walker for this one.
echo 'BEGIN{b0=2;}' > analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '{centind=index($1,"#Centroid");' >> analyse_centroid.awk
echo '   c=$2;' >> analyse_centroid.awk
echo '   getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
echo '   FIL0 = sprintf("centroid%2.2d.member.dat",c)' >> analyse_centroid.awk
echo '   while(centind != 1){' >> analyse_centroid.awk
echo '     print $1,$3 > FIL0 ;' >> analyse_centroid.awk
echo '     getline;centind=index($0,"#Centroid");' >> analyse_centroid.awk
echo '   }' >> analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '' >> analyse_centroid.awk
echo '   numcent=0;    print $2,$1,NR-b0; ' >> analyse_centroid.awk
echo '   c=$2;' >> analyse_centroid.awk
echo '   getline; endrec = index("End",$0);' >> analyse_centroid.awk
echo '   while( endrec != 1 ){' >> analyse_centroid.awk
echo '     FIL = sprintf("centroid%2.2d.pdb",c);' >> analyse_centroid.awk
echo '     print > FIL;' >> analyse_centroid.awk
echo '     getline; endrec = index($0,"#End");' >> analyse_centroid.awk
echo '   }' >> analyse_centroid.awk
echo '   b0=NR+2;' >> analyse_centroid.awk
echo ' } ' >> analyse_centroid.awk

# Loop over each replica 
# **EDIT TO CHANGE REPLICA NUMBERS**
for i in 1 2 3 4 5 6 7 8
do
  mkdir rep$i
  mkdir rep$i/pdbs
# Echo the ptraj input file for the replica. Notice that I'm
# reading in the mdcrd file from frame 24540. This is to allow
# the replica to equilibrate before we start looking at structures
# and also - kclust can only deal with up to 49999 structures :(
# **EDIT TO CHANGE FILE NAMES OR STRUCTURE SAMPLING WINDOW**
  echo 'trajin ../mdcrd.rep'$i' 24540 74530' > extractrep$i.trajin
  echo 'trajout rep'$i'/pdbs/rep'$i'.pdb pdb'>> extractrep$i.trajin
  echo 'reference ../1LE1.inpcrd' >> extractrep$i.trajin
  echo 'rms reference mass :1-12@CA' >> extractrep$i.trajin
# Run ptraj to produce the output PDBs
  echo 'Extracting PDBs for replica '$i
# **EDIT TO CHANGE TOPOLOGY FILE NAME OR LOCATION**
  ptraj ../1LE1_ff99SB.prmtop < extractrep$i.trajin
# Produce a file containing all the PDB names for clustering
  ls -1 rep$i/pdbs > rep$i/clustfils
# Perform the clustering analysis using the MMTSB toolset kclust script
# **CHECK THE OPTIONS OF KCLUST SO YOU KNOW WHAT IT IS DOING**
  cd rep$i/pdbs
  echo 'Clustering structures using kclust...'
  kclust -mode rmsd -centroid -cdist -heavy -lsqfit -radius 6 -maxerr 1 -iterate ../clustfils > ../centroid_rep$i
# Extract info on each cluster centroid using awk script
  cd ..
  awk -f ../analyse_centroid.awk centroid_rep$i | tee centroid_rep$i.stats
# How many clusters are there?
  max=`wc centroid_rep$i.stats | awk '{print $1}'`
# Now we want to extract the structure closest to the centroid of each
  echo 'Extracting PDB closest to each centroid'
  for ((n=1; n <= $max ; n++))
  do
# Loop to get around a file naming convention thing
    if [[ "$n" -lt "10" ]]
      then closest=`sort -n -k2 centroid0$n.member.dat | head -1 | awk '{print $1}'`
    else
      closest=`sort -n -k2 centroid$n.member.dat | head -1 | awk '{print $1}'`
    fi
# As we didn't start at the first structure, this must be incremented
# by the appropriate offset. In this case 24539 (24540-1)
# **EDIT IN ACCORDANCE WITH THE START OF THE SAMPLING WINDOW**
    closest=$(($closest + 24539))
# Now, extract the PDB
    cp pdbs/rep$i.pdb.$closest ./bestmember_centroid$n.pdb
  done
# Remove the thousands of PDBs that are taking up disk space
  rm -r pdbs
# Return to the correct directory to process the next replica
  cd ..
  echo 'Done with replica '$i
  echo ''
done
#Cleaning up
rm *.trajin *.awk