Finding an initial path with OPTIM and starting up PATHSAMPLE (CHARMM)

From Docswiki
Jump to navigation Jump to search

Now that we have two interesting end point structures from running GMIN, initialmin (the extended starting structure after minimisation) and dbase.1 (the global minimum) we can make an initial discrete path (min1--TS1--min2--TS2--min3...minN) between them using OPTIM.

Setting up OPTIM

As before, make a new directory to work in, and copy the files that we're going to use:

mkdir COPTIM

We want to generate a path starting from the extended structure (initialmin), so we copy it to input.crd:

cp CGMIN/initialmin COPTIM/input.crd

We are trying to connect the extended structure to the lower energy global minimum (dbase.1), so we copy it to finish:

cp CGMIN/dbase.1 COPTIM/finish

When searching for minima and transition states between out endpoints, we are going to interpolate between structures. It is important that we take permutational isomerisation into account to prevent duplicate minima and transition states being identified. To do this, we use the perm.allow file we made in the first part of this tutorial:

cp perm.allow COPTIM

Finally, we need to create a new file, odata, the OPTIM input file. Open a new file using vi:

vi odata

and paste this in:

UPDATES 6000
NEWCONNECT 15 3 2.0 20.0 30 0.5
NEBK 10.0
DIJKSTRA EXP
DUMPALLPATHS
REOPTIMISEENDPOINTS
EDIFFTOL  1.0D-4
MAXERISE 1.0D-4 1.0D0
GEOMDIFFTOL  0.05D0
BFGSTS 500 10 100 0.01 100
NOIT
BFGSMIN 1.0D-6
PERMDIST
MAXSTEP  0.1
TRAD     0.2
MAXMAX   0.3
BFGSCONV 1.0D-6
PUSHOFF 0.1
STEPS 800
BFGSSTEPS 2000
MAXBFGS 0.1
CHARMMTYPE toph19_eef1_perm.inp param19_eef1_perm.inp
CHARMM
! Everything below the CHARMM line above is part of a CHARMM input file
set pardir "/home/csw34/svn/CHARMM31/toppar"

! BOMLev sets the level of warnings what do not cause the program to exit. -5 = very lax
BOMLev -5

! PRNLEV sets the ammount of output you get from CHARMM. 0 = small
PRNLEV 0

! Read standard topology and parameter files. These paths will need setting!
OPEN READ CARD UNIT 1 NAME @pardir/@top
READ RTF CARD UNIT 1
CLOSE UNIT 1

OPEN READ CARD UNIT 2 NAME @pardir/@par
READ PARAMETER CARD UNIT 2
CLOSE UNIT 2

! Generate the PSF for met-enk
READ SEQUence CARD
*
5
TYR GLY GLY PHE MET
GENErate FIRS NTER LAST CTER SETUp

OPEN UNIT 20 NAME input.crd READ CARD
READ COOR UNIT 20 CARD FREE
CLOSE UNIT 20

! Build the internal coordinate tables
IC FILL PRESERVE
IC PARAMETERS
IC PURGE
IC BUILD

! Set up the EEF1 solvent model
eef1 setup temp 298.15 unit 93 name "/home/csw34/svn/CHARMM31/toppar/solvpar.inp"
update ctonnb 7. ctofnb 9. cutnb 15. group rdie

As in GMIN, this file is made of two parts. Everything above the CHARMM line is OPTIM input. The full list of keywords can be found in the OPTIM documentation here, and it is again highly recommended that you take the time to find out what each of the above keywords is doing! The most important of these options specify the following:

  • BFGSMIN/BFGSCONV: the RMS force must be less than 10^-6 for a structure to be considered a stationary point.
  • BFGSSTEPS: up to 2000 BFGS steps will be done to minimise the structure, with a maximum step size of 0.1 coordinate units.
  • REOPTIMISEENDPOINTS: If the given initial structures are not already minima, according to the above RMS force criterion, then they will be locally minimized before being connected.
  • DUMPALLPATHS: all min--TS--min triples found during the connection run will be written to the path.info file. Important so that we are not throwing away possibly useful information.
  • DIJKSTRA EXP: the Dijkstra shortest-path algorithm with an exponential edge weight function will be used to select pairs of minima to connect in the iterative connection-making algorithm based on the doubly-nudged elastic band (DNEB) method.
  • NEWCONNECT: we're doing a connection making run. In order, the options mean 15 connection-making cycles/iterations (search for CONNECT CYCLE in the output file...); maximum 3 attempts to connect the same pair of minima (incrementing other DNEB parameters each time) before giving up on that pair of minima; a DNEB image density of 2.0 per unit distance along the vector separating the end points; an iteration density in the DNEB algorithm of 20.0 per image; maximum of 30 images in the DNEB; increment the image density by 0.5 when retrying the same pair of minima.

N.B. NOIT means we're doing an explicit diagonalization of the analytical Hessian to find the eigenvectors/values. This may not be appropriate for your system! Other possibilities include removing NOIT to use the iterative approach, or replacing NOIT with NOHESS (possibly in combination with ENDNUMHESS) if an analytical Hessian doesn't exist or is very large and time-consuming to work with.

As for GMIN, everything below the CHARMM line is CHARMM input, and is actually identical to in the GMIN data file. If your end-point coordinates have come from a previous GMIN or OPTIM run, then having the FREE on the READ COOR UNIT 20 CARD FREE coordinate-reading line is crucial; make sure this specification of free-format input is present here!

Compiling COPTIM

You should have already compiled COPTIM, and if you have problems doing this - please ask someone for help! In brief, assuming your OPTIM source is in ~/svn/OPTIM/source, and that you have a Portland (pgf90) compiler available, you should be able to compile COPTIM like this:

cd ~/svn/OPTIM/source
./build.csh cleanall
./build.csh coptim pgi

This might take some time so go grab some water and gaze out of a window :) Hopefully when it's done, you should see something like this:

build.csh> OPTIM executable = /home/csw34/svn/OPTIM/source/../bin/pgi/COPTIM
build.csh> Success. See make.out for detailed compiler output

If you have any error messages, ask for help! If you are working in CUC3 in Cambridge, you might want to copy the COPTIM executable you just compiled into your ~/bin directory, so you can use it without have to specify the whole svn path each time!

cp ~/svn/OPTIM/bin/pgi/COPTIM ~/bin

Running COPTIM

It's time to find a path connecting our two endpoints! Make sure you're in the COPTIM directory we created earlier, and run:

COPTIM > optim.out &

This should only take a minute or two to run. If you want to follow the progress of the run, you can tail the output file like this:

tail -f optim.out

If OPTIM has successfully found a connected (no gaps!) path between input.crd (initialmin) and finish (dbase.1), you should see this at the bottom of optim.out:

 Connected path found
  ts        E+         Ets - E+          Ets       Ets - E-          E-          S       D      gamma   ~N
  12     -143.3738462  2.4342        -140.9396544  2.4819        -143.4215907   4.722   4.146  16.703   2.874
   2     -143.4215907 0.68847        -142.7331237  1.3542        -144.0873508  17.429  15.830   2.418  19.850
  15     -144.0873508 0.24985E-01    -144.0623662  7.1963        -151.2586207  19.564  18.091   2.028  23.670
   9     -151.2586207  3.1371        -148.1215578  3.0673        -151.1888829   6.345   5.664  16.986   2.826
  10     -151.1888829  6.1103        -145.0786246  4.6974        -149.7759874  11.836  10.646   5.934   8.089
   7     -149.7759874  5.4075        -144.3684448  5.3809        -149.7493556  18.703   3.769  40.199   1.194
  14     -149.7493556  2.8122        -146.9371965  1.2215        -148.1587388   5.332   4.666   3.128  15.343
   8     -148.1587388  4.1491        -144.0096582  5.4996        -149.5092547  24.935  17.263   1.969  24.372
  18     -149.5092547 0.11279E-01    -149.4979756  3.8322        -153.3301944   6.223   5.891  16.157   2.971
  13     -153.3301944  2.2886        -151.0416195  4.4068        -155.4484180   7.009   6.487   5.821   8.246
   5     -155.4484180  2.3406        -153.1078217  3.2039        -156.3117667   5.392   4.540  20.487   2.343
   6     -156.3117667  1.8628        -154.4489782  2.6334        -157.0823478   4.471   4.005  15.074   3.184

 Number of TS in the path       =     12
 Number of cycles               =      4

 Elapsed time=                                30.12
 OPTIM> # of energy calls=                          0 time=           0.00 %=  0.0
 OPTIM> # of energy+gradient calls=            106436 time=          22.20 %= 73.7
 OPTIM> # of energy+gradient+Hessian calls=       609 time=           0.59 %=  2.0

If so, congratulations! You have just created your first pathway. If the run failed to find a connected path, there could be a number of reasons for this: the single-ended TS searches failed to converge, there was trouble minimising, the number of connect cycles was not enough considering the initial path length/expected "complexity" of paths, or the DNEB parameters were not good. Talk to someone with expertise to diagnose and fix the problem.

The section of the optim.out file above summarises the pathway that has been found. Each line represents a transition state along the pathway, with energy Ets. The energies of the two minima each transition state connects are also shown as E+ and E-, along with the associated barriers Ets - E+ and Ets - E-.

If you check the energies, you will notice that E+ for the first transition state is very similar to the energy of the initial structure from GMIN, before any basin-hopping was done - from GMIN_out:

Calculating initial energy
Qu          0 E=     -143.3738388398 steps=   22 RMS= 0.94537E-03 Markov E=    -143.3738388     t=        0.2

This is because it is this structure we chose as the starting point for the pathway. It is not exactly the same, as we have specified a tighter convergence of the RMS force in OPTIM than we did in for SLOPPYCONV in GMIN, and so the REOPTIMISEENDPOINTS keyword in our odata file has re-minimised the extended structure.

In the same way, you can see that E- for the final transition state has exactly the same energy as the global minimum we found in GMIN, and specified as the second endpoint for the pathway. From GMIN_out:

Acceptance ratio for run=     0.57000 Step=    88.20000 Angular step factor=     0.00000 T=     0.80000
Final Quench      1 energy=     -157.0823478416 steps=  170 RMS force=      0.0000073 time=       19.16

Visualising the pathway

One of the most interesting parts of generating pathways between minima, is visualising them. We can do this in two ways, by plotting the energy along the path using gnuplot, and by using VMD to look at the actual minima and transition states the comprise the path.

gnuplot

The energies of the minima and transition states are stored in the second column of the output file EofS. The first column is a measure of the length along the pathway at which that structure occurs. When plotting this file therefore, we can view the path in two ways:

1. Energy as a function of distance along the path

gnuplot
pl 'EofS' w l

2. Energy alone

gnuplot
pl 'EofS' u 2 w l

VMD

The coordinates of the minima and transition states along the pathway are stored in the path.xyz output file, in XYZ format.

vmd -xyz path.xyz

This will load the pathway into VMD as a sequence of minima and transition states. Try moving the slider to follow the path! If you're working on a CUC3 cluster, you might need to load the VMD module before this will work:

module load vmd

NOTE: because the file is in XYZ format, you cannot make selections based on atom or residue names, as VMD does not have that information!

If you want to compare your output to mine, you can find a compressed archive of all the files from the COPTIM directory here.

Setting up PATHSAMPLE

Now you know that you can make connected paths, you can use the above OPTIM job to seed a PATHSAMPLE kinetic transition network (or database).

path.info is the key OPTIM output file needed as input for PATHSAMPLE. It contains

  • the energy
  • the order of the molecular point group and its symbol
  • 3N Hessian eigenvalues, with N=number of atoms. Note the 6 zero eigenvalues, which may not seem very close to zero but just need to be orders-of-magnitude smaller than the 3N-6 non-zero ones.
  • 3N coordinates

for each structure in a min-TS-min triple, for all triples (min1-TS-min2 etc) found during the OPTIM run, not just those that are part of the connected pathway.

Although we could run PATHSAMPLE happily in the COPTIM directory we are in, for neatness - lets make a new PATHSAMPLE directory:

mkdir ../PATHSAMPLE
cd ../PATHSAMPLE

To set up PATHSAMPLE from the OPTIM output we just ran, we need to copy a few files.

  • path.info from the OPTIM connect job we just ran:
cp ../COPTIM/path.info .
  • perm.allow
cp ../COPTIM/perm.allow .
  • input.crd from the COPTIM run
cp ../COPTIM/input.crd .

We also need to create pathdata, the PATHSAMPLE input file. Here is the one we're going to use:

STARTFROMPATH path.info 0 0
NATOMS         48
ETOL           1.0D-4
ITOL 1.0D1
DIRECTION AB
GEOMDIFFTOL 1.0D-1
COPYFILES perm.allow input.crd
PLANCK 9.536D-14
TEMPERATURE   0.592
CYCLES 0
DIJKSTRA 0
PERMDIST
CHARMM

As always, it is highly recommended that you take a look at the PATHSAMPLE documentation here and try to understand what each of the keywords are going to do. For now though, you can ignore DIJKSTRA 0 as it won't be used!

We are now ready to use the path.info file from COPTIM to create an initial PATHSAMPLE database. To do this, you need to compile PATHSAMPLE. This is really simple. Assuming you have your PATHSAMPLE source in ~/svn/PATHSAMPLE/source:

cd ~/svn/PATHSAMPLE/source
make clean
make

If this doesn't work, please ask for help! You should now have a PATHSAMPLE binary in ~/svn/PATHSAMPLE/bin which you might want to copy to ~/bin:

cp ~/svn/PATHSAMPLE/bin/PATHSAMPLE ~/bin

Now, you can run PATHSAMPLE like this:

PATHSAMPLE > output.setup

This should only take a second to run, and your output.setup file should end like this:

getallpaths> writing data for new ts to ts.data
getallpaths> writing data for      1 new minima to min.data
getallpaths> writing data for new ts to ts.data
getallpaths> writing data for new ts to ts.data
getallpaths> writing data for new ts to ts.data
setup> The unique A and B minima are     0 and     0 respectively

You will notice that a lot of new files have been created in the directory. The important ones to note are:

  • min.data - the database of minima. Each line corresponds to one minimum i.e. the first line is minmum 1. This file contains the energy, the log product of Hessian eigenvalues, and the three principle moments of inertia for each minimum.
  • ts.data - the database of transition states. Very similar to min.data but for transition states. Each line also shows which minima the transition state connects in the 4th and 5th columns, for example, for transition state 1:
    -141.7612296514     8815.1845668576         1         1         2      282.3763020963     1913.4378311886     2065.6351729082

you can see that it connects minimum 1 to minimum 2 (lines 1 and 2 of the min.data file).

  • points.min - binary file containing the coordinates for each minimum.
  • points.ts - binary file containing the coordinates for each transition state.

Getting an initial rate for your pathway

There are two other important files, min.A and min.B which we need to modify. These contain the minima we consider to be reactants (A) and products (B), as defined by the DIRECTION AB line in pathdata. If you look, these are both currently set to:

     1
     0

This means we haven't actually defined a reactant or product group, so we can't work out rates as PATHSAMPLE won't know where to look for paths. For this tutorial, we want min.A to contain the minimum number corresponding to the extended structure (initialmin), and min.B to contain the minimum number corresponding to the global minimum. To find out what their numbers are, we need to compare the energies from our COPTIM run, with those in the min.data file we just made, and make a note of the line number.

From COPTIM/optim.out, we know the initial energy is -143.3738462 (E+ for the first transition state). In my min.data file, this corresponds to line 17 (yours may be different!):

     -143.3738461873     8872.8473781541     1      288.7833895541     1820.2221658643     1981.4192575837

So - we change the '0' in min.A, to 17:

     1
     17

Note that the '1' here just tells PATHSAMPLE to expect a single minimum number.

Do the same for the E- energy of the last transition state in COPTIM/optim.out In my case it matches line 10 of min.data, and so my min.B file looks like this:

     1
     10

Ok, before we run PATHSAMPLE to calculate the rate, we need to make a change to the pathdata file. Replace this line:

STARTFROMPATH path.info 0 0

with this:

comment STARTFROMPATH path.info 0 0

This means PATHSAMPLE will ignore this keyword, and not try to setup the database again! It will then continue to read the rest of the keywords, including DIJKSTRA 0 which you should now look up here!

Run PATHSAMPLE as before:

PATHSAMPLE > output.dijkstra

Hopefully your output.dijkstra file should end in a similar way to this:

Dijkstra> detailed balance, ratio should be one if SS applies:     0.5403657521E-01
Dijkstra> MFPT for fastest path: A<-B value=    0.3170455739E-04 B<-A value=    0.1723492293E-05
Dijkstra> rates without conditional probability: A<-B value=     31541.20676     B<-A value=     580217.2740    
Dijkstra> rates with    conditional probability: A<-B value=     31541.20676     B<-A value=     580217.2740    
main> CPU time spent in Dijkstra                    =    0.15182E-01 s
main> total CPU time spent in PATHSAMPLE            =    0.30354E-01 s

The exact numbers you get may be different if your OPTIM run found a different connected path! And there you have it! The rate for 'folding' (initialmin -> dbase.1) is B<-A (B from A)

B<-A value=     580217.2740

which you can report as 6x10^5. The rate for the reverse, 'unfolding' is

A<-B value=     31541.20676

or 3x10^4.

You should be aware that the Dijkstra analysis only considers the 'fastest' pathway, the one which makes the largest contribution to the steady-state rate constant between A and B - although the rates we just looked at do include re-crossings along that pathway. If you wanted to include multiple pathways, and off pathway minima/traps, you should look at the NGT keyword.

Also - a word of warning. Just because we found a connected pathway, does not mean that this is the fastest pathway that exists between A and B. It is very likely that alternative pathways exist with lower barriers and/or fewer steps. Finding these pathways is a whole different topic, and will be covered by a different tutorial!

For now, congratulations! You have successfully run GMIN to identify the global minimum for a small peptide, used OPTIM to identify a discrete path between a high-energy extended/unfolded structure and the global minimum, and then used PATHSAMPLE to determine the rate for that folding process :)

If you are having problems, you can download a compressed archive of all the files in the PATHSAMPLE directory here, and compare what you have with them!