Finding an initial path with OPTIM and starting up PATHSAMPLE (CHARMM)
Now that we have two interesting end point structures from running GMIN, initialmin (the extended starting structure after minimisation) and dbase.1 (the global minimum) we can make an initial discrete path (min1--TS1--min2--TS2--min3...minN) between them using OPTIM.
Setting up OPTIM
As before, make a new directory to work in, and copy the files that we're going to use:
mkdir COPTIM
We want to generate a path starting from the extended structure (initialmin), so we copy it to input.crd:
cp CGMIN/initialmin COPTIM/input.crd
We are trying to connect the extended structure to the lower energy global minimum (dbase.1), so we copy it to finish:
cp CGMIN/dbase.1 COPTIM/finish
When searching for minima and transition states between out endpoints, we are going to interpolate between structures. It is important that we take permutational isomerisation into account to prevent duplicate minima and transition states being identified. To do this, we use the perm.allow file we made in the first part of this tutorial:
cp perm.allow COPTIM
Finally, we need to create a new file, odata, the OPTIM input file. Open a new file using vi:
vi odata
and paste this in:
UPDATES 6000 NEWCONNECT 15 3 2.0 20.0 30 0.5 NEBK 10.0 DIJKSTRA EXP DUMPALLPATHS REOPTIMISEENDPOINTS EDIFFTOL 1.0D-4 MAXERISE 1.0D-4 1.0D0 GEOMDIFFTOL 0.05D0 BFGSTS 500 10 100 0.01 100 NOIT BFGSMIN 1.0D-6 PERMDIST MAXSTEP 0.1 TRAD 0.2 MAXMAX 0.3 BFGSCONV 1.0D-6 PUSHOFF 0.1 STEPS 800 BFGSSTEPS 2000 MAXBFGS 0.1 CHARMMTYPE toph19_eef1_perm.inp param19_eef1_perm.inp CHARMM ! Everything below the CHARMM line above is part of a CHARMM input file set pardir "/home/csw34/svn/CHARMM31/toppar" ! BOMLev sets the level of warnings what do not cause the program to exit. -5 = very lax BOMLev -5 ! PRNLEV sets the ammount of output you get from CHARMM. 0 = small PRNLEV 0 ! Read standard topology and parameter files. These paths will need setting! OPEN READ CARD UNIT 1 NAME @pardir/@top READ RTF CARD UNIT 1 CLOSE UNIT 1 OPEN READ CARD UNIT 2 NAME @pardir/@par READ PARAMETER CARD UNIT 2 CLOSE UNIT 2 ! Generate the PSF for met-enk READ SEQUence CARD * 5 TYR GLY GLY PHE MET GENErate FIRS NTER LAST CTER SETUp OPEN UNIT 20 NAME input.crd READ CARD READ COOR UNIT 20 CARD FREE CLOSE UNIT 20 ! Build the internal coordinate tables IC FILL PRESERVE IC PARAMETERS IC PURGE IC BUILD ! Set up the EEF1 solvent model eef1 setup temp 298.15 unit 93 name "/home/csw34/svn/CHARMM31/toppar/solvpar.inp" update ctonnb 7. ctofnb 9. cutnb 15. group rdie
As in GMIN, this file is made of two parts. Everything above the CHARMM line is OPTIM input. The full list of keywords can be found in the OPTIM documentation here, and it is again highly recommended that you take the time to find out what each of the above keywords is doing! The most important of these options specify the following:
- BFGSMIN/BFGSCONV: the RMS force must be less than 10^-6 for a structure to be considered a stationary point.
- BFGSSTEPS: up to 2000 BFGS steps will be done to minimise the structure, with a maximum step size of 0.1 coordinate units.
- REOPTIMISEENDPOINTS: If the given initial structures are not already minima, according to the above RMS force criterion, then they will be locally minimized before being connected.
- DUMPALLPATHS: all min--TS--min triples found during the connection run will be written to the path.info file. Important so that we are not throwing away possibly useful information.
- DIJKSTRA EXP: the Dijkstra shortest-path algorithm with an exponential edge weight function will be used to select pairs of minima to connect in the iterative connection-making algorithm based on the doubly-nudged elastic band (DNEB) method.
- NEWCONNECT: we're doing a connection making run. In order, the options mean 15 connection-making cycles/iterations (search for CONNECT CYCLE in the output file...); maximum 3 attempts to connect the same pair of minima (incrementing other DNEB parameters each time) before giving up on that pair of minima; a DNEB image density of 2.0 per unit distance along the vector separating the end points; an iteration density in the DNEB algorithm of 20.0 per image; maximum of 30 images in the DNEB; increment the image density by 0.5 when retrying the same pair of minima.
N.B. NOIT means we're doing an explicit diagonalization of the analytical Hessian to find the eigenvectors/values. This may not be appropriate for your system! Other possibilities include removing NOIT to use the iterative approach, or replacing NOIT with NOHESS (possibly in combination with ENDNUMHESS) if an analytical Hessian doesn't exist or is very large and time-consuming to work with.
As for GMIN, everything below the CHARMM line is CHARMM input, and is actually identical to in the GMIN data file. If your end-point coordinates have come from a previous GMIN or OPTIM run, then having the FREE on the READ COOR UNIT 20 CARD FREE coordinate-reading line is crucial; make sure this specification of free-format input is present here!
Compiling COPTIM
You should have already compiled COPTIM, and if you have problems doing this - please ask someone for help! In brief, assuming your OPTIM source is in ~/svn/OPTIM/source, and that you have a Portland (pgf90) compiler available, you should be able to compile COPTIM like this:
cd ~/svn/OPTIM/source ./build.csh cleanall ./build.csh coptim pgi
This might take some time so go grab some water and gaze out of a window :) Hopefully when it's done, you should see something like this:
build.csh> OPTIM executable = /home/csw34/svn/OPTIM/source/../bin/pgi/COPTIM build.csh> Success. See make.out for detailed compiler output
If you have any error messages, ask for help! If you are working in CUC3 in Cambridge, you might want to copy the COPTIM executable you just compiled into your ~/bin directory, so you can use it without have to specify the whole svn path each time!
cp ~/svn/OPTIM/bin/pgi/COPTIM ~/bin
Running COPTIM
It's time to find a path connecting our two endpoints! Make sure you're in the COPTIM directory we created earlier, and run:
COPTIM > optim.out &
This should only take a minute or two to run. If you want to follow the progress of the run, you can tail the output file like this:
tail -f optim.out
If OPTIM has successfully found a connected (no gaps!) path between input.crd (initialmin) and finish (dbase.1), you should see this at the bottom of optim.out:
Connected path found ts E+ Ets - E+ Ets Ets - E- E- S D gamma ~N 12 -143.3738462 2.4342 -140.9396544 2.4819 -143.4215907 4.722 4.146 16.703 2.874 2 -143.4215907 0.68847 -142.7331237 1.3542 -144.0873508 17.429 15.830 2.418 19.850 15 -144.0873508 0.24985E-01 -144.0623662 7.1963 -151.2586207 19.564 18.091 2.028 23.670 9 -151.2586207 3.1371 -148.1215578 3.0673 -151.1888829 6.345 5.664 16.986 2.826 10 -151.1888829 6.1103 -145.0786246 4.6974 -149.7759874 11.836 10.646 5.934 8.089 7 -149.7759874 5.4075 -144.3684448 5.3809 -149.7493556 18.703 3.769 40.199 1.194 14 -149.7493556 2.8122 -146.9371965 1.2215 -148.1587388 5.332 4.666 3.128 15.343 8 -148.1587388 4.1491 -144.0096582 5.4996 -149.5092547 24.935 17.263 1.969 24.372 18 -149.5092547 0.11279E-01 -149.4979756 3.8322 -153.3301944 6.223 5.891 16.157 2.971 13 -153.3301944 2.2886 -151.0416195 4.4068 -155.4484180 7.009 6.487 5.821 8.246 5 -155.4484180 2.3406 -153.1078217 3.2039 -156.3117667 5.392 4.540 20.487 2.343 6 -156.3117667 1.8628 -154.4489782 2.6334 -157.0823478 4.471 4.005 15.074 3.184 Number of TS in the path = 12 Number of cycles = 4 Elapsed time= 30.12 OPTIM> # of energy calls= 0 time= 0.00 %= 0.0 OPTIM> # of energy+gradient calls= 106436 time= 22.20 %= 73.7 OPTIM> # of energy+gradient+Hessian calls= 609 time= 0.59 %= 2.0
If so, congratulations! You have just created your first pathway. If the run failed to find a connected path, there could be a number of reasons for this: the single-ended TS searches failed to converge, there was trouble minimising, the number of connect cycles was not enough considering the initial path length/expected "complexity" of paths, or the DNEB parameters were not good. Talk to someone with expertise to diagnose and fix the problem.
The section of the optim.out file above summarises the pathway that has been found. Each line represents a transition state along the pathway, with energy Ets. The energies of the two minima each transition state connects are also shown as E+ and E-, along with the associated barriers Ets - E+ and Ets - E-.
If you check the energies, you will notice that E+ for the first transition state is very similar to the energy of the initial structure from GMIN, before any basin-hopping was done - from GMIN_out:
Calculating initial energy Qu 0 E= -143.3738388398 steps= 22 RMS= 0.94537E-03 Markov E= -143.3738388 t= 0.2
This is because it is this structure we chose as the starting point for the pathway. It is not exactly the same, as we have specified a tighter convergence of the RMS force in OPTIM than we did in for SLOPPYCONV in GMIN, and so the REOPTIMISEENDPOINTS keyword in our odata file has re-minimised the extended structure.
In the same way, you can see that E- for the final transition state has exactly the same energy as the global minimum we found in GMIN, and specified as the second endpoint for the pathway. From GMIN_out:
Acceptance ratio for run= 0.57000 Step= 88.20000 Angular step factor= 0.00000 T= 0.80000 Final Quench 1 energy= -157.0823478416 steps= 170 RMS force= 0.0000073 time= 19.16
Visualising the pathway
One of the most interesting parts of generating pathways between minima, is visualising them. We can do this in two ways, by plotting the energy along the path using gnuplot, and by using VMD to look at the actual minima and transition states the comprise the path.
gnuplot
The energies of the minima and transition states are stored in the second column of the output file EofS. The first column is a measure of the length along the pathway at which that structure occurs. When plotting this file therefore, we can view the path in two ways:
1. Energy as a function of distance along the path
gnuplot pl 'EofS' w l
2. Energy alone
gnuplot pl 'EofS' u 2 w l
VMD
The coordinates of the minima and transition states along the pathway are stored in the path.xyz output file, in XYZ format.
vmd -xyz path.xyz
This will load the pathway into VMD as a sequence of minima and transition states. Try moving the slider to follow the path! If you're working on a CUC3 cluster, you might need to load the VMD module before this will work:
module load vmd
NOTE: because the file is in XYZ format, you cannot make selections based on atom or residue names, as VMD does not have that information!
If you want to compare your output to mine, you can find a compressed archive of all the files from the COPTIM directory here.
Setting up PATHSAMPLE
Now you know that you can make connected paths, you can use the above OPTIM job to seed a PATHSAMPLE kinetic transition network (or database).
path.info is the key OPTIM output file needed as input for PATHSAMPLE. It contains
- the energy
- the order of the molecular point group and its symbol
- 3N Hessian eigenvalues, with N=number of atoms. Note the 6 zero eigenvalues, which may not seem very close to zero but just need to be orders-of-magnitude smaller than the 3N-6 non-zero ones.
- 3N coordinates
for each structure in a min-TS-min triple, for all triples (min1-TS-min2 etc) found during the OPTIM run, not just those that are part of the connected pathway.
Although we could run PATHSAMPLE happily in the COPTIM directory we are in, for neatness - lets make a new PATHSAMPLE directory:
mkdir ../PATHSAMPLE cd ../PATHSAMPLE
To set up PATHSAMPLE from the OPTIM output we just ran, we need to copy a few files.
- path.info from the OPTIM connect job we just ran:
cp ../COPTIM/path.info .
- perm.allow
cp ../COPTIM/perm.allow .
- input.crd from the COPTIM run
cp ../COPTIM/input.crd .
We also need to create pathdata, the PATHSAMPLE input file. Here is the one we're going to use:
STARTFROMPATH path.info 0 0 NATOMS 48 ETOL 1.0D-4 ITOL 1.0D1 DIRECTION AB GEOMDIFFTOL 1.0D-1 COPYFILES perm.allow input.crd PLANCK 9.536D-14 TEMPERATURE 0.592 CYCLES 0 DIJKSTRA 0 PERMDIST CHARMM
As always, it is highly recommended that you take a look at the PATHSAMPLE documentation here and try to understand what each of the keywords are going to do. For now though, you can ignore DIJKSTRA 0 as it won't be used!
We are now ready to use the path.info file from COPTIM to create an initial PATHSAMPLE database. To do this, you need to compile PATHSAMPLE. This is really simple. Assuming you have your PATHSAMPLE source in ~/svn/PATHSAMPLE/source:
cd ~/svn/PATHSAMPLE/source make clean make
If this doesn't work, please ask for help! You should now have a PATHSAMPLE binary in ~/svn/PATHSAMPLE/bin which you might want to copy to ~/bin:
cp ~/svn/PATHSAMPLE/bin/PATHSAMPLE ~/bin
Now, you can run PATHSAMPLE like this:
PATHSAMPLE > output.setup
This should only take a second to run, and your output.setup file should end like this:
getallpaths> writing data for new ts to ts.data getallpaths> writing data for 1 new minima to min.data getallpaths> writing data for new ts to ts.data getallpaths> writing data for new ts to ts.data getallpaths> writing data for new ts to ts.data setup> The unique A and B minima are 0 and 0 respectively
You will notice that a lot of new files have been created in the directory. The important ones to note are:
- min.data - the database of minima. Each line corresponds to one minimum i.e. the first line is minmum 1. This file contains the energy, the log product of Hessian eigenvalues, and the three principle moments of inertia for each minimum.
- ts.data - the database of transition states. Very similar to min.data but for transition states. Each line also shows which minima the transition state connects in the 4th and 5th columns, for example, for transition state 1:
-141.7612296514 8815.1845668576 1 1 2 282.3763020963 1913.4378311886 2065.6351729082
you can see that it connects minimum 1 to minimum 2 (lines 1 and 2 of the min.data file).
- points.min - binary file containing the coordinates for each minimum.
- points.ts - binary file containing the coordinates for each transition state.
Getting an initial rate for your pathway
There are two other important files, min.A and min.B which we need to modify. These contain the minima we consider to be reactants (A) and products (B), as defined by the DIRECTION AB line in pathdata. If you look, these are both currently set to:
1 0
This means we haven't actually defined a reactant or product group, so we can't work out rates as PATHSAMPLE won't know where to look for paths. For this tutorial, we want min.A to contain the minimum number corresponding to the extended structure (initialmin), and min.B to contain the minimum number corresponding to the global minimum. To find out what their numbers are, we need to compare the energies from our COPTIM run, with those in the min.data file we just made, and make a note of the line number.
From COPTIM/optim.out, we know the initial energy is -143.3738462 (E+ for the first transition state). In my min.data file, this corresponds to line 17 (yours may be different!):
-143.3738461873 8872.8473781541 1 288.7833895541 1820.2221658643 1981.4192575837
So - we change the '0' in min.A, to 17:
1 17
Note that the '1' here just tells PATHSAMPLE to expect a single minimum number.
Do the same for the E- energy of the last transition state in COPTIM/optim.out In my case it matches line 10 of min.data, and so my min.B file looks like this:
1 10
Ok, before we run PATHSAMPLE to calculate the rate, we need to make a change to the pathdata file. Replace this line:
STARTFROMPATH path.info 0 0
with this:
comment STARTFROMPATH path.info 0 0
This means PATHSAMPLE will ignore this keyword, and not try to setup the database again! It will then continue to read the rest of the keywords, including DIJKSTRA 0 which you should now look up here!
Run PATHSAMPLE as before:
PATHSAMPLE > output.dijkstra
Hopefully your output.dijkstra file should end in a similar way to this:
Dijkstra> detailed balance, ratio should be one if SS applies: 0.5403657521E-01 Dijkstra> MFPT for fastest path: A<-B value= 0.3170455739E-04 B<-A value= 0.1723492293E-05 Dijkstra> rates without conditional probability: A<-B value= 31541.20676 B<-A value= 580217.2740 Dijkstra> rates with conditional probability: A<-B value= 31541.20676 B<-A value= 580217.2740 main> CPU time spent in Dijkstra = 0.15182E-01 s main> total CPU time spent in PATHSAMPLE = 0.30354E-01 s
The exact numbers you get may be different if your OPTIM run found a different connected path! And there you have it! The rate for 'folding' (initialmin -> dbase.1) is B<-A (B from A)
B<-A value= 580217.2740
which you can report as 6x10^5. The rate for the reverse, 'unfolding' is
A<-B value= 31541.20676
or 3x10^4.
You should be aware that the Dijkstra analysis only considers the 'fastest' pathway, the one which makes the largest contribution to the steady-state rate constant between A and B - although the rates we just looked at do include re-crossings along that pathway. If you wanted to include multiple pathways, and off pathway minima/traps, you should look at the NGT keyword.
Also - a word of warning. Just because we found a connected pathway, does not mean that this is the fastest pathway that exists between A and B. It is very likely that alternative pathways exist with lower barriers and/or fewer steps. Finding these pathways is a whole different topic, and will be covered by a different tutorial!
For now, congratulations! You have successfully run GMIN to identify the global minimum for a small peptide, used OPTIM to identify a discrete path between a high-energy extended/unfolded structure and the global minimum, and then used PATHSAMPLE to determine the rate for that folding process :)
If you are having problems, you can download a compressed archive of all the files in the PATHSAMPLE directory here, and compare what you have with them!