Calculating rate constants (GT and fastest path)
So, what do you do with the database of stationary points when you've finished the sampling part of the process? Usually, calculate rate constants (of various flavours) and examine mechanistically discrete paths with significant rate constants (i.e., make nice movies!).
Additionally, it's often helpful as you're building up the database to monitor the various rate constants that pathsample produces, by running separate one-processor pathsample jobs as described below. For safety, there should only be one running pathsample job in a directory at a time, though, so do this on a copy or between restarts of the sampling jobs.
It's also possible to ask pathsample to calculate the GT rate constant every n cycles of a sampling run (put n as the second integer on the GT keyword line in the pathdata input file), but this can be prohibitively expensive if n is small and the database is large.
pathdata should contain the usual keywords that set up information about the system: NATOMS, PERMDIST, etc., as used in the database-generating runs. However, remember to remove or comment out the sampling commands: SHORTCUT, FREEPAIRS, etc. A common gotcha: DIRECTION AB means we're interested in paths from the B state to the A state. The same is true with the subscripts on rate constants: i.e., we're using the spectroscopists' convention. The direction can also be changed, if desired, from the sampling runs.
The following keywords must be present:
CYCLES 0 TEMPERATURE x
where x is the reduced temperature, i.e. for CHARMM, it's the energy in the prevailing units of kcal/mol corresponding to the desired temperature: 0.592 kcal/mol for 298 K. The CYCLES 0 ensures that no further sampling is performed. The temperature need not be the same as that used for the sampling runs; MAXTSENERGY y can also changed (or included) in the rate constant calculations.
The different flavours of rate constant calculation...
Find the 'fastest' path: the single discrete path that makes the largest individual contribution to the phenomenological DPS steady-state rate constant (equation 8 of Wales02) in the desired direction, by including the following keyword:
DIJKSTRA i
where minima with integer i direct connections or fewer are removed from the analysis. Dijkstra's shortest-path algorithm is used with an edge-weight formulation based on log-weighted adjacency matrix elements, as described in EvansW04. It's slightly adapted from the original to ensure that the edge weights are non-negative (see Appendix E of Semen Trygubenko's Ph.D. thesis). It's important to note that this analysis only compares the contribution to the rate constant from paths that don't revisit any of the minima: i.e., it doesn't include the effect of full recrossings of TS's, whereby a particular set of stationary points gives rise to an infinite number of paths involving only those stationary points.
This use of Dijkstra's algorithm is also distinct from that used in the CONNECT algorithm (CarrTW05), with a distance-based edge-weight formulation.
For the case where the end point states contain more than one minimum, the Dijkstra analysis is performed separately for each start min in turn, to the rest of the minima in the database, and the path between the pair with the largest contribution to the SS rate constant is taken as the overall 'fastest'. For a given pair of end-point minima, the remaining ones in the A and B states are treated as intervening.
Key output:
standard output file: search for 'Largest contribution to SS rate constant' to find the contribution to the DPS steady-state (SS) rate constant from this 'fastest' path between any start and any end minimum, and the number of TS's on it. The mean first-passage times (= the reciprocal of the rate constants) in the forwards and backwards directions from a GT calculation on just the stationary points on this path are also quoted on the 'MFPT for fastest path' line. This latter time does not impose the condition of steady state on the intervening minima. It also includes the effect of recrossings (which acts to increase the resulting rate constant from the SS contribution) but does not take account of off-path transitions (which also acts to increase the resulting rate, as the presence of off-path transitions reduces a rate constant). It's the rate constants corresponding to these GT times that are given on the 'rates without conditional probability' and 'rates with ' lines. The conditional probability bit refers to the factor p_x/p_X, where x=[a,b] and X=[A,B]: the probability of being in minimum x given that we're in state X.
On the 'detailed balance, ratio should' line the forwards and backwards GT rate constants are used to see how closely (or not!) the detailed balance relation (k_AB.p_B = k_BA.p_A) is obeyed, to assess the validity of the steady-state approximation used in the determination of the fastest path. The value quoted is k_AB.p_B/(k_BA.p_A) if DIRECTION BA is set (and vice-versa otherwise).
Sample chunk of output from a run with DIJKSTRA 1:
Dijkstra> 47099 minima with 1 connections or fewer will not be considered Dijkstra> steps to A region converged in 47 cycles; maximum= 97 disconnected= 48089 Dijkstra> steps to B region converged in 38 cycles; maximum= 78 disconnected= 48089 Dijkstra> Best path for min 1 and any B minimum, k^SS B<-A 0.2648920258E-26 82 27545 78123 76027 93353 93573 103810 104873 104876 108780 113457 113455 113816 111592 111596 111595 111599 113595 113540 111670 108193 107117 114410 113282 106425 106426 109207 113864 107981 92364 92361 92362 29005 81267 49822 13885 13883 13880 82260 82261 26172 1 Dijkstra> Best path between any A minimum and any B minimum: 82 27545 78123 76027 93353 93573 103810 104873 104876 108780 113457 113455 113816 111592 111596 111595 111599 113595 113540 111670 108193 107117 114410 113282 106425 106426 109207 113864 107981 92364 92361 92362 29005 81267 49822 13885 13883 13880 82260 82261 26172 1 Dijkstra> Largest contribution to SS rate constant B<-A for any A and B is 0.2648920258E-26 for 41 transition states: Dijkstra> Note that path is printed backwards starting with B, ending with A E+ Ets E- 82 -555.5906153117 36918 -549.1427629896 27545 -550.7885223554 27545 -550.7885223554 105580 -543.5187142799 78123 -549.6194842919 78123 -549.6194842919 106531 -548.9845557840 76027 -551.8813095578 76027 -551.8813095578 130259 -540.4728394131 93353 -542.4178176348 93353 -542.4178176348 130266 -542.3606545656 93573 -542.3609660040 93573 -542.3609660040 145394 -534.7644579433 103810 -535.6463272297 103810 -535.6463272297 159813 -534.0226300928 104873 -534.7646896816 104873 -534.7646896816 159646 -525.0030493670 104876 -526.2912996602 104876 -526.2912996602 159359 -523.8936274377 108780 -530.2551921098 108780 -530.2551921098 159972 -524.2397843902 113457 -527.3469929975 113457 -527.3469929975 159495 -522.7281206378 113455 -522.8085028867 113455 -522.8085028867 162260 -518.9479471804 113816 -519.9294092195 113816 -519.9294092195 160045 -518.4098972167 111592 -520.7709917382 111592 -520.7709917382 156953 -520.6519499996 111596 -520.6537222572 111596 -520.6537222572 156948 -509.2607120466 111595 -510.7824074831 111595 -510.7824074831 156952 -510.7615417725 111599 -514.0064177058 111599 -514.0064177058 159697 -513.9535007538 113595 -513.9756163730 113595 -513.9756163730 159695 -513.2434291865 113540 -515.2702307728 113540 -515.2702307728 159696 -514.0369077195 111670 -516.4878259009 111670 -516.4878259009 157039 -513.3376855362 108193 -513.4635144282 108193 -513.4635144282 152012 -511.8132316150 107117 -513.5373413547 107117 -513.5373413547 161138 -510.2038266180 114410 -512.1777532197 114410 -512.1777532197 161142 -512.1643025601 113282 -512.2181171004 113282 -512.2181171004 161140 -503.8542455138 106425 -508.1494849751 106425 -508.1494849751 149226 -503.2919483110 106426 -506.8230208740 106426 -506.8230208740 159225 -506.6494193252 109207 -507.4352131723 109207 -507.4352131723 160121 -496.8941335219 113864 -497.9365618057 113864 -497.9365618057 160122 -497.2537286432 107981 -500.2176671595 107981 -500.2176671595 151632 -496.4669107556 92364 -497.0219772986 92364 -497.0219772986 128174 -496.3379764022 92361 -497.0175900124 92361 -497.0175900124 132028 -495.5050402587 92362 -495.9664182959 92362 -495.9664182959 130702 -495.2502792770 29005 -496.1568838964 29005 -496.1568838964 110518 -485.9257443857 81267 -490.5071327086 81267 -490.5071327086 110523 -490.4186741312 49822 -491.3243280411 49822 -491.3243280411 63967 -491.3031137536 13885 -491.3671659476 13885 -491.3671659476 110345 -486.3772296810 13883 -487.1372548014 13883 -487.1372548014 22162 -481.8341455402 13880 -483.2413610903 13880 -483.2413610903 112061 -476.0721186686 82260 -478.9507687081 82260 -478.9507687081 112062 -478.9115947145 82261 -479.8779631229 82261 -479.8779631229 112063 -476.4618088629 26172 -477.8753536184 26172 -477.8753536184 160106 -476.4919243777 1 -476.9018216251 Dijkstra> dumping minima and transition state coordinates to file redopoints, steps= 41 Dijkstra> detailed balance, ratio should be one if SS applies: 136.2100556 Dijkstra> MFPT for fastest path: A<-B value= 0.7030943368E+25 B<-A value= 0.2425804971E-08 Dijkstra> rates without conditional probability: A<-B value= 0.1422284248E-24 B<-A value= 412234294.2 Dijkstra> rates with conditional probability: A<-B value= 0.1422284248E-24 B<-A value= 412234294.2 main> CPU time spent in Dijkstra = 51.940 s main> total CPU time spent in PATHSAMPLE = 101.49 s
Files min.[A,B,data].fastest, ts.data.fastest,points.[min,ts].fastest: a new database containing only the stationary points on the fastest path.
File Epath: numbered list of the energies of the stationary points on the fastest path, in reverse order from end to start. Feed into gnuplot or similar graph-plotting program! Note that the first column of data just contains the line number in this file; the second column holds the energies.
File redopoints: coordinates of the stationary points n reverse order along the path from end to start, in bare xyz format. Give as input to OPTIM to re-create the approximate steepest-descent paths, writing more frames of coordinates, to make smoother movies :-)
For CHARMM: file stationary.points.pdb contains the coordinates for the sequence of stationary points, from end to start, in protein data bank format. Feed into a protein structure viewer as a trajectory file.
Another important type of calculation uses the graph-transformation (GT) algorithm to obtain phenomenological rate constants (i.e., ones that can be compared to experiment :-) in both directions (no matter which DIRECTION is specified in pathdata). The reference is TrygubenkoW06. GT rate constants are different from the ones from the above Dijkstra analysis in a number of ways: 1. the steady state approximation is not used in GT 2. all possible discrete paths through a database are accounted for in the GT rate 3. as a consequence of 2., the effect of full recrossings of TS's is included in GT. So, GT rates are for a particular whole database, whereas Dijkstra rates are for a single path in a particular database.
There are 4 variations on the GT theme coded in PATHSAMPLE, labelled GT, SGT, DGT and SDGT according to their keywords. I don't have much to add to the documentation about their relative merits; I've only used GT, which is what's described below.
Set up pathdata as for a Dijkstra calculation, but instead of DIJKSTRA i put
GT i
where i has the same meaning.
Here's a sample chunk from an output file:
... GT> Peak array size for branching ratios= 0.03897 Gb GT> For direction A<-B: k(A<-B)= 0.5156436361E-21 k(B<-A) (from detailed balance)= 742707728.3 total= 742707728.3 GT> For direction B<-A: k(A<-B) (from detailed balance)= 0.1105416573E-27 k(B<-A)= 159.2187655 total= 159.2187655 GT> detailed balance, ratio should be one if SS applies: 4664699.703 GT> non-DB rates: k(A<-B)= 0.5156436361E-21 k(B<-A) = 159.2187655 total= 159.2187655 main> CPU time spent in GT = 95.600 s main> total CPU time spent in PATHSAMPLE = 99.250 s ...
Look for the 'non-DB rates:' line for the GT-calculated rate constants in the two possible directions.
For each direction, the GT rate constant is also used to give a rate in the other direction via the detailed-balance relationship. As in the Dijkstra calculation, the value quoted on the 'detailed balance, ratio should' line is k_AB.p_B/(k_BA.p_A), but using the full GT rate constants here. This ratio is always this way up, irrespective of what's set by the DIRECTION keyword.
No other relevant files are produced by GT.
Running rate-constant calculations on a grouped database
All of the above methods can also be applied to a database that has been grouped into sets of minima and the ensembles of transition states that connect them. Just add a line for the grouping method you want (e.g. REGROUPFREE, REGROUPPE, REGROUPRATE) to a pathdata file that contains appropriate rate-constant instructions as described above. If the value of Planck's constant, in the prevailing energy units times seconds, is not 1.0 (the default) then don't forget to set it appropriately using a PLANCK value line. For CHARMM, in units of (kcal/mol) s, use PLANCK 9.546D-14
For example:
... CYCLES 0 PLANCK 9.546D-14 REGROUPFREE 5.0 DIJKSTRA 0 ...
Any fastest paths printed will now comprise sequences of groups of individual minima and the sets of TS's that connect them, and all the individual stationary points in those groups will be written to new *.fastest database files.
Care should be taken that the parameters chosen in the grouping scheme give rise to groups where, in broad terms, the inter-group rate constants are at least an order of magnitude smaller than intra-group rates. Otherwise, equilibration within a group probably can't occur before a transition out of the group happens, and the approximation doesn't hold that the contribution to the overall rate constant from the intra-group transitions is negligible compared to the inter-group ones.
--jmc49 22:39, 9 July 2008 (BST)