Calculating rate constants (GT and fastest path)

From CUC3
Revision as of 15:48, 2 August 2008 by import>Jmc49
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

So, what do you do with the database of stationary points when you've finished the sampling part of the process? Usually, calculate rate constants (of various flavours) and examine mechanistically discrete paths with significant rate constants (i.e., make nice movies!).

Additionally, it's often helpful as you're building up the database to monitor the various rate constants that pathsample produces, by running separate one-processor pathsample jobs as described below. For safety, there should only be one running pathsample job in a directory at a time, though, so do this on a copy or between restarts of the sampling jobs.

It's also possible to ask pathsample to calculate the GT rate constant every n cycles of a sampling run (put n as the second integer on the GT keyword line in the pathdata input file), but this can be prohibitively expensive if n is small and the database is large.

pathdata should contain the usual keywords that set up information about the system: NATOMS, PERMDIST, etc., as used in the database-generating runs. However, remember to remove or comment out the sampling commands: SHORTCUT, FREEPAIRS, etc. A common gotcha: DIRECTION AB means we're interested in paths from the B state to the A state. The same is true with the subscripts on rate constants: i.e., we're using the spectroscopists' convention. The direction can also be changed, if desired, from the sampling runs.

The following keywords must be present:

CYCLES 0
TEMPERATURE x

where x is the reduced temperature, i.e. for CHARMM, it's the energy in the prevailing units of kcal/mol corresponding to the desired temperature: 0.592 kcal/mol for 298 K. The CYCLES 0 ensures that no further sampling is performed. The temperature need not be the same as that used for the sampling runs; MAXTSENERGY y can also changed (or included) in the rate constant calculations.

The different flavours of rate constant calculation...

Find the 'fastest' path: the single discrete path that makes the largest individual contribution to the phenomenological DPS steady-state rate constant (equation 8 of Wales02) in the desired direction, by including the following keyword:

DIJKSTRA i

where minima with integer i direct connections or fewer are removed from the analysis. Dijkstra's shortest-path algorithm is used with an edge-weight formulation based on log-weighted adjacency matrix elements, as described in EvansW04. It's slightly adapted from the original to ensure that the edge weights are non-negative (see Appendix E of Semen Trygubenko's Ph.D. thesis). It's important to note that this analysis only compares the contribution to the rate constant from paths that don't revisit any of the minima: i.e., it doesn't include the effect of full recrossings of TS's, whereby a particular set of stationary points gives rise to an infinite number of paths involving only those stationary points.

This use of Dijkstra's algorithm is also distinct from that used in the CONNECT algorithm (CarrTW05), with a distance-based edge-weight formulation.

For the case where the end point states contain more than one minimum, the Dijkstra analysis is performed separately for each start min in turn, to the rest of the minima in the database, and the path between the pair with the largest contribution to the SS rate constant is taken as the overall 'fastest'. For a given pair of end-point minima, the remaining ones in the A and B states are treated as intervening.

Key output:

standard output file: search for 'Largest contribution to SS rate constant' to find the contribution to the DPS steady-state (SS) rate constant from this 'fastest' path between any start and any end minimum, and the number of TS's on it. The mean first-passage times (= the reciprocal of the rate constants) in the forwards and backwards directions from a GT calculation on just the stationary points on this path are also quoted on the 'MFPT for fastest path' line. This latter time does not impose the condition of steady state on the intervening minima. It also includes the effect of recrossings (which acts to increase the resulting rate constant from the SS contribution) but does not take account of off-path transitions (which also acts to increase the resulting rate, as the presence of off-path transitions reduces a rate constant). It's the rate constants corresponding to these GT times that are given on the 'rates without conditional probability' and 'rates with ' lines. The conditional probability bit refers to the factor p_x/p_X, where x=[a,b] and X=[A,B]: the probability of being in minimum x given that we're in state X.

On the 'detailed balance, ratio should' line the forwards and backwards GT rate constants are used to see how closely (or not!) the detailed balance relation (k_AB.p_B = k_BA.p_A) is obeyed, to assess the validity of the steady-state approximation used in the determination of the fastest path. The value quoted is k_AB.p_B/(k_BA.p_A) if DIRECTION BA is set (and vice-versa otherwise).

Sample chunk of output from a run with DIJKSTRA 1:

Dijkstra>    47099 minima with        1 connections or fewer will not be considered
Dijkstra> steps to A region converged in       47 cycles; maximum=      97 disconnected=   48089
Dijkstra> steps to B region converged in       38 cycles; maximum=      78 disconnected=   48089
Dijkstra> Best path for min        1 and any B minimum, k^SS B<-A    0.2648920258E-26
      82   27545   78123   76027   93353   93573  103810  104873  104876  108780  113457  113455  113816  111592  111596  111595  111599 113595  113540  111670  108193  107117  114410  113282  106425  106426  109207  113864  107981   92364   92361   92362   29005   81267  49822   13885   13883   13880   82260   82261   26172       1
Dijkstra> Best path between any A minimum and any B minimum:
      82   27545   78123   76027   93353   93573  103810  104873  104876  108780  113457  113455  113816  111592  111596  111595  111599  113595  113540  111670  108193  107117  114410  113282  106425  106426  109207  113864  107981   92364   92361   92362   29005   81267   49822   13885   13883   13880   82260   82261   26172       1
Dijkstra> Largest contribution to SS rate constant B<-A for any A and B is     0.2648920258E-26 for     41 transition states:
Dijkstra> Note that path is printed backwards starting with B, ending with A
                   E+                          Ets                         E-
     82     -555.5906153117   36918     -549.1427629896   27545     -550.7885223554
  27545     -550.7885223554  105580     -543.5187142799   78123     -549.6194842919
  78123     -549.6194842919  106531     -548.9845557840   76027     -551.8813095578
  76027     -551.8813095578  130259     -540.4728394131   93353     -542.4178176348
  93353     -542.4178176348  130266     -542.3606545656   93573     -542.3609660040
  93573     -542.3609660040  145394     -534.7644579433  103810     -535.6463272297
 103810     -535.6463272297  159813     -534.0226300928  104873     -534.7646896816
 104873     -534.7646896816  159646     -525.0030493670  104876     -526.2912996602
 104876     -526.2912996602  159359     -523.8936274377  108780     -530.2551921098
 108780     -530.2551921098  159972     -524.2397843902  113457     -527.3469929975
 113457     -527.3469929975  159495     -522.7281206378  113455     -522.8085028867
 113455     -522.8085028867  162260     -518.9479471804  113816     -519.9294092195
 113816     -519.9294092195  160045     -518.4098972167  111592     -520.7709917382
 111592     -520.7709917382  156953     -520.6519499996  111596     -520.6537222572
 111596     -520.6537222572  156948     -509.2607120466  111595     -510.7824074831
 111595     -510.7824074831  156952     -510.7615417725  111599     -514.0064177058
 111599     -514.0064177058  159697     -513.9535007538  113595     -513.9756163730
 113595     -513.9756163730  159695     -513.2434291865  113540     -515.2702307728
 113540     -515.2702307728  159696     -514.0369077195  111670     -516.4878259009
 111670     -516.4878259009  157039     -513.3376855362  108193     -513.4635144282
 108193     -513.4635144282  152012     -511.8132316150  107117     -513.5373413547
 107117     -513.5373413547  161138     -510.2038266180  114410     -512.1777532197
 114410     -512.1777532197  161142     -512.1643025601  113282     -512.2181171004
 113282     -512.2181171004  161140     -503.8542455138  106425     -508.1494849751
 106425     -508.1494849751  149226     -503.2919483110  106426     -506.8230208740
 106426     -506.8230208740  159225     -506.6494193252  109207     -507.4352131723
 109207     -507.4352131723  160121     -496.8941335219  113864     -497.9365618057
 113864     -497.9365618057  160122     -497.2537286432  107981     -500.2176671595
 107981     -500.2176671595  151632     -496.4669107556   92364     -497.0219772986
  92364     -497.0219772986  128174     -496.3379764022   92361     -497.0175900124
  92361     -497.0175900124  132028     -495.5050402587   92362     -495.9664182959
  92362     -495.9664182959  130702     -495.2502792770   29005     -496.1568838964
  29005     -496.1568838964  110518     -485.9257443857   81267     -490.5071327086
  81267     -490.5071327086  110523     -490.4186741312   49822     -491.3243280411
  49822     -491.3243280411   63967     -491.3031137536   13885     -491.3671659476
  13885     -491.3671659476  110345     -486.3772296810   13883     -487.1372548014
  13883     -487.1372548014   22162     -481.8341455402   13880     -483.2413610903
  13880     -483.2413610903  112061     -476.0721186686   82260     -478.9507687081
  82260     -478.9507687081  112062     -478.9115947145   82261     -479.8779631229
  82261     -479.8779631229  112063     -476.4618088629   26172     -477.8753536184
  26172     -477.8753536184  160106     -476.4919243777       1     -476.9018216251
Dijkstra> dumping minima and transition state coordinates to file redopoints, steps=    41
Dijkstra> detailed balance, ratio should be one if SS applies:      136.2100556
Dijkstra> MFPT for fastest path: A<-B value=    0.7030943368E+25 B<-A value=    0.2425804971E-08
Dijkstra> rates without conditional probability: A<-B value=    0.1422284248E-24 B<-A value=     412234294.2
Dijkstra> rates with    conditional probability: A<-B value=    0.1422284248E-24 B<-A value=     412234294.2
main> CPU time spent in Dijkstra                    =     51.940     s
main> total CPU time spent in PATHSAMPLE            =     101.49     s

Files min.[A,B,data].fastest, ts.data.fastest,points.[min,ts].fastest: a new database containing only the stationary points on the fastest path.

File Epath: numbered list of the energies of the stationary points on the fastest path, in reverse order from end to start. Feed into gnuplot or similar graph-plotting program! Note that the first column of data just contains the line number in this file; the second column holds the energies.

File redopoints: coordinates of the stationary points n reverse order along the path from end to start, in bare xyz format. Give as input to OPTIM to re-create the approximate steepest-descent paths, writing more frames of coordinates, to make smoother movies :-)

For CHARMM: file stationary.points.pdb contains the coordinates for the sequence of stationary points, from end to start, in protein data bank format. Feed into a protein structure viewer as a trajectory file.


Another important type of calculation uses the graph-transformation (GT) algorithm to obtain phenomenological rate constants (i.e., ones that can be compared to experiment :-) in both directions (no matter which DIRECTION is specified in pathdata). The reference is TrygubenkoW06. GT rate constants are different from the ones from the above Dijkstra analysis in a number of ways: 1. the steady state approximation is not used in GT 2. all possible discrete paths through a database are accounted for in the GT rate 3. as a consequence of 2., the effect of full recrossings of TS's is included in GT. So, GT rates are for a particular whole database, whereas Dijkstra rates are for a single path in a particular database.

There are 4 variations on the GT theme coded in PATHSAMPLE, labelled GT, SGT, DGT and SDGT according to their keywords. I don't have much to add to the documentation about their relative merits; I've only used GT, which is what's described below.

Set up pathdata as for a Dijkstra calculation, but instead of DIJKSTRA i put

GT i

where i has the same meaning.

Here's a sample chunk from an output file:

...
GT> Peak array size for branching ratios=        0.03897 Gb
GT> For direction A<-B:  k(A<-B)=    0.5156436361E-21 k(B<-A) (from detailed balance)=     742707728.3     total=     742707728.3
GT> For direction B<-A:  k(A<-B) (from detailed balance)=    0.1105416573E-27 k(B<-A)=     159.2187655     total=     159.2187655
GT> detailed balance, ratio should be one if SS applies:      4664699.703 
GT> non-DB rates:  k(A<-B)=    0.5156436361E-21 k(B<-A) =     159.2187655     total=     159.2187655
main> CPU time spent in GT                          =     95.600     s
main> total CPU time spent in PATHSAMPLE            =     99.250     s
...

Look for the 'non-DB rates:' line for the GT-calculated rate constants in the two possible directions.

For each direction, the GT rate constant is also used to give a rate in the other direction via the detailed-balance relationship. As in the Dijkstra calculation, the value quoted on the 'detailed balance, ratio should' line is k_AB.p_B/(k_BA.p_A), but using the full GT rate constants here. This ratio is always this way up, irrespective of what's set by the DIRECTION keyword.

No other relevant files are produced by GT.

Running rate-constant calculations on a grouped database

All of the above methods can also be applied to a database that has been grouped into sets of minima and the ensembles of transition states that connect them. Just add a line for the grouping method you want (e.g. REGROUPFREE, REGROUPPE, REGROUPRATE) to a pathdata file that contains appropriate rate-constant instructions as described above. If the value of Planck's constant, in the prevailing energy units times seconds, is not 1.0 (the default) then don't forget to set it appropriately using a PLANCK value line. For CHARMM, in units of (kcal/mol) s, use PLANCK 9.546D-14

For example:

...
CYCLES 0
PLANCK 9.546D-14
REGROUPFREE 5.0
DIJKSTRA 0
...

Any fastest paths printed will now comprise sequences of groups of individual minima and the sets of TS's that connect them, and all the individual stationary points in those groups will be written to new *.fastest database files.

Care should be taken that the parameters chosen in the grouping scheme give rise to groups where, in broad terms, the inter-group rate constants are at least an order of magnitude smaller than intra-group rates. Otherwise, equilibration within a group probably can't occur before a transition out of the group happens, and the approximation doesn't hold that the contribution to the overall rate constant from the intra-group transitions is negligible compared to the inter-group ones.

--jmc49 22:39, 9 July 2008 (BST)