Optimising a path

From Docswiki

Once a connection appears to have been made, it is best to verify that it really exists. Run another PATHSAMPLE calculation with DIJKSTRA 0 and CYCLES 0. A common gotcha is that a MAXTSENERGY value that is too low (i.e. too negative) can make an apparent connection disappear! The aim is then to grow the database until the [N]GT rate constant (probably grouped) reaches a plateau as a function of the number of local minima. A plateau that holds to within about an order of magnitude is usually accurate enough.
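One way to make the plateau criterion concrete is to record the rate constant each time the database is enlarged and check how far the most recent values spread. The Python sketch below is only an illustration: the function name has_plateaued, the window of four samples, and the numbers in history are all hypothetical and do not come from PATHSAMPLE.

```python
import math

def has_plateaued(rates, window=4, max_decades=1.0):
    """Return True if the last `window` rate constants span at most
    `max_decades` orders of magnitude (hypothetical helper)."""
    if len(rates) < window:
        return False  # not enough history to judge
    recent = rates[-window:]
    if any(k <= 0 for k in recent):
        raise ValueError("rate constants must be positive")
    spread = math.log10(max(recent)) - math.log10(min(recent))
    return spread <= max_decades

# Made-up (number of minima, rate constant) pairs, for illustration only.
history = [
    (1200, 3.1e-9),
    (2500, 4.0e-7),
    (4100, 2.2e-5),
    (6000, 6.5e-5),
    (8200, 9.0e-5),
    (9900, 1.1e-4),
    (12000, 1.3e-4),
]
rates = [k for _, k in history]
print(has_plateaued(rates))      # judged on the four most recent values
print(has_plateaued(rates[:4]))  # judged on the early, still-changing values
```

The window and tolerance are judgment calls; a longer window is a stricter test that the rate constant has stopped drifting.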

The goal is to optimise the mean first passage time (MFPT) by building a larger database that has more direct connections between the starting and ending minima. In pathdata, COMMENT out any DIJINIT* keywords and add SHORTCUT <n>, where n is perhaps a quarter of the path length, or less if the path is huge. Using DUMMYRUN in pathdata can help in choosing <n>, because the list of pairs to connect is still created but no actual OPTIM jobs are submitted. A value of <n> that is too large or too small will be inefficient at improving the rate constants. Also remember to set CYCLES <m> to a positive value. To begin with, one rate constant is likely to be much smaller than the other. Optimise it as far as possible, keeping track of the rate constants as a function of the number of minima in the database. Eventually, adding new connections to the database will no longer change the rate constant appreciably. Focus on the DIJKSTRA 0 path: both its length and the MFPT that it estimates.
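The rule of thumb for <n> can be written down as a small helper that produces candidate pathdata lines. This is a sketch under assumptions: shortcut_keywords, its fraction and cap arguments, and the example path length of 48 steps are hypothetical, and the sketch emits only the keywords named above. Any additional parameters these keywords require should be taken from the PATHSAMPLE documentation.

```python
def shortcut_keywords(path_length, cycles, fraction=0.25, cap=None, dummy_run=False):
    """Build candidate pathdata lines for a shortcutting run (hypothetical helper).

    path_length : number of steps on the current DIJKSTRA 0 path
    cycles      : positive value for CYCLES <m>
    fraction    : fraction of the path length used for <n> (a quarter by default)
    cap         : optional upper limit on <n>, for very long paths
    dummy_run   : add DUMMYRUN so that no OPTIM jobs are submitted
    """
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    if cycles < 1:
        raise ValueError("CYCLES must be positive")
    n = max(1, int(path_length * fraction))
    if cap is not None:
        n = min(n, cap)
    lines = [f"SHORTCUT {n}", f"CYCLES {cycles}"]
    if dummy_run:
        lines.append("DUMMYRUN")
    return lines

# Example: a 48-step path, tried first as a dummy run to inspect the pairs chosen.
for line in shortcut_keywords(48, cycles=10, dummy_run=True):
    print(line)
```

Running with dummy_run=True first, then removing DUMMYRUN once the list of pairs looks sensible, mirrors the workflow described above.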

The primary keywords to use when improving the rate constants are SHORTCUT, SHORTCUT BARRIER, UNTRAP and FREEPAIRS (see the PATHSAMPLE documentation for an explanation of the additional parameters required with each keyword).

The two SHORTCUT variants focus on the "fastest" path through the current database, found by a Dijkstra analysis using the branching probability edge weight formulation. SHORTCUT should be used when the fastest known path is long and winding, with some hopefully superfluous transitions (after running DIJKSTRA 0, look at the movie made as in Creating movies (.mpg) of paths using OPTIM). It is designed to find more direct connections between minima that are close in space but widely separated on the fastest path. SHORTCUT BARRIER is designed for paths with some particularly high barriers (check the Epath produced by DIJKSTRA 0): it attempts to find alternative paths with lower barriers between a minimum on either side of each "high" barrier.

If there are artificial kinetic traps in the disconnectivity graph, i.e. long dangling branches that are close in energy to the product or global minimum but separated from it by high barriers, then use UNTRAP or FREEPAIRS. UNTRAP deals with potential energy stationary points, whereas FREEPAIRS considers free energy minima and TSs (for either grouped or individual potential energy stationary points) and focuses on searching for low-barrier connections between the trapping minima and the product set.

Aggressive shortcutting of either variety may be required, but take care that it does not introduce artificial traps; any that do appear can be alleviated with FREEPAIRS or UNTRAP. For FREEPAIRS calculations, focus on how the NGT rate constants evolve.

Several pieces of data should be tracked while performing this calculation.

0. Check what is going on structurally in the highest-barrier processes: examine the corresponding sections of the path, and perhaps look at a movie of those transitions. The OPTIM keywords CHECKCHIRALITY and NOCISTRANS (for both CHARMM and AMBER) should be used to stop unphysical isomerisations getting into the database in the first place.

1. Periodically make a disconnectivity graph from the database (with or without grouping as necessary) to check for introduced traps.

2. The pathsample output lines starting with Dijkstra> provide the most information about the rates for the existing database.

3. Check the [N]GT rate constant every n cycles with a "[N]GT 0 <n>" line. It is best to choose a large n, because a possibly expensive [N]GT calculation on a large database every cycle is wasteful. Alternatively, check the rate with a separate pathsample job, e.g. when you are making a tree.
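For item 2, the relevant lines can be pulled out of a long output file with a short filter. The Python sketch below assumes only that the lines of interest begin with Dijkstra>, as stated above. The helper name dijkstra_lines is hypothetical, and the sample text is placeholder content, not real PATHSAMPLE output.

```python
def dijkstra_lines(output_text):
    """Return the output lines that start with 'Dijkstra>' (hypothetical helper)."""
    return [
        line.strip()
        for line in output_text.splitlines()
        if line.lstrip().startswith("Dijkstra>")
    ]

# Placeholder text only: what follows each prefix is not real PATHSAMPLE output.
sample = """\
some other output line
Dijkstra> (first line of interest)
another unrelated line
Dijkstra> (second line of interest)
"""

for line in dijkstra_lines(sample):
    print(line)
```

Saving the filtered lines alongside the number of minima in the database at each stage gives a simple record of how the path and its rate estimate evolve.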