Biomolecules in PATHSAMPLE

From Docswiki
Jump to navigation Jump to search

Overview

The use of PATHSAMPLE to explore the energy landscapes of biomolecules can be tricky. This page gives an overview for the use of AMBER12 for large biological systems. The following topics will be considered:

  • 1. Setting up the input files correctly and problems encountered on the way to good endpoints
  • 2. How to run PATHSAMPLE for large systems
  • 3. What to do after an initial path is found

Setting up

For AMBER12, as with AMBER9, you will need a coords.inpcrd file and a topology file (coords.prmtop). Additionally, you will need a perm.allow file. The scripts for this are in the svn repository.

There are a few things to be cautious about:

  • 1. You need to change the libraries in your AMBER installation with the one provided in the SVN. The changes in the provided libraries are symmetrised charges for NPHE and NLEU. The new libraries are found in the AMBERTOOLS/dat/leap/lib directory, you will need to copy them to your AMBER version (AMBER12 or newer!).
  • 2. If you cap your protein, be careful when creating the perm.allow file. NME and NMET are sometimes confused in the script if your pdb file is misaligned. This means you will have the wrong permutational groups.
  • 3. If you want to shorten a given pdb, remove all hydrogen atoms (for example using reduce -Trim your_file.pdb > your_file_nohydrogen.pdb), change the name of your terminal groups, and use tleap (or xleap) to create all H again. This means you will need slightly longer to relax your protein, but otherwise there can be conflicts in the naming process.
  • 4. For proteins you can use the newer ff14SB force field, although there are conflicting studies on whether it is better than ff99SB. It is based on ff99SB and contains the backbone improvements of later versions of ff99SB. If using ff14SB, there are two solvent models you can choose from (both supported using CUDA) - igb2 and igb8. The former uses set default PBradii mbondi2, while igb8 needs set default PBradii mbondi3 in Leap. igb8 is meant to be somewhat more accurate, but if used with DNA/RNA it can lead to "Abnormal termination" messages. If this happens, just try switching to igb2.
  • 5. If you want to mutate a protein, the easiest way forward is to remove all atoms of the residue, but for the backbone and side chain atom that are shared between the original and mutated residue, and change the residue name. Again, leap will then create the missing atoms. In xleap, the editor allows to move single atoms or atoms selections to remove potential atom clashes (see the AMBER tutorials for more information), however leap generally prevents most clashes, so that a single minimisation generally is sufficient after the mutation to get a new stable structure.
  • 6. It might be tempting to run GMIN for some time to get a large number of minima to start with. While this is a good strategy for smaller molecules, for large systems (> 1000 atoms) the current set of moves does not actually explore a lot of space, so it can happen that GMIN is running for a long time, but you are not actually sampling a lot of space (potentially find moves representing your actual system). In many cases, it might be easier to start with two or more different pdb files (for example inactive or active) and run OPTIM. ALternatively, using BHPT might also be useful to find a larger number minima to start the database with.

Running PATHSAMPLE

Again, there is nothing to different about using AMBER within PATHSAMPLE and OPTIM. A few things, however, need to be considered:

  • 1. Make sure AMBER12 is the last keyword in your odata file.
  • 2. CUDAOPTIM can only be compiled using gfortran or ifort (as of 28/09/2016). ifort might be somewhat faster, but throws a lot more bugs from the compiler.
  • 3. For large systems, LPERMDIST should be used as a global alignment of the structures might miss out local changes leading to a lot of R/S inversions in OPTIM runs.
  • 4. For large systems, calculating Hessians (and hence frequencies) while sampling is time consuming. NOHESS and NOFRQS are generally a good idea. After sampling your system, remove unconnected points, and run CHECKMIN with GETMINFRQS (or CHECKTS with GETTSFRQS) with NOHESS, ENDNUMHESS and DUMPFRQS in the odata file.
  • 5. For large systems, you might hit numerical limits. In the Dijkstra analysis, the maximum distance supported by the exponential weighting function is 700. If your distances are larger, you will need to change the weighting function.
  • 6. Large gaps will be hard to fill, especially if they are associated with very complex motions (i.e. where the linear interpolation in DNEB does not give you good images). Your helpers here are MINGAP and MAXGAP. MINGAP gives the minimum distance that should be attempted to be connected in PATHSAMPLE. This avoids sampling many small gaps in the same region over and over again and focuses on the large gaps. You will need to run longer sampling after you found an initial path. MAXGAP forces OPTIM to only use DNEB for the largest gap in the Dijkstra analysis. (Additionally, you might want to run OPTIM with DUMPBESTPATH)
  • 7. Despite all these strategies, things might go wrong. One likely problem is the disconnection of the AB set. This happens if ts.data and pairlist do not connect your A and B minimum. (You can check this with the script $SVN/SCRIPTS/PATHSAMPLE/dijkstra_test.py) If this happens, you have a variety of options: extract some minima, run OPTIM, and hope you find intermediate minima; change the number of entries in the pairlist and pairdist files; or go back to an older version of your database (which you hopefully backed up somewhere) and use DIJPRUNE to get a smaller database. (If you have no backup remove the minima step by step until you find a connection again.)
  • 8. If you start your PATHSAMPLE run from a GMIN lowest file, you can run into trouble if the coordinates are shifted far away from the origin (I have even seen this with CENTRE turned on). One easy solution to this problem is to shift the coordinates in the start file. A script for this can be found in $SVN/SCRIPTS/AMBER/AMBER_shift_com.py.

Sampling after the initial path is found

There are many things to try out once a connection is found. UNTRAP and SHORTCUT are the obvious go to options. For small systems CONNECTREGION might be worthwhile as well. However, there is a problem with these methods for large systems. UNTRAP and SHORTCUT will find many minima in interesting regions, but these might not necessarily be connected to the AB set. CONNECTUNC is your friend in this case.

There are many things that can be checked to see how the sampling progresses: the landscape entropy (Cv without frqs), energy distributions for all minima and the AB set (i.e. is your AB set representative?), the size of the connected sets within your database (you should have one large set, the AB set, and a few small sets), and the number of minima and transition states in your AB set compared to the number of minima and transition states overall.

If you are happy with your sampling and you have analysed your data, it is important to run another check: pick some important structures (important intermediates and low energy structures), solvate them explicitly, and run a short MD to confirm they are indeed stable.