Calculating binding free energy using the FSA method

From Docswiki
Revision as of 17:09, 10 May 2019 by Adk44 (talk | contribs) (Created page with "The FSA approach is a method developed in the group to allow us to calculate the binding free energy for large protein-ligand systems efficiently. We achieve this by locally r...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The FSA approach is a method developed in the group to allow us to calculate the binding free energy for large protein-ligand systems efficiently. We achieve this by locally rigidifying a large part of the protein outside of the binding site. For aldose reductase, the binding free energy converged when this region included everything more than 14A from the binding site - corresponding to 80% of the protein!

If you want to dig into the details, you can find the FSA methods paper in PCCP here.

Calculating a binding free-energy using FSA has a few distinct stages. This tutorial aims to take you through each of the for the same system we used for the FSA paper, aldose reductase with the AMBER ff99SB force field in vacuum, both on the old CPU clusters and using the new GPU workstations. You can find the files used in this tutorial in the tarballs here:

  • Setting up the system
  • Sampling relevant minima using GMIN (coming soon!)
  • Calculating vibrational frequencies using OPTIM (coming soon!)
  • Calculating the binding free energy (coming soon!)

Setting up the system

In order to run an FSA calculation, we need input for:

  • the ligand
  • the protein (no ligand present, also known as the apo state)
  • the complex (ligand bound to protein, also known as the holo state)

This means separate AMBER topology files for each. Because we are rigidifying a large part of the protein, we need to use the same protein conformation to create the complex and protein topology files. The simplest way to do this is to first create parameters for the complex, and then extract the protein and ligand. For aldose reductase, we start with the PDB structure, 2INE.

Preparing the PDB file

This is one of the most time consuming parts of any simulation - setting up the initial input. It is also the most important - a mistake here will affect all of your results down the line. PDB files you use as the basis for your simulation may contain multiple conformation for some residues where one could not be resolved (look for residues called XXXA and XXXB with the same residue ID number), water molecules you don't want to include, ions - and much more.

Removing unwanted information

Because we're going to be running this PDB file through some other code to generate input, we only want it to include information required to produce the output we want - anything else simply increases the chances of something breaking! For 2INE, we first remove everything above the line for atom 1:

ATOM      1  N   SER A   2      15.234  43.677  83.072  1.00 32.54           N

The header of a PDB file does contain some useful information about how the crystal structure was generated, but nothing we need to build AMBER input. The PDB also contains the oxygen position for 313 water molecules (residue name HOH) at the end of the file. As we are going to be using an implicit solvent model (or no solvent at all) - we need to remove these lines - here is an example:

HETATM 2572  O   HOH A 401      14.330  29.955  92.542  1.00 11.39           O

Finally, we don't need any of the CONNECT information at the bottom of the PDB file, or the final MASTER record. These can be safely deleted.

Adding in termini and replacing HETATOM with ATOM

For clarity, we need to insert some TER lines to specify where the protein, NADP+ (currently called NAP) cofactor and PAC ligand start and end. This is especially important for capping amino acid sequences. There is already a TER line between the protein and cofactor, but we can safely remove everything else on that like so:

ATOM   2512  OXT PHE A 315      20.417  43.672  55.069  1.00 14.28           O
TER
HETATM 2514  PA  NAP A 316      23.443  28.173  72.387  1.00  4.73           P

We need to add TER between the cofactor and ligand, and after the ligand:

HETATM 2561  O3X NAP A 316      23.931  22.139  78.696  1.00 10.71           O
TER
HETATM 2562  C1  PAC A 317      16.556  26.662  63.481  1.00 15.43           C

and

HETATM 2571  O2  PAC A 317      16.340  27.006  62.296  1.00 17.08           O
TER
END

Finally, we change the HETATM lines to ATOM - we don't need to treat these atoms any differently when creating input, and it can confuse some codes. You can do this how you like, I suggest the following find/replace vi command:

:%s/HETATM/ATOM  /g

Checking for salt bridging CYS residues

Error creating thumbnail: Unable to save thumbnail to destination
CYS residues for aldose reductase

Some proteins have strong interchain salt bridges, S-S bonds between CYS residues. If they are present, we need to include a command to define them as being explicitly bonded when we run tleap. This includes changing their residue names. The easiest way to do this is to just load your PDB in Pymol and show sticks for the CYS residues. If two are very close together in space, with the sulphur atoms pointing straight at each other, you can assume a salt bridge:

hide all
show sticks, resn CYS

For each cysteine residue involved in a salt bridge - make a note of its residue number, and change the residue name from CYS to CYX - donating a cystine with no hydrogen bonded to sulphur. In our case, we don't have to worry about this as there are no salt bridges in aldose reductase, as you can see from the image on the right.

Specifying the protonation state of histidines

Error creating thumbnail: Unable to save thumbnail to destination

Because histidine can have variable protonation at different pHs, you need to explicitly define how you want each histidine to be protonated. You have three choices:

  • HIE: singly protonated at the epsilon position (assumed by default for all HIS residues)
  • HID: singly protonated at the delta position
  • HIP: double protonated (charged)

There are various web servers which will try to estimate this for you based on a structure - but you really need evidence from a neutron scattering experiment which shows hydrogen positions, or information from a constant pH simulation. In the absence of this, I suggest you assume each histidine residue to be singly protonated at the epsilon position. To do this, we need to replace HIS with HIE in the PDB file, for example using this vi command:

:%s/HIS/HIE/g

AMBER's tleap program we use later will actually assume all the HIS residues are HIE even if you don't modify them, but it is good to be precise!

Renaming the NADP+ cofactor

We also need to change the residue name for NADP+ from NAP to NPD to match the parameter file we will use, which is easily achieved using a vi command:

:%s/NAP/NPD/g

This prevents the reduce program adding hydrogens incorrectly.

Adding missing protein hydrogens to the protein using reduce

In most crystal structures, you will find that hydrogen atoms are missing. This is simply because their position cannot be accurately resolved from an x-ray diffraction pattern. We need to add them back in for our simulation, and the simplest way to do this when using AMBER is with the reduce program. Assuming all the above changes to 2INE.pdb have been incorporated in 2INE_beforereduce.pdb (see the tarball linked at the top):

reduce 2INE_beforereduce.pdb > 2INE_reduce.pdb

reduce will add hydrogen atoms at the equilibrium position for each standard residue as AMBER defines them. This means it will not affect the NADP+ cofactor, and will only add some of the hydrogens to the PAC ligand. We will have to add hydrogen to them separately. It will also complain that is doesn't recognise the HIE histidine residues. This isn't a problem, the hydrogens here will be added by tleap later.

Tweaking the cofactor and ligand

Error creating thumbnail: Unable to save thumbnail to destination
The PAC ligand as labelled in 2INE.pdb
Error creating thumbnail: Unable to save thumbnail to destination
The PAC ligand as labelled in 2INE.pdb after using reduce
Error creating thumbnail: Unable to save thumbnail to destination
The PAC ligand as labelled in PAC_antechamber.pdb

Because the NADP+ and PAC molecules are not part of the standard AMBER force field, we need to provide parameters for them. For the PAC ligand we obtained them using GAMESS and RESP as described in the tutorial here. For the purposes of this tutorial, you can use the PAC parameter files (PAC.prepin and PAC.frcmod) in the setup tar file here. For NADP+, we use slightly modified parameters from the AMBER Parameter Database that you can download here and here.

Just like the protein, the NADP+ cofactor and PAC ligand are lacking all or a few hydrogen atoms. We could add them manually using something like Molfacture in VMD - but this is a bit fiddly. Ideally, we'd like to get the AMBER tleap program to add them for us when we generate parameters, based on the parameter files we supply at the time. To get this to work, the order and names of the heavy atoms in the PDB must match how they appear in the parameter file (.prepin, .mol2 or .off).

Some quick notes:

  • you may rename atoms in the .pdb file or in the .prepin/.mol2/.off files
  • you may reorder atoms in the .pdb file, but NOT in the .prepin/.mol2/.off files

This is because unlike .pdb files - .prepin/.mol2/.off files contain connectivity information which would be affected by atom reordering. Looking at the atom names for the PAC ligand in the 2INE.pdb file (which involve the ' symbol - ugh!) - in this case I'm going to rename and reorder the atoms in the 2INE.pdb file to match the PAC.prepin:

PAC part of 2INE.pdb (at the bottom!) after using reduce:

...
ATOM   2562  C1  PAC A 317      16.556  26.662  63.481  1.00 15.43           C
ATOM   2563  C2  PAC A 317      17.578  25.580  63.751  1.00 15.05           C
ATOM   2564  C1' PAC A 317      17.351  24.336  62.935  1.00 22.49           C
ATOM   2565  C2' PAC A 317      18.250  23.971  61.928  1.00 24.42           C
ATOM   2566  C3' PAC A 317      17.969  22.888  61.060  1.00 24.01           C
ATOM   2567  C4' PAC A 317      16.776  22.166  61.208  1.00 25.30           C
ATOM   2568  C5' PAC A 317      15.872  22.525  62.216  1.00 25.20           C
ATOM   2569  C6' PAC A 317      16.166  23.606  63.069  1.00 23.47           C
ATOM   2570  O1  PAC A 317      15.972  27.171  64.451  1.00 12.80           O
ATOM   2571  O2  PAC A 317      16.340  27.006  62.296  1.00 17.08           O
ATOM      0  H6' PAC A 317      15.534  23.849  63.763  1.00 23.47           H   
ATOM      0  H5' PAC A 317      15.045  22.030  62.326  1.00 25.20           H   
ATOM      0  H4' PAC A 317      16.578  21.423  60.616  1.00 25.30           H   
ATOM      0  H3' PAC A 317      18.600  22.646  60.365  1.00 24.01           H   
ATOM      0  H22 PAC A 317      18.465  25.926  63.563  1.00 15.05           H   
ATOM      0  H21 PAC A 317      17.557  25.351  64.693  1.00 15.05           H   
ATOM      0  H2' PAC A 317      19.077  24.466  61.823  1.00 24.42           H   

Matching part of PAC.prepin:

   4  O1    oh    M    3   2   1     1.540   111.208   180.000 -0.607100
   5  H18   ho    E    4   3   2     0.981   131.182    88.137  0.444000
   6  C10   c     M    4   3   2     1.359    19.382    82.355  0.641100
   7  O2    o     E    6   4   3     1.224   122.939   175.332 -0.553000
   8  C4    c3    M    6   4   3     1.516   111.882    -4.690 -0.078100
   9  H11   hc    E    8   6   4     1.097   108.121   -60.082  0.087700
  10  H12   hc    E    8   6   4     1.097   108.193    55.680  0.083700
  11  C3    ca    M    8   6   4     1.491   113.500   177.737 -0.095300
  12  C5    ca    M   11   8   6     1.395   119.980   -90.008 -0.117000
  13  H13   ha    E   12  11   8     1.087   120.744     0.059  0.133000
  14  C7    ca    M   12  11   8     1.395   119.994  -179.969 -0.132000
  15  H15   ha    E   14  12  11     1.086   120.038  -179.953  0.134000
  16  C9    ca    M   14  12  11     1.395   120.009    -0.045 -0.122000
  17  H17   ha    E   16  14  12     1.086   120.006  -179.890  0.133000
  18  C8    ca    M   16  14  12     1.395   120.007     0.000 -0.128000
  19  H16   ha    E   18  16  14     1.086   120.064  -179.981  0.136000
  20  C6    ca    M   18  16  14     1.395   119.985     0.045 -0.116000
  21  H14   ha    E   20  18  16     1.087   119.299  -179.859  0.155000

Before we can start renaming and reordering atoms, we need to know which atom in the PAC.prepin file corresponds to which in the 2INE.pdb file. To find out, we can make a .pdb file from the PAC.prepin using an AMBER program called antechamber:

antechamber -fi prepi -i PAC.prepin -fo pdb -o PAC_antechamber.pdb

You don't need to do this if you have an existing PAC .pdb file, say from generating the parameters in the first place. We now compare the two .pdb files, 2INE.pdb and PAC_antechamber.pdb using Pymol, turning the atom labels on - and match the atom names.

Name in PAC.prepin Name in 2INE.pdb
O1 O1
H18 Missing
C10 C1
O2 O2
C4 C2
H11 H22
H12 H21
C3 C1'
C5 C6'
H13 H6'
C7 C5'
H15 H5'
C9 C4'
H17 H4'
C8 C3'
H16 H3'
C6 C2'
H14 H2'

This assignment isn't entirely unambiguous because of the symmetry of the small PAC ligand. It shouldn't matter though in this case. Don't worry about the atom numbers being wrong, tleap will fix them for us in a bit.

PAC part of 2INE.pdb after renaming and reordering to match PAC.prepin:

ATOM   2570  O1  PAC A 317      15.972  27.171  64.451  1.00 12.80           O
ATOM   2562  C10 PAC A 317      16.556  26.662  63.481  1.00 15.43           C
ATOM   2571  O2  PAC A 317      16.340  27.006  62.296  1.00 17.08           O
ATOM   2563  C4  PAC A 317      17.578  25.580  63.751  1.00 15.05           C
ATOM      0  H11 PAC A 317      18.465  25.926  63.563  1.00 15.05           H 
ATOM      0  H12 PAC A 317      17.557  25.351  64.693  1.00 15.05           H
ATOM   2564  C3  PAC A 317      17.351  24.336  62.935  1.00 22.49           C
ATOM   2569  C5  PAC A 317      16.166  23.606  63.069  1.00 23.47           C
ATOM      0  H13 PAC A 317      15.534  23.849  63.763  1.00 23.47           H 
ATOM   2568  C7  PAC A 317      15.872  22.525  62.216  1.00 25.20           C
ATOM      0  H15 PAC A 317      15.045  22.030  62.326  1.00 25.20           H
ATOM   2567  C9  PAC A 317      16.776  22.166  61.208  1.00 25.30           C
ATOM      0  H17 PAC A 317      16.578  21.423  60.616  1.00 25.30           H 
ATOM   2566  C8  PAC A 317      17.969  22.888  61.060  1.00 24.01           C
ATOM      0  H16 PAC A 317      18.600  22.646  60.365  1.00 24.01           H  
ATOM   2565  C6  PAC A 317      18.250  23.971  61.928  1.00 24.42           C
ATOM      0  H14 PAC A 317      19.077  24.466  61.823  1.00 24.42           H   
TER
END

The final missing H18 atom will be added by tleap later. We need to do something similar for the NADP+ cofactor. In this case, we are going to rename the atoms in the nadp+.prep (.prep is the same as .prepin) to match those in 2INE.pdb. We then reorder the atoms in 2INE.pdb to match nadp+.prep. As before, we need to match the atom names in two files using antechamber to generate a .pdb from the NADP+ parameters, and Pymol to visualise and compare atom positions and names.

You can find the results of all these changes in the file 2INE_tweaked.pdb (where I have also removed all the extra output from reduce) and nadp+_new.prep within the tutorial tarball. Great! Now we're ready to make some initial AMBER input using tleap and see if it'll minimise to a sensible energy.

Creating AMBER parameters for the complex

We are now ready to make AMBER parameters for the initial complex geometry. For this tutorial, we are going to be using the AMBER ff99SB force field for the protein, a combination of ff99SB and custom parameters for the NADP+ cofactor and the General Amber Force Field (GAFF) with RESP charges from GAMESS-US for the PAC ligand. We will be using a set of Born radii suitable for the igb=2 implicit solvent model. For more information, see the AMBER manual.

Generating initial input using tleap

In order to combine all of these parameters, we use the AMBER program tleap. Here is the input file to create the complex, commands can be found in the AMBER Tools manual if you are curious or confused!

WARNING: if you are using AMBER tools 14, you need to change 'source leaprc.ff99SB' to 'source oldff/leaprc.ff99SB' in the file below

initial_complex.tleap:

# tleap input to generate initial aldose reductase + PAC ligand parameters

# Set appropriate Born radii values for igb=2 implicit solvent
set default PBradii mbondi2

# Load the AMBER ff99SB and GAFF forcefields
source leaprc.ff99SB
source leaprc.gaff

# Load the AMBER parameters for the NADP+ cofactor
loadamberprep nadp+_new.prep
loadamberparams nad.frcmod

# Load the AMBER parameters for the PAC ligand
loadamberprep PAC.prepin
loadamberparams PAC.frcmod

# Load 2INE_tweaked.pdb into the 'complex' object
# Hydrogens are added at this stage
complex=loadpdb 2INE_tweaked.pdb

# Check there are no missing parameters
check complex

# Save a .pdb file for the complex
savepdb complex 2INE_tleap.pdb

# Save initial AMBER topology and coordinate files
saveamberparm complex 2INE.prmtop_presym 2INE_initial.inpcrd

# Quit tleap
quit

To run tleap, we use the following command:

tleap -f initial_complex.tleap

The output files from tleap are:

  • 2INE.prmtop_presym: the AMBER topology file, containing all the force field parameters for the complex. The improper dihedrals have not yet been symmetrised - see below.
  • 2INE_initial.inpcrd: the AMBER coordinate file for the initial complex geometry before it has been relaxed.
  • 2INE_tleap.pdb: the .pdb file for the same coordinates. Worth a look in Pymol to make sure it is sensible.
  • leap.log: The tleap log file. Check to make sure no serious errors occurred. Investigate if something went wrong.

Symmetrising the topology file

The AMBER and CHARMM force fields are not symmetrised with respect to permutational isomerisation. This is particularly troublesome for our methods where we rely on the energy to discriminate between minima. Without proper symmetrisation, permutational isomers can have different energies, resulting in 'twinning' of minima. You can read our paper about this here. We can symmetrise the topology file made in the last step using a Python script from the SVN repository. Before we do that, we need to identify the terminal residues in the topology file by appending a 'N' or 'C' to the residue name for N and C terminal residues respectively.

Open the topology file and look for the residue label block:

%FLAG RESIDUE_LABEL
%FORMAT(20a4)
SER ARG LEU LEU LEU ASN ASN GLY ALA LYS MET PRO ILE LEU GLY LEU GLY THR TRP LYS
SER PRO PRO GLY GLN VAL THR GLU ALA VAL LYS VAL ALA ILE ASP VAL GLY TYR ARG HIE
ILE ASP CYS ALA HIE VAL TYR GLN ASN GLU ASN GLU VAL GLY VAL ALA ILE GLN GLU LYS
LEU ARG GLU GLN VAL VAL LYS ARG GLU GLU LEU PHE ILE VAL SER LYS LEU TRP CYS THR
TYR HIE GLU LYS GLY LEU VAL LYS GLY ALA CYS GLN LYS THR LEU SER ASP LEU LYS LEU
ASP TYR LEU ASP LEU TYR LEU ILE HIE TRP PRO THR GLY PHE LYS PRO GLY LYS GLU PHE
PHE PRO LEU ASP GLU SER GLY ASN VAL VAL PRO SER ASP THR ASN ILE LEU ASP THR TRP
ALA ALA MET GLU GLU LEU VAL ASP GLU GLY LEU VAL LYS ALA ILE GLY ILE SER ASN PHE
ASN HIE LEU GLN VAL GLU MET ILE LEU ASN LYS PRO GLY LEU LYS TYR LYS PRO ALA VAL
ASN GLN ILE GLU CYS HIE PRO TYR LEU THR GLN GLU LYS LEU ILE GLN TYR CYS GLN SER
LYS GLY ILE VAL VAL THR ALA TYR SER PRO LEU GLY SER PRO ASP ARG PRO TRP ALA LYS
PRO GLU ASP PRO SER LEU LEU GLU ASP PRO ARG ILE LYS ALA ILE ALA ALA LYS HIE ASN
LYS THR THR ALA GLN VAL LEU ILE ARG PHE PRO MET GLN ARG ASN LEU VAL VAL ILE PRO
LYS SER VAL THR PRO GLU ARG ILE ALA GLU ASN PHE LYS VAL PHE ASP PHE GLU LEU SER
SER GLN ASP MET THR THR LEU LEU SER TYR ASN ARG ASN TRP ARG VAL CYS ALA LEU LEU
SER CYS THR SER HIE LYS ASP TYR PRO PHE HIE GLU GLU PHE NPD PAC
%FLAG RESIDUE_POINTER

Here you can see the residues in the complex. The protein first, then the NADP+ cofactor (NPD) and the PAC ligand. Because aldose reductase is a single protein segment (one continuous chain), there is a single N and C terminus. To communicate this to the Python script were about to use, we need to change the residue names to NXXX and CYYY like this:

%FLAG RESIDUE_LABEL
%FORMAT(20a4)
NSERARG LEU LEU LEU ASN ASN GLY ALA LYS MET PRO ILE LEU GLY LEU GLY THR TRP LYS 
SER PRO PRO GLY GLN VAL THR GLU ALA VAL LYS VAL ALA ILE ASP VAL GLY TYR ARG HIE
ILE ASP CYS ALA HIE VAL TYR GLN ASN GLU ASN GLU VAL GLY VAL ALA ILE GLN GLU LYS
LEU ARG GLU GLN VAL VAL LYS ARG GLU GLU LEU PHE ILE VAL SER LYS LEU TRP CYS THR
TYR HIE GLU LYS GLY LEU VAL LYS GLY ALA CYS GLN LYS THR LEU SER ASP LEU LYS LEU
ASP TYR LEU ASP LEU TYR LEU ILE HIE TRP PRO THR GLY PHE LYS PRO GLY LYS GLU PHE
PHE PRO LEU ASP GLU SER GLY ASN VAL VAL PRO SER ASP THR ASN ILE LEU ASP THR TRP
ALA ALA MET GLU GLU LEU VAL ASP GLU GLY LEU VAL LYS ALA ILE GLY ILE SER ASN PHE
ASN HIE LEU GLN VAL GLU MET ILE LEU ASN LYS PRO GLY LEU LYS TYR LYS PRO ALA VAL
ASN GLN ILE GLU CYS HIE PRO TYR LEU THR GLN GLU LYS LEU ILE GLN TYR CYS GLN SER
LYS GLY ILE VAL VAL THR ALA TYR SER PRO LEU GLY SER PRO ASP ARG PRO TRP ALA LYS
PRO GLU ASP PRO SER LEU LEU GLU ASP PRO ARG ILE LYS ALA ILE ALA ALA LYS HIE ASN
LYS THR THR ALA GLN VAL LEU ILE ARG PHE PRO MET GLN ARG ASN LEU VAL VAL ILE PRO
LYS SER VAL THR PRO GLU ARG ILE ALA GLU ASN PHE LYS VAL PHE ASP PHE GLU LEU SER
SER GLN ASP MET THR THR LEU LEU SER TYR ASN ARG ASN TRP ARG VAL CYS ALA LEU LEU
SER CYS THR SER HIE LYS ASP TYR PRO PHE HIE GLU GLU CPHENPD PAC 
%FLAG RESIDUE_POINTER

Note that we are not retaining spaces as the topology file is read as a fixed format - each residue name is up to 4 characters. Assuming you have a copy of the SVN repository in your home directory, we can run the Python script as follows:

~/svn/SCRIPTS/AMBER/symmetrise_prmtop/perm-prmtop.ff03.py 2INE.prmtop_presym 2INE.prmtop

Note that although the script implies we should use it for the ff03 force field, the atom naming and ordering is identical between ff03 and ff99SB - so it works for both. This produces the symmetrised topology file 2INE.prmtop. Finally, we undo the changes we made to the residue names in this new topology file to prevent them showing up in our output.

Before we are ready to start sampling minima for the FSA procedure itself, we will need to relax the system to make sure we have removed steric clashes introduced by our addition of hydrogen to the system. We can do this by minimising the complex with OPTIM.

Relaxing the initial complex geometry

Error creating thumbnail: Unable to save thumbnail to destination
The initial and relaxed geometries compared in Pymol

We are going to be minimising the complex using OPTIM following the tutorial here. If you have not compiled OPTIM with AMBER9 before, you can find instructions on the group CMake page here.

In brief, the odata (OPTIM data) file we will use is as follows:

odata

DUMPSTRUCTURES
BFGSMIN 1.0D-6
BFGSSTEPS 10000
UPDATES 1000
MAXERISE 1.0D-4
AMBER9 coords.inpcrd inpcrd

Here is a quick summary of each keyword, more can be found in the OPTIM documentation:

  • DUMPSTRUCTURES: produce AMBER restart (.rst), PDB (.pdb) and XYZ (.xyz) files for the final, minimised geometry.
  • BFGSMIN 1.0D-6: continue to minimise until the RMS force is below 1.0D-6.
  • BFGSSTEPS 10000: do a maximum of 10000 minimisation steps.
  • UPDATES 1000: retain the previous 1000 configuration in memory for use in determining each subsequent minimisation step.
  • MAXERISE 1.0D-4: allow the energy to rise by up to 1.0D-4 kcal/mol in a single minimisation step, otherwise reduce the step size.
  • AMBER9 coords.inpcrd inpcrd: Use the AMBER9 potential, starting from the input file coords.inpcrd in inpcrd format.

A9OPTIM expects the input files coords.prmtop, the topology file containing all the force field parameters required to calculate the energy for any given conformation of atoms and coords.inpcrd, the initial coordinate to start the minimisation from. We have already produced both these files above, but have named them 2INE.prmtop and 2INE_initial.inpcrd respectively. We can simply copy these files to ensure OPTIM is able to use them:

cp 2INE.prmtop coords.prmtop
cp 2INE_initial.inpcrd coords.inpcrd

The final OPTIM input file we need is min.in which specifies some AMBER potential options, such as which implicit solvent model we want to use. Here is the one we shall be using:

min.in

Minimization
&cntrl
  imin=1, maxcyc=1, ncyc=1,
  igb=2, saltcon=0.2,
  ntx=1, ntb=0,
  cut=999.99, rgbmax=12.0, ifswitch=1
/

Full details of these keywords (apart from ifswitch=1 - see below) can be found in the AMBER manual. The key ones to understand are:

  • igb=2: specifies the use of a modified Generalised Born solvent model which attempts to account for interstitial spaces.
  • ifswitch=1: a keyword we inserted into AMBER to allow a smooth switching of the implicit solvent terms at the non-bonded cutoff.
  • cut=999.9: specified the cutoff for non-bonded interactions should be 999.99A. This in effect means no cutoff at all. Why we are using 999.9A is discussed below.

Now we have the necessary files ready, we can run A9OPTIM as follows (assuming you moved it to your /home/CRSID/bin directory):

A9OPTIM > optim.out &
tail -f optim.out

We are running A0OPTIM in the background (&), sending the output to a file called optim.out. Using tail -f, we are able to follow this file being created and watch the minimisation proceed. If things don't work, make sure you show someone this file. The minimisation will take some time so don't just sit and watch it - come back later. When the minimisation has finished, you should hopefully see something like this at the bottom of optim.out:

 
...
mylbfgs> Energy and RMS force=    -12106.38089        0.9384470210E-06 after   5990 steps, step:  0.21888E-06
 mylbfgs> Final energy is   -12106.380888338343
 rotcon> Rotational constants (in cm-1) and principal moments of inertia:
        0.00001         0.00001         0.00001   7107902.94327   8718515.11211  10189130.86838
 symmetry> The full molecular point group is C1  .
 symmetry> The largest Abelian subgroup of the full molecular point group is C1  .
 symmetry> Distance tolerance=     0.00010 Inertia tolerance=     0.00010
 symmetry> Order of full point group=     1

 geopt>                          **** CONVERGED ****


 Elapsed time=                             10594.44
 OPTIM> # of energy calls=                          0 time=           0.00 %=  0.0
 OPTIM> # of energy+gradient calls=              6004 time=       10198.12 %= 96.3
 OPTIM> # of energy+gradient+Hessian calls=         0 time=           0.00 %=  0.0

The final geometry is contained in the three output files, points.final.xyz/rst/pdb. It's always a good idea to compare this relaxed structure to the initial geometry (2INE_tleap.pdb) using Pymol and check what has changed.

The points.final.rst file is actually in the same AMBER restart format as the the 2INE_initial.inpcrd file we created with tleap and copied to coords.inpcrd to use in the OPTIM minimisation. This means that we can use points.final.rst in the same way for future GMIN and OPTIM jobs.

Defining the rigid region and local rigid bodies for the complex

Error creating thumbnail: Unable to save thumbnail to destination
View into the aldose reductase binding side before rigidification
Error creating thumbnail: Unable to save thumbnail to destination
View into the aldose reductase binding side after rigidification

In the FSA scheme, we treat a large part of the protein distant from the ligand binding site as a single rigid body. We do this within the generalised rigid body framework implemented in GMIN and OPTIM, specified by the RIGIDINIT keyword. When looking at ligand binding, the critical decision is at what distance from the ligand to start this rigidifying. We want to rigidify as much of the system as possible (fewer degrees of freedom = less time for minimisation/normal mode analysis) without affecting ligand binding, and what is safe will vary from system to system.

If you are using the tutorial for a system other than aldose reductase, I suggest you spend some time considering this. For now - we are going to use the smallest distance where we saw convergence of the binding energy - 14A. In the FSA paper, we also created a 'locally rigid' buffer region from R (14A) to R+1 or (15A). We will follow this same protocol. To specify rigid bodies for both GMIN and OPTIM we need to create two more files:

  • rbodyconfig: defines each rigid body by listing its constituent atoms.
  • coordsrigidinit: contains coordinates (x, y, z only for each atom, one atom per line) that are used to define the internal geometry of the rigid bodies specified in rbodyconfig - this is our reference structure so it is important that it is physical.

We could create these manually, but happily we have a relatively simple Fortran code to do it for us. You can find it here:

~/svn/SCRIPTS/AMBER/rigidbody/groupRigidBodyINC.f90

Alternatively there is a python scriptgroupRigidBodyINC.f90, that will also do the job, offer us more options and creates two additional output files: rigid.log - a log file to check at any time what was rigidified and parmed_in - an input file for parmed.py to turn off interactions within the rigid body. The script can be found here:

~/svn/SCRIPTS/AMBER/rigidbody/genrigid_input.py

A guide of how to use it is found here.

The fortran code requires two files to generate rigid body input for GMIN and OPTIM:

  • minimisedstartingpoint.pdb: a pdb file used to generate the reference coordinates for the rigid bodies
  • coords.inpcrd: a matching AMBER inpcrd file for the PDB file above which provides more accurate coordinates

We are going to be using the relaxed structure we got from minimising in OPTIM for the reference geometry, so we can generate these files by simply using the output files as follows:

mkdir setrigidbodies
cp points.final.rst rigidbodies/coords.inpcrd
cp points.final.pdb rigidbodies/minimisedstartingpoint.pdb
cd setrigidbodies

Before we can use this program, we need to make a change to fit our input. Copy the source code into the working directory and open it for editing:

cp ~/svn/SCRIPTS/AMBER/rigidbody/groupRigidBodyINC.f90 .
vi groupRigidBodyINC.f90

The program can read three slightly different .pdb formats, that from the ambpdb program, from GMIN or OPTIM or in a strict, fixed format. As we are using OPTIM output, we need to make sure that the corresponding READ line is the only one not commented (not preceded by a !) as follows:

! ACTION REQUIRED: MAKE SURE ONLY A SINGLE READ LINE IN THE BLOCK BELOW IS UNCOMMENTED
! PDB format 1: from ambpdb (make sure you delete the REMARK line from the top before running this)
!ATOM      1  N   TRP     1       3.613   2.061  -0.003  1.00  0.00           N
!    READ(1,*,IOSTAT=iostatus) RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1), RESIDUENO(J1), X(J1), Y(J1), Z(J1), DUMMYDP, DUMMYDP

! PDB format 2: from GMIN or OPTIM (includes a chain ID e.g. A in example below)
!ATOM      1 N    TRP A   1       3.613   2.061  -0.003  1.00   1.00 N
     READ(1,*,IOSTAT=iostatus) RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1), DUMMY1, RESIDUENO(J1), X(J1), Y(J1), Z(J1), DUMMYDP, DUMMYDP, DUMMY2

! PDB format 3: fixed format PDB files (deal with fixed formatting of pdb files - Mark Oakley)
!     READ(1,'(A6,I5,X,A4,X,A3,2X,I4,4X,3F8.3)',IOSTAT=iostatus) &
!     RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1),  RESIDUENO(J1), X(J1), Y(J1), Z(J1)

Save your changes (:wq) and open the coords.inpcrd file - we also need to make sure that the 'title' (first) line of the coords.inpcrd file is blank (which it will not be if you used OPTIM):

vi coords.inpcrd
<ENSURE THE 1ST LINE IS BLANK AND THE 2ND IS THE NUMBER OF ATOMS>
:wq

We are now ready to compile and run the rigid body program:

pgf90 -o groupRigidBodyINC groupRigidBodyINC.f90
./groupRigidBodyINC

We have a lot of choices to make here! For the purposes of this tutorial, we will be following what was done for the FSA paper, but obviously you should do what is appropriate for your system and your purposes. The first decision we need to make is which atom to use as the centre point for our rigid body definitions. In our case, we want to use something in the ligand binding pocket - like the PAC ligand. We will use the C3 atom of the ligand, index 5121 (you might want to check this!). We then need to specify the different regions by a radius from this atom:

  • the radius below which we don't want to group anything into rigid bodies = 14A
  • the radius above which we want to group everything into a single rigid body = 15A (this gives us a 1A 'buffer' region where we can use 'local' rigid bodies e.g. rigidifying sp2 centres)

We then need to make some decisions about which local rigid bodies to set up. Here is the sequence used for this tutorial:

 Should I group PEPTIDE BOND C-O-N-H as a rigid body? (Y/N)
Y
 Should I group PROLINE N-CD-CG-CB-CA as a rigid body? (Y/N)
N
 Should I group ARGININE NE-HE-CZ-NH1-HH11-HH12-NH2-HH21-HH22 as a rigid body? (Y/N)
Y
 Should I group HISTIDINE (HIS, HIE, HID, HIP) CG-ND1-CE1-NE2-CD2 as a rigid body? (Y/N)
Y
 Should I group LYSINE NZ-HZ1-HZ2-HZ3 as a rigid body? (Y/N)
Y
 Should I group ASPARAGINE CB-CG-OD1-OD2 as a rigid body? (Y/N)
Y
 Should I group GLUTAMIC ACID CG-CD-OE1-OE2 as a rigid body? (Y/N)
Y
 Should I group ASPARAGINE CB-CG-OD1-ND2-HD21-HD22 as a rigid body? (Y/N)
Y
 Should I group GLUTAMINE CG-CD-OE1-NE2-HE21-HE22 as a rigid body? (Y/N)
Y
 Should I group PHENYLALANINE CG-CD1-HD1-CE1-HE1-CZ-HZ-CE2-HE2-CD2-HD2 as a rigid body? (Y/N)
Y
 Should I group TYROSINE CG-CD1-HD1-CE1-HE1-CZ-OH-CE2-HE2-CD2-HD2 as a rigid body? (Y/N)
Y
 Should I group TRYPTOPHAN as a 1 (CG-CD1-HD1-NE1-HE1-CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2) or 2 (CG-CD1-HD1-NE1-HE1)-(CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2) rigid bodies? (1/2/N)
2
 ANYTHING ELSE TO GROUP? (Y/N)
 WRITE IN USRRIGID FILE IN THE FOLLOWING FORMAT:
 TYR 6
 CG
 CD1
 CE1
 CZ
 CE2
 CD2
N
 Should I group all atoms outside RADIUSALL a rigid body? (Y/N)
Y

There are pros and cons to rigidifying things like this, for example, by fixing the peptide bonds - you are excluding some conformation where this angle is required to slightly change. A good example of this is rigidifying the TRP rings. If you rigidify them as a single rigid body - you can actually remove the global minimum as the rings are slightly bent. Hence we need to rigidify them as two separate rigid bodies. See the rigid body paper for more information on how safe or not these options are.

We are now able to specify some further options which will produce GROUPROTATION input for GMIN to allow us to change the conformation of the sidechains in the binding pocket. For more info on GROUPROTATION, see the GMIN documentation.

 What is the uniform probability you want to choose? (0.01-0.1 seems sensible)
0.025
 Do you want group rotations of the side chain with N-CA as the axis? (Y/N)
 FYI I am not doing anything for PRO
N
 Do you want group rotations of the side chain with C-CA as the axis? (Y/N)
 FYI I am not doing anything for PRO
N
 Do you want group rotations of the side chain with CA-CB as the axis? (Y/N)
 FYI I am not doing anything for ALA, GLY, PRO
Y
 Do you want to use the default values? (Y/N)
Y
 For which regions do you want me to generate the GROUPROTATION FILE?
 (1) JUST THE ALL ATOM REGION
 (2) JUST THE MINIMAL RIGID BODY REGION
 (3) BOTH REGIONS
3
 Do you want group rotations of the side chain with CB-CG as the axis? (Y/N)
 FYI I am not doing anything for ALA, GLY, PRO, SER, CYS, THR, VAL
Y
 Do you want to use the default values? (Y/N)
Y
 For which regions do you want me to generate the GROUPROTATION FILE?
 (1) JUST THE ALL ATOM REGION
 (2) JUST THE MINIMAL RIGID BODY REGION
 (3) BOTH REGIONS
3
 Do you want group rotations of the side chain with CG-CD as the axis? (Y/N)
 FYI I am not doing anything for all residues except for ARG, LYS, GLU, GLN
Y
 Do you want to use the default values? (Y/N)
Y
 For which regions do you want me to generate the GROUPROTATION FILE?
 (1) JUST THE ALL ATOM REGION
 (2) JUST THE MINIMAL RIGID BODY REGION
 (3) BOTH REGIONS
3
 ANYTHING ELSE? (Y/N)
 WRITE IN AUTOGR FILE IN THE FOLLOWING FORMAT:
 RESIDUENAME NO_MEMBER AMP PROB
 AXIS1
 AXIS2
 MEMBER1
 MEMBER2
 MEMBER3, ETC
N

You should now have three new files in your directory - rbodyconfig and coordsinirigid are described above. The rigid bodies defined here are shown in the image above where grey signifies the one, large rigid body - blue shows local rigid bodies and green is fully atomistic. The third file atomgroups is the GROUPROTATION input file whose format is described in the GMIN documentation. While we're talking about GROUPROTATION - we would like to add a few rotation groups for the ligand, as only groups for the protein sidechains are created by the program automatically. Open the atomgroups file and add the following at the top:

GROUP PACRING 5118 5121 10 1.0 0.025
5122
5123
5124
5125
5126
5127
5128
5129
5130
5131
GROUP PACCHAIN1 5121 5118 6 1.0 0.025
5120
5119
5117
5116
5115
5114
GROUP PACCHAIN2 5118 5116 3 1.0 0.025
5117
5115
5114

Great! We're done making input for the complex and can now move on to the protein which is simply the complex without the PAC ligand.

Relaxing the now rigidified complex

Although we don't expect any structural change to occur if we minimise after rigidifying (as the reference structure for the rigid bodies matches the input structure) - it is still worth doing as a sanity check, and to make sure you are definitely starting GMIN sampling from a minimum. Relaxing the rigidified complex is exactly the same as for the fully atomistic system, with a single change to the odata (OPTIM data) file, including the RIGIDINIT keyword:

odata

RIGIDINIT
DUMPSTRUCTURES
BFGSMIN 1.0D-6
BFGSSTEPS 10000
UPDATES 1000
MAXERISE 1.0D-4
AMBER9 coords.inpcrd inpcrd

RIGIDINIT specifies that we should use rigid bodies as defined in the rbodyconfig and coordsinirigid files that we created above above. You will probably still see a few minimisation steps occuring, but if the energy changes significantly, you have a problem. The end of the OPTIM output should look something like this:

 mylbfgs> Energy and RMS force=    -12106.38089        0.8960252262E-06 after    114 steps, step:  0.17444E-07
 mylbfgs> Final energy is   -12106.380888335800    
 rotcon> Rotational constants (in cm-1) and principal moments of inertia: 
        0.00001         0.00001         0.00001   7107902.93874   8718515.10946  10189130.86900
 symmetry> The full molecular point group is C1  .
 symmetry> The largest Abelian subgroup of the full molecular point group is C1  .
 symmetry> Distance tolerance=     0.00010 Inertia tolerance=     0.00010
 symmetry> Order of full point group=     1
 
 geopt>                          **** CONVERGED ****
 
 
 Elapsed time=                               198.53
 OPTIM> # of energy calls=                          0 time=           0.00 %=  0.0
 OPTIM> # of energy+gradient calls=               115 time=         197.38 %= 99.4
 OPTIM> # of energy+gradient+Hessian calls=         0 time=           0.00 %=  0.0

As before, a points.final.rst file will be produced - this is going to be our starting point for the complex for all work that follows.

Creating protein and ligand parameters

Now we have working input for the complex, we can easily make protein and ligand input by chopping up the output .pdb file from OPTIM and re-running tleap as before. Happily, we don't have to repeat a lot of the initial steps tweaking the .pdb as we did for the complex - all that work carries right over! :)

First, lets copy the OPTIM points.final.pdb file for the complex into two new files which we can edit:

cp points.final.pdb 2INE_protein.pdb
cp points.final.pdb 2INE_ligand.pdb

Chopping up the .pdb files

We have a simple edit to make to each .pdb file before we can use them to generate AMBER input. We now want to remove the PAC ligand atoms from 2INE_protein.pdb and everything but the PAC ligand atoms from 2INE_ligand.pdb which will end up looking like this:

ATOM   5114 O1   PAC A 316      17.997  27.868  62.311  1.00   1.00 O
ATOM   5115 H18  PAC A 316      17.241  28.485  62.467  1.00   1.00 H
ATOM   5116 C10  PAC A 316      17.368  26.821  61.891  1.00   1.00 C
ATOM   5117 O2   PAC A 316      16.187  26.901  61.636  1.00   1.00 O
ATOM   5118 C4   PAC A 316      18.257  25.611  61.715  1.00   1.00 C
ATOM   5119 H11  PAC A 316      18.787  25.731  60.769  1.00   1.00 H
ATOM   5120 H12  PAC A 316      19.011  25.624  62.504  1.00   1.00 H
ATOM   5121 C3   PAC A 316      17.536  24.267  61.765  1.00   1.00 C
ATOM   5122 C5   PAC A 316      16.667  23.962  62.816  1.00   1.00 C
ATOM   5123 H13  PAC A 316      16.474  24.694  63.598  1.00   1.00 H
ATOM   5124 C7   PAC A 316      16.052  22.712  62.883  1.00   1.00 C
ATOM   5125 H15  PAC A 316      15.376  22.482  63.704  1.00   1.00 H
ATOM   5126 C9   PAC A 316      16.313  21.753  61.907  1.00   1.00 C
ATOM   5127 H17  PAC A 316      15.838  20.774  61.962  1.00   1.00 H
ATOM   5128 C8   PAC A 316      17.188  22.046  60.863  1.00   1.00 C
ATOM   5129 H16  PAC A 316      17.399  21.295  60.103  1.00   1.00 H
ATOM   5130 C6   PAC A 316      17.794  23.300  60.790  1.00   1.00 C
ATOM   5131 H14  PAC A 316      18.479  23.515  59.971  1.00   1.00 H

Again, we don't need to worry about the atom numbering being wrong, tleap will fix that for us below. Thebottom of 2INE_protein.pdb should look like this:

...
ATOM   5108 H61A NPD A 315      26.154  28.390  81.535  1.00   1.00 H
ATOM   5109 N1A  NPD A 315      28.535  26.552  80.565  1.00   1.00 N
ATOM   5110 C2A  NPD A 315      28.864  25.708  79.594  1.00   1.00 C
ATOM   5111 H2A  NPD A 315      29.875  25.322  79.625  1.00   1.00 H
ATOM   5112 N3A  NPD A 315      28.106  25.268  78.592  1.00   1.00 N
ATOM   5113 C4A  NPD A 315      26.849  25.803  78.624  1.00   1.00 C

Generating initial protein and ligand input using tleap

Now that we have these input .pdb files ready - we can create initial AMBER input for both the protein at the same time by using the following tleap input file.

initial_protein_and_ligand.tleap:

# tleap input to generate initial aldose reductase parameters
# for the protein and PAC ligand separately  

# Set appropriate Born radii values for igb=2 implicit solvent
set default PBradii mbondi2

# Load the AMBER ff99SB and GAFF forcefields
source leaprc.ff99SB
source leaprc.gaff

# Load the AMBER parameters for the NADP+ cofactor
loadamberprep nadp+_new.prep
loadamberparams nad.frcmod

# Load the AMBER parameters for the PAC ligand
loadamberprep PAC.prepin
loadamberparams PAC.frcmod

# Load 2INE_protein.pdb into the 'protein' object
protein=loadpdb 2INE_protein.pdb

# Check there are no missing parameters for the protein
check protein

# Load 2INE_ligand.pdb into the 'ligand' object
ligand=loadpdb 2INE_ligand.pdb

# Check there are no missing parameters for the ligand
check ligand

# Save a .pdb file for the protein
savepdb protein 2INE_protein_tleap.pdb

# Save a .pdb file for the ligand
savepdb ligand 2INE_ligand_tleap.pdb

# Save initial AMBER topology and coordinate files for the protein
saveamberparm protein 2INE_protein.prmtop_presym 2INE_protein_initial.inpcrd

# Save initial AMBER topology and coordinate files for the ligand
saveamberparm ligand 2INE_ligand.prmtop 2INE_ligand_initial.inpcrd

# Quit tleap
quit

Note that the ligand atom numbering has been fixed in 2INE_ligand_tleap.pdb.

Symmetrising the protein topology file

As for the complex, we need to symmetrise the topology file we just created for the protein to ensure that permutational isomers have the same energy. To do this, we need to edit the 2INE_protein.prmtop_presym file as we did above. The edits required are identical to those for the complex which only has the additional PAC entry in the list of residues. The Python script should be run again as follows:

~/svn/SCRIPTS/AMBER/symmetrise_prmtop/perm-prmtop.ff03.py 2INE_protein.prmtop_presym 2INE_protein.prmtop

Make sure you then revert your edits in the final 2INE_protein.prmtop file as above. Happily this is not a problem for the ligand.

Defining the rigid region and local rigid bodies for the protein

It is essential for the FSA procedure that we use exactly the same rigid body reference structure for the protein as we do for the complex. As we based the protein input on the complex structure that we used as minimisedstartingpoint.pdb to generate the rigid body reference coordinates, we can use the same rbodyconfig and file for both. This is possible because:

  • we did not rigidify the ligand that missing in the protein
  • the atom numbering does not change going from the complex to the protein as the ligand is the final 'residue' so removing it does not affect the numbering of the rest of the system

Unfortunately, this is not true of the coordsinirigid and coords.inpcrd files from that directory as we have changed the number of atoms by removing the ligand. We need to alter both to take this into account:

cp coordsinirigid coordsinirigid_protein
cp coords.inpcrd coords.inpcrd_protein

Editing coordsinirigid_protein is simple. The ligand consists of 18 atoms, so we simply delete the final 18 lines of the file. Afterwards, the end of the file should look like this:

coordsinirigid_protein:

...
    29.87472760000000         25.32162100000000         79.62487950000001
    28.10573830000000         25.26778500000000         78.59233290000000
    26.84940740000000         25.80287030000000         78.62350440000000

We need to think a bit more about editing coords.inpcrd_protein'. First, the second line of the file is the number of atoms in the system. This needs to be reduced by 18 to 5113. We also need to remove the coordinates of the PAC ligand atoms as for coordsinirigid_protein, but in this format - each line contains the coordinates for two atoms. By counting backwards from the bottom, we can see that we need to remove 9.5 lines, leaving the bottom of the file looking like this:

coords.inpcrd_protein:

...
  28.5353297  26.5519001  80.5646514  28.8636981  25.7076199  79.5940559
  29.8747276  25.3216210  79.6248795  28.1057383  25.2677850  78.5923329
  26.8494074  25.8028703  78.6235044

We can also use the same atomgroups file for the protein as long as we remove the three groups that we added for the PAC ligand above, creating atomgroups_protein.

Relaxing the initial rigidified protein geometry

We are going to be relaxing the protein as we did the complex above using OPTIM. Although we are using the same input coordinates (and rigid body definitions) as for the minimised complex, because we have removed the ligand - the structure will no longer be a minimum - the binding pocket is likely to close up somewhat.

The odata (OPTIM data) file we will use is the same as that we used to relax the rigidified complex:

odata

RIGIDINIT
DUMPSTRUCTURES
BFGSMIN 1.0D-6
BFGSSTEPS 10000
UPDATES 1000
MAXERISE 1.0D-4
AMBER9 coords.inpcrd inpcrd

This will ensure that the large rigid 'lump' that we defined for the complex will remain exactly the same in the protein - something essential in the FSA method.

For the input files, we use the modified complex input and rigid body reference coordinates, the protein topology file we symmetrised above and the same min.in and rbodyconfig as for the complex.

cp 2INE_protein.prmtop coords.prmtop
cp coords.inpcrd_protein coords.inpcrd
cp coordsinirigid_protein coordsinirigid

After running OPTIM you should hopefully get something like this:

mylbfgs> Energy and RMS force=    -12049.39633        0.9854679625E-06 after   1242 steps, step:  0.12266E-06
 mylbfgs> Final energy is   -12049.396332313645
 rotcon> Rotational constants (in cm-1) and principal moments of inertia:
        0.00001         0.00001         0.00001   7093417.75475   8704649.93680  10185179.88004
 symmetry> The full molecular point group is C1  .
 symmetry> The largest Abelian subgroup of the full molecular point group is C1  .
 symmetry> Distance tolerance=     0.00010 Inertia tolerance=     0.00010
 symmetry> Order of full point group=     1

 geopt>                          **** CONVERGED ****


 Elapsed time=                              2186.26
 OPTIM> # of energy calls=                          0 time=           0.00 %=  0.0
 OPTIM> # of energy+gradient calls=              1247 time=        2136.04 %= 97.7
 OPTIM> # of energy+gradient+Hessian calls=         0 time=           0.00 %=  0.0

As for the complex, the points.final.rst file is what we will use as the starting point for the protein for all work that follows.

Creating GROUPROTATION input for the ligand

Error creating thumbnail: Unable to save thumbnail to destination
Atom numbering for the PAC ligand in 2INE_ligand_tleap.pdb

Because the atom numbering has changed for the ligand, we cannot simply use the PAC groups from the complex atomgroups file for the ligand by itself. It is quite simple to recreate these groups by visualising 2INE_ligand_tleap.pdb which contains the correct numbering. The result is as follows:

atomgroups_ligand:

GROUP PACRING 5 6 10 1.0 0.025
9
10
11
12
13
14
15
16
17
18
GROUP PACCHAIN1 8 5 6 1.0 0.025
6
7
4
3
2
1
GROUP PACCHAIN2 5 3 3 1.0 0.025
4
2
1

Relaxing the initial ligand geometry

Finally - we are going to be relaxing the ligand. This is almost not worth doing, but for completeness, it would be nice to start from a minimum for the ligand also. Because it is so small, this will hardly take any time at all. We also do not require the RIGIDINIT keyword this time as we always treat the ligand fully atomistically.

odata

DUMPSTRUCTURES
BFGSMIN 1.0D-6
BFGSSTEPS 10000
UPDATES 1000
MAXERISE 1.0D-4
AMBER9 coords.inpcrd inpcrd

We use the ligand topology and input coordinates files generated from tleap above and the same min.in as for the complex and protein.

cp 2INE_ligand.prmtop coords.prmtop
cp 2INE_ligand_initial.inpcrd coords.inpcrd

After running OPTIM you should hopefully get something like this:

 mylbfgs> Energy and RMS force=    -43.52358741        0.8462436792E-06 after    125 steps, step:  0.54651E-06
 mylbfgs> Final energy is   -43.523587405453071
 rotcon> Rotational constants (in cm-1) and principal moments of inertia:
        0.08469         0.08907         0.41399       145.41239       675.89654       710.84086
 symmetry> The full molecular point group is C1  .
 symmetry> The largest Abelian subgroup of the full molecular point group is C1  .
 symmetry> Distance tolerance=     0.00010 Inertia tolerance=     0.00010
 symmetry> Order of full point group=     1

 geopt>                          **** CONVERGED ****


 Elapsed time=                                 0.02
 OPTIM> # of energy calls=                          0 time=           0.00 %=  0.0
 OPTIM> # of energy+gradient calls=               130 time=           0.01 %= 50.0
 OPTIM> # of energy+gradient+Hessian calls=         0 time=           0.00 %=  0.0

As for the complex and protein, the points.final.rst file is what we will use as the starting point for the ligand for all work that follows.

Fantastic! We are now ready to start sampling minima to use in the FSA analysis. For reference, you can download all the files discussed above in the first of the FSA tarballs here.

Sampling relevant minima using GMIN

Error creating thumbnail: Unable to save thumbnail to destination
Table showing the minimisation speed up using CUDAGMIN vs A(9/12)GMIN

Thanks to the work Rosie has put in, we can now leverage the GPUs we have, enabling us to efficiently sample large systems like aldose reductase. You can see a table comparing the minimisation speeds for a range of system sizes for CPU and GPU GMIN on the right. For really small systems, using CUDAGMIN is actually slower than a CPU version, simply because of the overhead time moving data to and from the GPUs. As a result, we're going to use both A12GMIN and CUDAGMIN for this work. You can see how to compile both on the group cmake page here.

We will be using GMIN to sample low lying minima for the complex, protein and ligand. Although you might want to consider using some enhanced sampling techniques such as basin-hopping parallel tempering, BHPT - or the HBONDMATRIX binding mode sampling method for production FSA (as we did for the paper) - for the purposes of this tutorial, we will be running shortish simple basin-hopping runs for each. You can find more information on each in the GMIN documentation here. You should also consider running longer GMIN runs and keep adding minima until the FSA binding free energy converges for your system.

Choosing appropriate non-bonded cutoff and convergence criteria

As for any sampling, before we begin we need to determine three important parameters to balance getting the maximum 'bang for our buck' in terms of exploration/unit time - with accuracy or interactions and ensuring we are actually finding true minima. This is largely controlled by the AMBER parameter cut and the two GMIN keywords TIGHTCONV and SLOPPYCONV:

  • cut: the nun-bonded interaction cutoff distance in AMBER
  • TIGHTCONV/BFGSMIN/BFGSCONV: the tight RMS force condition where minima are well defined (6 'zero' eigenvalues)
  • SLOPPYCONV: the sloppy RMS force condition where minima are not significantly re-ordered when minimised further

These need to be fixed in the order above as changing cut affects the underlying potential and we need to ensure we are actually finding minima before we set how sloppily we will sample during the run. Because we are going to be using CUDAGMIN, we cannot use a non-bonded interaction cutoff as it is not implemented in the AMBER CUDA code, PMEMD - hence we set cut = 999.99 in min.in. These parameters must match for the complex, protein and ligand.

How to choose SLOPPYCONV and TIGHTCONV to achieve the above is discussed informally on the wiki here for the influenza Neuraminidase glycoprotein monomer. For this tutorial, we will be using SLOPPYCONV = 5.0D-3 and TIGHTCONV = 1.0D-3 as in the FSA paper here. This might not be suitable for your system so please check before you spend a lot of time running GMIN!

The GMIN data file for the complex and protein

IN PROGRESS --Csw34 (talk) 17:04, 31 July 2014 (BST)

Can't use the GPUs just yet, waiting on Rosie to finish debugging it, should be done soon!

  • use CUDAGMIN for the complex and protein
  • use A12GMIN for the ligand as CUDAGMIN would be slower due to overhead - see table

I will use writing this section as an opportunity to test HBONDMATRIX with CUDAGMIN. I could also code up the use of the new output from the hbond_matrix.sh script that I put in for Cheng:

  • hbond_all.mat: the matrix for all hydrogen-bonds
  • hbond_bb.mat: the matrix for backbone hydrogen-bonds involving O and H atoms (protein backbone)
  • hbond_sc.mat: the matrix for sidechain hydrogen-bonds involving everything except O and H atoms

We used BHPT in the original FSA paper, but CUDAGMIN is over 100x faster than A9GMIN which was used in the original work, so I don't think we need it :P

Calculating vibrational frequencies using OPTIM

Calculating the binding free energy