Calculating binding free energy using the FSA method
The FSA approach is a method developed in the group to allow us to calculate the binding free energy for large protein-ligand systems efficiently. We achieve this by locally rigidifying a large part of the protein outside of the binding site. For aldose reductase, the binding free energy converged when this region included everything more than 14A from the binding site - corresponding to 80% of the protein!
If you want to dig into the details, you can find the FSA methods paper in PCCP here.
Calculating a binding free-energy using FSA has a few distinct stages. This tutorial aims to take you through each of the for the same system we used for the FSA paper, aldose reductase with the AMBER ff99SB force field in vacuum, both on the old CPU clusters and using the new GPU workstations. You can find the files used in this tutorial in the tarballs here:
- Setting up the system
- Sampling relevant minima using GMIN (coming soon!)
- Calculating vibrational frequencies using OPTIM (coming soon!)
- Calculating the binding free energy (coming soon!)
Setting up the system
In order to run an FSA calculation, we need input for:
- the ligand
- the protein (no ligand present, also known as the apo state)
- the complex (ligand bound to protein, also known as the holo state)
This means separate AMBER topology files for each. Because we are rigidifying a large part of the protein, we need to use the same protein conformation to create the complex and protein topology files. The simplest way to do this is to first create parameters for the complex, and then extract the protein and ligand. For aldose reductase, we start with the PDB structure, 2INE.
Preparing the PDB file
This is one of the most time consuming parts of any simulation - setting up the initial input. It is also the most important - a mistake here will affect all of your results down the line. PDB files you use as the basis for your simulation may contain multiple conformation for some residues where one could not be resolved (look for residues called XXXA and XXXB with the same residue ID number), water molecules you don't want to include, ions - and much more.
Removing unwanted information
Because we're going to be running this PDB file through some other code to generate input, we only want it to include information required to produce the output we want - anything else simply increases the chances of something breaking! For 2INE, we first remove everything above the line for atom 1:
ATOM 1 N SER A 2 15.234 43.677 83.072 1.00 32.54 N
The header of a PDB file does contain some useful information about how the crystal structure was generated, but nothing we need to build AMBER input. The PDB also contains the oxygen position for 313 water molecules (residue name HOH) at the end of the file. As we are going to be using an implicit solvent model (or no solvent at all) - we need to remove these lines - here is an example:
HETATM 2572 O HOH A 401 14.330 29.955 92.542 1.00 11.39 O
Finally, we don't need any of the CONNECT information at the bottom of the PDB file, or the final MASTER record. These can be safely deleted.
Adding in termini and replacing HETATOM with ATOM
For clarity, we need to insert some TER lines to specify where the protein, NADP+ (currently called NAP) cofactor and PAC ligand start and end. This is especially important for capping amino acid sequences. There is already a TER line between the protein and cofactor, but we can safely remove everything else on that like so:
ATOM 2512 OXT PHE A 315 20.417 43.672 55.069 1.00 14.28 O TER HETATM 2514 PA NAP A 316 23.443 28.173 72.387 1.00 4.73 P
We need to add TER between the cofactor and ligand, and after the ligand:
HETATM 2561 O3X NAP A 316 23.931 22.139 78.696 1.00 10.71 O TER HETATM 2562 C1 PAC A 317 16.556 26.662 63.481 1.00 15.43 C
and
HETATM 2571 O2 PAC A 317 16.340 27.006 62.296 1.00 17.08 O TER END
Finally, we change the HETATM lines to ATOM - we don't need to treat these atoms any differently when creating input, and it can confuse some codes. You can do this how you like, I suggest the following find/replace vi command:
:%s/HETATM/ATOM /g
Checking for salt bridging CYS residues
Some proteins have strong interchain salt bridges, S-S bonds between CYS residues. If they are present, we need to include a command to define them as being explicitly bonded when we run tleap. This includes changing their residue names. The easiest way to do this is to just load your PDB in Pymol and show sticks for the CYS residues. If two are very close together in space, with the sulphur atoms pointing straight at each other, you can assume a salt bridge:
hide all show sticks, resn CYS
For each cysteine residue involved in a salt bridge - make a note of its residue number, and change the residue name from CYS to CYX - donating a cystine with no hydrogen bonded to sulphur. In our case, we don't have to worry about this as there are no salt bridges in aldose reductase, as you can see from the image on the right.
Specifying the protonation state of histidines
Because histidine can have variable protonation at different pHs, you need to explicitly define how you want each histidine to be protonated. You have three choices:
- HIE: singly protonated at the epsilon position (assumed by default for all HIS residues)
- HID: singly protonated at the delta position
- HIP: double protonated (charged)
There are various web servers which will try to estimate this for you based on a structure - but you really need evidence from a neutron scattering experiment which shows hydrogen positions, or information from a constant pH simulation. In the absence of this, I suggest you assume each histidine residue to be singly protonated at the epsilon position. To do this, we need to replace HIS with HIE in the PDB file, for example using this vi command:
:%s/HIS/HIE/g
AMBER's tleap program we use later will actually assume all the HIS residues are HIE even if you don't modify them, but it is good to be precise!
Renaming the NADP+ cofactor
We also need to change the residue name for NADP+ from NAP to NPD to match the parameter file we will use, which is easily achieved using a vi command:
:%s/NAP/NPD/g
This prevents the reduce program adding hydrogens incorrectly.
Adding missing protein hydrogens to the protein using reduce
In most crystal structures, you will find that hydrogen atoms are missing. This is simply because their position cannot be accurately resolved from an x-ray diffraction pattern. We need to add them back in for our simulation, and the simplest way to do this when using AMBER is with the reduce program. Assuming all the above changes to 2INE.pdb have been incorporated in 2INE_beforereduce.pdb (see the tarball linked at the top):
reduce 2INE_beforereduce.pdb > 2INE_reduce.pdb
reduce will add hydrogen atoms at the equilibrium position for each standard residue as AMBER defines them. This means it will not affect the NADP+ cofactor, and will only add some of the hydrogens to the PAC ligand. We will have to add hydrogen to them separately. It will also complain that is doesn't recognise the HIE histidine residues. This isn't a problem, the hydrogens here will be added by tleap later.
Tweaking the cofactor and ligand
Because the NADP+ and PAC molecules are not part of the standard AMBER force field, we need to provide parameters for them. For the PAC ligand we obtained them using GAMESS and RESP as described in the tutorial here. For the purposes of this tutorial, you can use the PAC parameter files (PAC.prepin and PAC.frcmod) in the setup tar file here. For NADP+, we use slightly modified parameters from the AMBER Parameter Database that you can download here and here.
Just like the protein, the NADP+ cofactor and PAC ligand are lacking all or a few hydrogen atoms. We could add them manually using something like Molfacture in VMD - but this is a bit fiddly. Ideally, we'd like to get the AMBER tleap program to add them for us when we generate parameters, based on the parameter files we supply at the time. To get this to work, the order and names of the heavy atoms in the PDB must match how they appear in the parameter file (.prepin, .mol2 or .off).
Some quick notes:
- you may rename atoms in the .pdb file or in the .prepin/.mol2/.off files
- you may reorder atoms in the .pdb file, but NOT in the .prepin/.mol2/.off files
This is because unlike .pdb files - .prepin/.mol2/.off files contain connectivity information which would be affected by atom reordering. Looking at the atom names for the PAC ligand in the 2INE.pdb file (which involve the ' symbol - ugh!) - in this case I'm going to rename and reorder the atoms in the 2INE.pdb file to match the PAC.prepin:
PAC part of 2INE.pdb (at the bottom!) after using reduce:
... ATOM 2562 C1 PAC A 317 16.556 26.662 63.481 1.00 15.43 C ATOM 2563 C2 PAC A 317 17.578 25.580 63.751 1.00 15.05 C ATOM 2564 C1' PAC A 317 17.351 24.336 62.935 1.00 22.49 C ATOM 2565 C2' PAC A 317 18.250 23.971 61.928 1.00 24.42 C ATOM 2566 C3' PAC A 317 17.969 22.888 61.060 1.00 24.01 C ATOM 2567 C4' PAC A 317 16.776 22.166 61.208 1.00 25.30 C ATOM 2568 C5' PAC A 317 15.872 22.525 62.216 1.00 25.20 C ATOM 2569 C6' PAC A 317 16.166 23.606 63.069 1.00 23.47 C ATOM 2570 O1 PAC A 317 15.972 27.171 64.451 1.00 12.80 O ATOM 2571 O2 PAC A 317 16.340 27.006 62.296 1.00 17.08 O ATOM 0 H6' PAC A 317 15.534 23.849 63.763 1.00 23.47 H ATOM 0 H5' PAC A 317 15.045 22.030 62.326 1.00 25.20 H ATOM 0 H4' PAC A 317 16.578 21.423 60.616 1.00 25.30 H ATOM 0 H3' PAC A 317 18.600 22.646 60.365 1.00 24.01 H ATOM 0 H22 PAC A 317 18.465 25.926 63.563 1.00 15.05 H ATOM 0 H21 PAC A 317 17.557 25.351 64.693 1.00 15.05 H ATOM 0 H2' PAC A 317 19.077 24.466 61.823 1.00 24.42 H
Matching part of PAC.prepin:
4 O1 oh M 3 2 1 1.540 111.208 180.000 -0.607100 5 H18 ho E 4 3 2 0.981 131.182 88.137 0.444000 6 C10 c M 4 3 2 1.359 19.382 82.355 0.641100 7 O2 o E 6 4 3 1.224 122.939 175.332 -0.553000 8 C4 c3 M 6 4 3 1.516 111.882 -4.690 -0.078100 9 H11 hc E 8 6 4 1.097 108.121 -60.082 0.087700 10 H12 hc E 8 6 4 1.097 108.193 55.680 0.083700 11 C3 ca M 8 6 4 1.491 113.500 177.737 -0.095300 12 C5 ca M 11 8 6 1.395 119.980 -90.008 -0.117000 13 H13 ha E 12 11 8 1.087 120.744 0.059 0.133000 14 C7 ca M 12 11 8 1.395 119.994 -179.969 -0.132000 15 H15 ha E 14 12 11 1.086 120.038 -179.953 0.134000 16 C9 ca M 14 12 11 1.395 120.009 -0.045 -0.122000 17 H17 ha E 16 14 12 1.086 120.006 -179.890 0.133000 18 C8 ca M 16 14 12 1.395 120.007 0.000 -0.128000 19 H16 ha E 18 16 14 1.086 120.064 -179.981 0.136000 20 C6 ca M 18 16 14 1.395 119.985 0.045 -0.116000 21 H14 ha E 20 18 16 1.087 119.299 -179.859 0.155000
Before we can start renaming and reordering atoms, we need to know which atom in the PAC.prepin file corresponds to which in the 2INE.pdb file. To find out, we can make a .pdb file from the PAC.prepin using an AMBER program called antechamber:
antechamber -fi prepi -i PAC.prepin -fo pdb -o PAC_antechamber.pdb
You don't need to do this if you have an existing PAC .pdb file, say from generating the parameters in the first place. We now compare the two .pdb files, 2INE.pdb and PAC_antechamber.pdb using Pymol, turning the atom labels on - and match the atom names.
Name in PAC.prepin | Name in 2INE.pdb |
O1 | O1 |
H18 | Missing |
C10 | C1 |
O2 | O2 |
C4 | C2 |
H11 | H22 |
H12 | H21 |
C3 | C1' |
C5 | C6' |
H13 | H6' |
C7 | C5' |
H15 | H5' |
C9 | C4' |
H17 | H4' |
C8 | C3' |
H16 | H3' |
C6 | C2' |
H14 | H2' |
This assignment isn't entirely unambiguous because of the symmetry of the small PAC ligand. It shouldn't matter though in this case. Don't worry about the atom numbers being wrong, tleap will fix them for us in a bit.
PAC part of 2INE.pdb after renaming and reordering to match PAC.prepin:
ATOM 2570 O1 PAC A 317 15.972 27.171 64.451 1.00 12.80 O ATOM 2562 C10 PAC A 317 16.556 26.662 63.481 1.00 15.43 C ATOM 2571 O2 PAC A 317 16.340 27.006 62.296 1.00 17.08 O ATOM 2563 C4 PAC A 317 17.578 25.580 63.751 1.00 15.05 C ATOM 0 H11 PAC A 317 18.465 25.926 63.563 1.00 15.05 H ATOM 0 H12 PAC A 317 17.557 25.351 64.693 1.00 15.05 H ATOM 2564 C3 PAC A 317 17.351 24.336 62.935 1.00 22.49 C ATOM 2569 C5 PAC A 317 16.166 23.606 63.069 1.00 23.47 C ATOM 0 H13 PAC A 317 15.534 23.849 63.763 1.00 23.47 H ATOM 2568 C7 PAC A 317 15.872 22.525 62.216 1.00 25.20 C ATOM 0 H15 PAC A 317 15.045 22.030 62.326 1.00 25.20 H ATOM 2567 C9 PAC A 317 16.776 22.166 61.208 1.00 25.30 C ATOM 0 H17 PAC A 317 16.578 21.423 60.616 1.00 25.30 H ATOM 2566 C8 PAC A 317 17.969 22.888 61.060 1.00 24.01 C ATOM 0 H16 PAC A 317 18.600 22.646 60.365 1.00 24.01 H ATOM 2565 C6 PAC A 317 18.250 23.971 61.928 1.00 24.42 C ATOM 0 H14 PAC A 317 19.077 24.466 61.823 1.00 24.42 H TER END
The final missing H18 atom will be added by tleap later. We need to do something similar for the NADP+ cofactor. In this case, we are going to rename the atoms in the nadp+.prep (.prep is the same as .prepin) to match those in 2INE.pdb. We then reorder the atoms in 2INE.pdb to match nadp+.prep. As before, we need to match the atom names in two files using antechamber to generate a .pdb from the NADP+ parameters, and Pymol to visualise and compare atom positions and names.
You can find the results of all these changes in the file 2INE_tweaked.pdb (where I have also removed all the extra output from reduce) and nadp+_new.prep within the tutorial tarball. Great! Now we're ready to make some initial AMBER input using tleap and see if it'll minimise to a sensible energy.
Creating AMBER parameters for the complex
We are now ready to make AMBER parameters for the initial complex geometry. For this tutorial, we are going to be using the AMBER ff99SB force field for the protein, a combination of ff99SB and custom parameters for the NADP+ cofactor and the General Amber Force Field (GAFF) with RESP charges from GAMESS-US for the PAC ligand. We will be using a set of Born radii suitable for the igb=2 implicit solvent model. For more information, see the AMBER manual.
Generating initial input using tleap
In order to combine all of these parameters, we use the AMBER program tleap. Here is the input file to create the complex, commands can be found in the AMBER Tools manual if you are curious or confused!
WARNING: if you are using AMBER tools 14, you need to change 'source leaprc.ff99SB' to 'source oldff/leaprc.ff99SB' in the file below
initial_complex.tleap:
# tleap input to generate initial aldose reductase + PAC ligand parameters # Set appropriate Born radii values for igb=2 implicit solvent set default PBradii mbondi2 # Load the AMBER ff99SB and GAFF forcefields source leaprc.ff99SB source leaprc.gaff # Load the AMBER parameters for the NADP+ cofactor loadamberprep nadp+_new.prep loadamberparams nad.frcmod # Load the AMBER parameters for the PAC ligand loadamberprep PAC.prepin loadamberparams PAC.frcmod # Load 2INE_tweaked.pdb into the 'complex' object # Hydrogens are added at this stage complex=loadpdb 2INE_tweaked.pdb # Check there are no missing parameters check complex # Save a .pdb file for the complex savepdb complex 2INE_tleap.pdb # Save initial AMBER topology and coordinate files saveamberparm complex 2INE.prmtop_presym 2INE_initial.inpcrd # Quit tleap quit
To run tleap, we use the following command:
tleap -f initial_complex.tleap
The output files from tleap are:
- 2INE.prmtop_presym: the AMBER topology file, containing all the force field parameters for the complex. The improper dihedrals have not yet been symmetrised - see below.
- 2INE_initial.inpcrd: the AMBER coordinate file for the initial complex geometry before it has been relaxed.
- 2INE_tleap.pdb: the .pdb file for the same coordinates. Worth a look in Pymol to make sure it is sensible.
- leap.log: The tleap log file. Check to make sure no serious errors occurred. Investigate if something went wrong.
Symmetrising the topology file
The AMBER and CHARMM force fields are not symmetrised with respect to permutational isomerisation. This is particularly troublesome for our methods where we rely on the energy to discriminate between minima. Without proper symmetrisation, permutational isomers can have different energies, resulting in 'twinning' of minima. You can read our paper about this here. We can symmetrise the topology file made in the last step using a Python script from the SVN repository. Before we do that, we need to identify the terminal residues in the topology file by appending a 'N' or 'C' to the residue name for N and C terminal residues respectively.
Open the topology file and look for the residue label block:
%FLAG RESIDUE_LABEL %FORMAT(20a4) SER ARG LEU LEU LEU ASN ASN GLY ALA LYS MET PRO ILE LEU GLY LEU GLY THR TRP LYS SER PRO PRO GLY GLN VAL THR GLU ALA VAL LYS VAL ALA ILE ASP VAL GLY TYR ARG HIE ILE ASP CYS ALA HIE VAL TYR GLN ASN GLU ASN GLU VAL GLY VAL ALA ILE GLN GLU LYS LEU ARG GLU GLN VAL VAL LYS ARG GLU GLU LEU PHE ILE VAL SER LYS LEU TRP CYS THR TYR HIE GLU LYS GLY LEU VAL LYS GLY ALA CYS GLN LYS THR LEU SER ASP LEU LYS LEU ASP TYR LEU ASP LEU TYR LEU ILE HIE TRP PRO THR GLY PHE LYS PRO GLY LYS GLU PHE PHE PRO LEU ASP GLU SER GLY ASN VAL VAL PRO SER ASP THR ASN ILE LEU ASP THR TRP ALA ALA MET GLU GLU LEU VAL ASP GLU GLY LEU VAL LYS ALA ILE GLY ILE SER ASN PHE ASN HIE LEU GLN VAL GLU MET ILE LEU ASN LYS PRO GLY LEU LYS TYR LYS PRO ALA VAL ASN GLN ILE GLU CYS HIE PRO TYR LEU THR GLN GLU LYS LEU ILE GLN TYR CYS GLN SER LYS GLY ILE VAL VAL THR ALA TYR SER PRO LEU GLY SER PRO ASP ARG PRO TRP ALA LYS PRO GLU ASP PRO SER LEU LEU GLU ASP PRO ARG ILE LYS ALA ILE ALA ALA LYS HIE ASN LYS THR THR ALA GLN VAL LEU ILE ARG PHE PRO MET GLN ARG ASN LEU VAL VAL ILE PRO LYS SER VAL THR PRO GLU ARG ILE ALA GLU ASN PHE LYS VAL PHE ASP PHE GLU LEU SER SER GLN ASP MET THR THR LEU LEU SER TYR ASN ARG ASN TRP ARG VAL CYS ALA LEU LEU SER CYS THR SER HIE LYS ASP TYR PRO PHE HIE GLU GLU PHE NPD PAC %FLAG RESIDUE_POINTER
Here you can see the residues in the complex. The protein first, then the NADP+ cofactor (NPD) and the PAC ligand. Because aldose reductase is a single protein segment (one continuous chain), there is a single N and C terminus. To communicate this to the Python script were about to use, we need to change the residue names to NXXX and CYYY like this:
%FLAG RESIDUE_LABEL %FORMAT(20a4) NSERARG LEU LEU LEU ASN ASN GLY ALA LYS MET PRO ILE LEU GLY LEU GLY THR TRP LYS SER PRO PRO GLY GLN VAL THR GLU ALA VAL LYS VAL ALA ILE ASP VAL GLY TYR ARG HIE ILE ASP CYS ALA HIE VAL TYR GLN ASN GLU ASN GLU VAL GLY VAL ALA ILE GLN GLU LYS LEU ARG GLU GLN VAL VAL LYS ARG GLU GLU LEU PHE ILE VAL SER LYS LEU TRP CYS THR TYR HIE GLU LYS GLY LEU VAL LYS GLY ALA CYS GLN LYS THR LEU SER ASP LEU LYS LEU ASP TYR LEU ASP LEU TYR LEU ILE HIE TRP PRO THR GLY PHE LYS PRO GLY LYS GLU PHE PHE PRO LEU ASP GLU SER GLY ASN VAL VAL PRO SER ASP THR ASN ILE LEU ASP THR TRP ALA ALA MET GLU GLU LEU VAL ASP GLU GLY LEU VAL LYS ALA ILE GLY ILE SER ASN PHE ASN HIE LEU GLN VAL GLU MET ILE LEU ASN LYS PRO GLY LEU LYS TYR LYS PRO ALA VAL ASN GLN ILE GLU CYS HIE PRO TYR LEU THR GLN GLU LYS LEU ILE GLN TYR CYS GLN SER LYS GLY ILE VAL VAL THR ALA TYR SER PRO LEU GLY SER PRO ASP ARG PRO TRP ALA LYS PRO GLU ASP PRO SER LEU LEU GLU ASP PRO ARG ILE LYS ALA ILE ALA ALA LYS HIE ASN LYS THR THR ALA GLN VAL LEU ILE ARG PHE PRO MET GLN ARG ASN LEU VAL VAL ILE PRO LYS SER VAL THR PRO GLU ARG ILE ALA GLU ASN PHE LYS VAL PHE ASP PHE GLU LEU SER SER GLN ASP MET THR THR LEU LEU SER TYR ASN ARG ASN TRP ARG VAL CYS ALA LEU LEU SER CYS THR SER HIE LYS ASP TYR PRO PHE HIE GLU GLU CPHENPD PAC %FLAG RESIDUE_POINTER
Note that we are not retaining spaces as the topology file is read as a fixed format - each residue name is up to 4 characters. Assuming you have a copy of the SVN repository in your home directory, we can run the Python script as follows:
~/svn/SCRIPTS/AMBER/symmetrise_prmtop/perm-prmtop.ff03.py 2INE.prmtop_presym 2INE.prmtop
Note that although the script implies we should use it for the ff03 force field, the atom naming and ordering is identical between ff03 and ff99SB - so it works for both. This produces the symmetrised topology file 2INE.prmtop. Finally, we undo the changes we made to the residue names in this new topology file to prevent them showing up in our output.
Before we are ready to start sampling minima for the FSA procedure itself, we will need to relax the system to make sure we have removed steric clashes introduced by our addition of hydrogen to the system. We can do this by minimising the complex with OPTIM.
Relaxing the initial complex geometry
We are going to be minimising the complex using OPTIM following the tutorial here. If you have not compiled OPTIM with AMBER9 before, you can find instructions on the group CMake page here.
In brief, the odata (OPTIM data) file we will use is as follows:
odata
DUMPSTRUCTURES BFGSMIN 1.0D-6 BFGSSTEPS 10000 UPDATES 1000 MAXERISE 1.0D-4 AMBER9 coords.inpcrd inpcrd
Here is a quick summary of each keyword, more can be found in the OPTIM documentation:
- DUMPSTRUCTURES: produce AMBER restart (.rst), PDB (.pdb) and XYZ (.xyz) files for the final, minimised geometry.
- BFGSMIN 1.0D-6: continue to minimise until the RMS force is below 1.0D-6.
- BFGSSTEPS 10000: do a maximum of 10000 minimisation steps.
- UPDATES 1000: retain the previous 1000 configuration in memory for use in determining each subsequent minimisation step.
- MAXERISE 1.0D-4: allow the energy to rise by up to 1.0D-4 kcal/mol in a single minimisation step, otherwise reduce the step size.
- AMBER9 coords.inpcrd inpcrd: Use the AMBER9 potential, starting from the input file coords.inpcrd in inpcrd format.
A9OPTIM expects the input files coords.prmtop, the topology file containing all the force field parameters required to calculate the energy for any given conformation of atoms and coords.inpcrd, the initial coordinate to start the minimisation from. We have already produced both these files above, but have named them 2INE.prmtop and 2INE_initial.inpcrd respectively. We can simply copy these files to ensure OPTIM is able to use them:
cp 2INE.prmtop coords.prmtop cp 2INE_initial.inpcrd coords.inpcrd
The final OPTIM input file we need is min.in which specifies some AMBER potential options, such as which implicit solvent model we want to use. Here is the one we shall be using:
min.in
Minimization &cntrl imin=1, maxcyc=1, ncyc=1, igb=2, saltcon=0.2, ntx=1, ntb=0, cut=999.99, rgbmax=12.0, ifswitch=1 /
Full details of these keywords (apart from ifswitch=1 - see below) can be found in the AMBER manual. The key ones to understand are:
- igb=2: specifies the use of a modified Generalised Born solvent model which attempts to account for interstitial spaces.
- ifswitch=1: a keyword we inserted into AMBER to allow a smooth switching of the implicit solvent terms at the non-bonded cutoff.
- cut=999.9: specified the cutoff for non-bonded interactions should be 999.99A. This in effect means no cutoff at all. Why we are using 999.9A is discussed below.
Now we have the necessary files ready, we can run A9OPTIM as follows (assuming you moved it to your /home/CRSID/bin directory):
A9OPTIM > optim.out & tail -f optim.out
We are running A0OPTIM in the background (&), sending the output to a file called optim.out. Using tail -f, we are able to follow this file being created and watch the minimisation proceed. If things don't work, make sure you show someone this file. The minimisation will take some time so don't just sit and watch it - come back later. When the minimisation has finished, you should hopefully see something like this at the bottom of optim.out:
... mylbfgs> Energy and RMS force= -12106.38089 0.9384470210E-06 after 5990 steps, step: 0.21888E-06 mylbfgs> Final energy is -12106.380888338343 rotcon> Rotational constants (in cm-1) and principal moments of inertia: 0.00001 0.00001 0.00001 7107902.94327 8718515.11211 10189130.86838 symmetry> The full molecular point group is C1 . symmetry> The largest Abelian subgroup of the full molecular point group is C1 . symmetry> Distance tolerance= 0.00010 Inertia tolerance= 0.00010 symmetry> Order of full point group= 1 geopt> **** CONVERGED **** Elapsed time= 10594.44 OPTIM> # of energy calls= 0 time= 0.00 %= 0.0 OPTIM> # of energy+gradient calls= 6004 time= 10198.12 %= 96.3 OPTIM> # of energy+gradient+Hessian calls= 0 time= 0.00 %= 0.0
The final geometry is contained in the three output files, points.final.xyz/rst/pdb. It's always a good idea to compare this relaxed structure to the initial geometry (2INE_tleap.pdb) using Pymol and check what has changed.
The points.final.rst file is actually in the same AMBER restart format as the the 2INE_initial.inpcrd file we created with tleap and copied to coords.inpcrd to use in the OPTIM minimisation. This means that we can use points.final.rst in the same way for future GMIN and OPTIM jobs.
Defining the rigid region and local rigid bodies for the complex
In the FSA scheme, we treat a large part of the protein distant from the ligand binding site as a single rigid body. We do this within the generalised rigid body framework implemented in GMIN and OPTIM, specified by the RIGIDINIT keyword. When looking at ligand binding, the critical decision is at what distance from the ligand to start this rigidifying. We want to rigidify as much of the system as possible (fewer degrees of freedom = less time for minimisation/normal mode analysis) without affecting ligand binding, and what is safe will vary from system to system.
If you are using the tutorial for a system other than aldose reductase, I suggest you spend some time considering this. For now - we are going to use the smallest distance where we saw convergence of the binding energy - 14A. In the FSA paper, we also created a 'locally rigid' buffer region from R (14A) to R+1 or (15A). We will follow this same protocol. To specify rigid bodies for both GMIN and OPTIM we need to create two more files:
- rbodyconfig: defines each rigid body by listing its constituent atoms.
- coordsrigidinit: contains coordinates (x, y, z only for each atom, one atom per line) that are used to define the internal geometry of the rigid bodies specified in rbodyconfig - this is our reference structure so it is important that it is physical.
We could create these manually, but happily we have a relatively simple Fortran code to do it for us. You can find it here:
~/svn/SCRIPTS/AMBER/rigidbody/groupRigidBodyINC.f90
Alternatively there is a python scriptgroupRigidBodyINC.f90, that will also do the job, offer us more options and creates two additional output files: rigid.log - a log file to check at any time what was rigidified and parmed_in - an input file for parmed.py to turn off interactions within the rigid body. The script can be found here:
~/svn/SCRIPTS/AMBER/rigidbody/genrigid_input.py
A guide of how to use it is found here.
The fortran code requires two files to generate rigid body input for GMIN and OPTIM:
- minimisedstartingpoint.pdb: a pdb file used to generate the reference coordinates for the rigid bodies
- coords.inpcrd: a matching AMBER inpcrd file for the PDB file above which provides more accurate coordinates
We are going to be using the relaxed structure we got from minimising in OPTIM for the reference geometry, so we can generate these files by simply using the output files as follows:
mkdir setrigidbodies cp points.final.rst rigidbodies/coords.inpcrd cp points.final.pdb rigidbodies/minimisedstartingpoint.pdb cd setrigidbodies
Before we can use this program, we need to make a change to fit our input. Copy the source code into the working directory and open it for editing:
cp ~/svn/SCRIPTS/AMBER/rigidbody/groupRigidBodyINC.f90 . vi groupRigidBodyINC.f90
The program can read three slightly different .pdb formats, that from the ambpdb program, from GMIN or OPTIM or in a strict, fixed format. As we are using OPTIM output, we need to make sure that the corresponding READ line is the only one not commented (not preceded by a !) as follows:
! ACTION REQUIRED: MAKE SURE ONLY A SINGLE READ LINE IN THE BLOCK BELOW IS UNCOMMENTED ! PDB format 1: from ambpdb (make sure you delete the REMARK line from the top before running this) !ATOM 1 N TRP 1 3.613 2.061 -0.003 1.00 0.00 N ! READ(1,*,IOSTAT=iostatus) RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1), RESIDUENO(J1), X(J1), Y(J1), Z(J1), DUMMYDP, DUMMYDP ! PDB format 2: from GMIN or OPTIM (includes a chain ID e.g. A in example below) !ATOM 1 N TRP A 1 3.613 2.061 -0.003 1.00 1.00 N READ(1,*,IOSTAT=iostatus) RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1), DUMMY1, RESIDUENO(J1), X(J1), Y(J1), Z(J1), DUMMYDP, DUMMYDP, DUMMY2 ! PDB format 3: fixed format PDB files (deal with fixed formatting of pdb files - Mark Oakley) ! READ(1,'(A6,I5,X,A4,X,A3,2X,I4,4X,3F8.3)',IOSTAT=iostatus) & ! RECORDNAME, ATOMNO(J1), ATOMNAME(J1), RESIDUENAME(J1), RESIDUENO(J1), X(J1), Y(J1), Z(J1)
Save your changes (:wq) and open the coords.inpcrd file - we also need to make sure that the 'title' (first) line of the coords.inpcrd file is blank (which it will not be if you used OPTIM):
vi coords.inpcrd <ENSURE THE 1ST LINE IS BLANK AND THE 2ND IS THE NUMBER OF ATOMS> :wq
We are now ready to compile and run the rigid body program:
pgf90 -o groupRigidBodyINC groupRigidBodyINC.f90 ./groupRigidBodyINC
We have a lot of choices to make here! For the purposes of this tutorial, we will be following what was done for the FSA paper, but obviously you should do what is appropriate for your system and your purposes. The first decision we need to make is which atom to use as the centre point for our rigid body definitions. In our case, we want to use something in the ligand binding pocket - like the PAC ligand. We will use the C3 atom of the ligand, index 5121 (you might want to check this!). We then need to specify the different regions by a radius from this atom:
- the radius below which we don't want to group anything into rigid bodies = 14A
- the radius above which we want to group everything into a single rigid body = 15A (this gives us a 1A 'buffer' region where we can use 'local' rigid bodies e.g. rigidifying sp2 centres)
We then need to make some decisions about which local rigid bodies to set up. Here is the sequence used for this tutorial:
Should I group PEPTIDE BOND C-O-N-H as a rigid body? (Y/N) Y Should I group PROLINE N-CD-CG-CB-CA as a rigid body? (Y/N) N Should I group ARGININE NE-HE-CZ-NH1-HH11-HH12-NH2-HH21-HH22 as a rigid body? (Y/N) Y Should I group HISTIDINE (HIS, HIE, HID, HIP) CG-ND1-CE1-NE2-CD2 as a rigid body? (Y/N) Y Should I group LYSINE NZ-HZ1-HZ2-HZ3 as a rigid body? (Y/N) Y Should I group ASPARAGINE CB-CG-OD1-OD2 as a rigid body? (Y/N) Y Should I group GLUTAMIC ACID CG-CD-OE1-OE2 as a rigid body? (Y/N) Y Should I group ASPARAGINE CB-CG-OD1-ND2-HD21-HD22 as a rigid body? (Y/N) Y Should I group GLUTAMINE CG-CD-OE1-NE2-HE21-HE22 as a rigid body? (Y/N) Y Should I group PHENYLALANINE CG-CD1-HD1-CE1-HE1-CZ-HZ-CE2-HE2-CD2-HD2 as a rigid body? (Y/N) Y Should I group TYROSINE CG-CD1-HD1-CE1-HE1-CZ-OH-CE2-HE2-CD2-HD2 as a rigid body? (Y/N) Y Should I group TRYPTOPHAN as a 1 (CG-CD1-HD1-NE1-HE1-CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2) or 2 (CG-CD1-HD1-NE1-HE1)-(CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2) rigid bodies? (1/2/N) 2 ANYTHING ELSE TO GROUP? (Y/N) WRITE IN USRRIGID FILE IN THE FOLLOWING FORMAT: TYR 6 CG CD1 CE1 CZ CE2 CD2 N Should I group all atoms outside RADIUSALL a rigid body? (Y/N) Y
There are pros and cons to rigidifying things like this, for example, by fixing the peptide bonds - you are excluding some conformation where this angle is required to slightly change. A good example of this is rigidifying the TRP rings. If you rigidify them as a single rigid body - you can actually remove the global minimum as the rings are slightly bent. Hence we need to rigidify them as two separate rigid bodies. See the rigid body paper for more information on how safe or not these options are.
We are now able to specify some further options which will produce GROUPROTATION input for GMIN to allow us to change the conformation of the sidechains in the binding pocket. For more info on GROUPROTATION, see the GMIN documentation.
What is the uniform probability you want to choose? (0.01-0.1 seems sensible) 0.025 Do you want group rotations of the side chain with N-CA as the axis? (Y/N) FYI I am not doing anything for PRO N Do you want group rotations of the side chain with C-CA as the axis? (Y/N) FYI I am not doing anything for PRO N Do you want group rotations of the side chain with CA-CB as the axis? (Y/N) FYI I am not doing anything for ALA, GLY, PRO Y Do you want to use the default values? (Y/N) Y For which regions do you want me to generate the GROUPROTATION FILE? (1) JUST THE ALL ATOM REGION (2) JUST THE MINIMAL RIGID BODY REGION (3) BOTH REGIONS 3 Do you want group rotations of the side chain with CB-CG as the axis? (Y/N) FYI I am not doing anything for ALA, GLY, PRO, SER, CYS, THR, VAL Y Do you want to use the default values? (Y/N) Y For which regions do you want me to generate the GROUPROTATION FILE? (1) JUST THE ALL ATOM REGION (2) JUST THE MINIMAL RIGID BODY REGION (3) BOTH REGIONS 3 Do you want group rotations of the side chain with CG-CD as the axis? (Y/N) FYI I am not doing anything for all residues except for ARG, LYS, GLU, GLN Y Do you want to use the default values? (Y/N) Y For which regions do you want me to generate the GROUPROTATION FILE? (1) JUST THE ALL ATOM REGION (2) JUST THE MINIMAL RIGID BODY REGION (3) BOTH REGIONS 3 ANYTHING ELSE? (Y/N) WRITE IN AUTOGR FILE IN THE FOLLOWING FORMAT: RESIDUENAME NO_MEMBER AMP PROB AXIS1 AXIS2 MEMBER1 MEMBER2 MEMBER3, ETC N
You should now have three new files in your directory - rbodyconfig and coordsinirigid are described above. The rigid bodies defined here are shown in the image above where grey signifies the one, large rigid body - blue shows local rigid bodies and green is fully atomistic. The third file atomgroups is the GROUPROTATION input file whose format is described in the GMIN documentation. While we're talking about GROUPROTATION - we would like to add a few rotation groups for the ligand, as only groups for the protein sidechains are created by the program automatically. Open the atomgroups file and add the following at the top:
GROUP PACRING 5118 5121 10 1.0 0.025 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 GROUP PACCHAIN1 5121 5118 6 1.0 0.025 5120 5119 5117 5116 5115 5114 GROUP PACCHAIN2 5118 5116 3 1.0 0.025 5117 5115 5114
Great! We're done making input for the complex and can now move on to the protein which is simply the complex without the PAC ligand.
Relaxing the now rigidified complex
Although we don't expect any structural change to occur if we minimise after rigidifying (as the reference structure for the rigid bodies matches the input structure) - it is still worth doing as a sanity check, and to make sure you are definitely starting GMIN sampling from a minimum. Relaxing the rigidified complex is exactly the same as for the fully atomistic system, with a single change to the odata (OPTIM data) file, including the RIGIDINIT keyword:
odata
RIGIDINIT DUMPSTRUCTURES BFGSMIN 1.0D-6 BFGSSTEPS 10000 UPDATES 1000 MAXERISE 1.0D-4 AMBER9 coords.inpcrd inpcrd
RIGIDINIT specifies that we should use rigid bodies as defined in the rbodyconfig and coordsinirigid files that we created above above. You will probably still see a few minimisation steps occuring, but if the energy changes significantly, you have a problem. The end of the OPTIM output should look something like this:
mylbfgs> Energy and RMS force= -12106.38089 0.8960252262E-06 after 114 steps, step: 0.17444E-07 mylbfgs> Final energy is -12106.380888335800 rotcon> Rotational constants (in cm-1) and principal moments of inertia: 0.00001 0.00001 0.00001 7107902.93874 8718515.10946 10189130.86900 symmetry> The full molecular point group is C1 . symmetry> The largest Abelian subgroup of the full molecular point group is C1 . symmetry> Distance tolerance= 0.00010 Inertia tolerance= 0.00010 symmetry> Order of full point group= 1 geopt> **** CONVERGED **** Elapsed time= 198.53 OPTIM> # of energy calls= 0 time= 0.00 %= 0.0 OPTIM> # of energy+gradient calls= 115 time= 197.38 %= 99.4 OPTIM> # of energy+gradient+Hessian calls= 0 time= 0.00 %= 0.0
As before, a points.final.rst file will be produced - this is going to be our starting point for the complex for all work that follows.
Creating protein and ligand parameters
Now we have working input for the complex, we can easily make protein and ligand input by chopping up the output .pdb file from OPTIM and re-running tleap as before. Happily, we don't have to repeat a lot of the initial steps tweaking the .pdb as we did for the complex - all that work carries right over! :)
First, lets copy the OPTIM points.final.pdb file for the complex into two new files which we can edit:
cp points.final.pdb 2INE_protein.pdb cp points.final.pdb 2INE_ligand.pdb
Chopping up the .pdb files
We have a simple edit to make to each .pdb file before we can use them to generate AMBER input. We now want to remove the PAC ligand atoms from 2INE_protein.pdb and everything but the PAC ligand atoms from 2INE_ligand.pdb which will end up looking like this:
ATOM 5114 O1 PAC A 316 17.997 27.868 62.311 1.00 1.00 O ATOM 5115 H18 PAC A 316 17.241 28.485 62.467 1.00 1.00 H ATOM 5116 C10 PAC A 316 17.368 26.821 61.891 1.00 1.00 C ATOM 5117 O2 PAC A 316 16.187 26.901 61.636 1.00 1.00 O ATOM 5118 C4 PAC A 316 18.257 25.611 61.715 1.00 1.00 C ATOM 5119 H11 PAC A 316 18.787 25.731 60.769 1.00 1.00 H ATOM 5120 H12 PAC A 316 19.011 25.624 62.504 1.00 1.00 H ATOM 5121 C3 PAC A 316 17.536 24.267 61.765 1.00 1.00 C ATOM 5122 C5 PAC A 316 16.667 23.962 62.816 1.00 1.00 C ATOM 5123 H13 PAC A 316 16.474 24.694 63.598 1.00 1.00 H ATOM 5124 C7 PAC A 316 16.052 22.712 62.883 1.00 1.00 C ATOM 5125 H15 PAC A 316 15.376 22.482 63.704 1.00 1.00 H ATOM 5126 C9 PAC A 316 16.313 21.753 61.907 1.00 1.00 C ATOM 5127 H17 PAC A 316 15.838 20.774 61.962 1.00 1.00 H ATOM 5128 C8 PAC A 316 17.188 22.046 60.863 1.00 1.00 C ATOM 5129 H16 PAC A 316 17.399 21.295 60.103 1.00 1.00 H ATOM 5130 C6 PAC A 316 17.794 23.300 60.790 1.00 1.00 C ATOM 5131 H14 PAC A 316 18.479 23.515 59.971 1.00 1.00 H
Again, we don't need to worry about the atom numbering being wrong, tleap will fix that for us below. Thebottom of 2INE_protein.pdb should look like this:
... ATOM 5108 H61A NPD A 315 26.154 28.390 81.535 1.00 1.00 H ATOM 5109 N1A NPD A 315 28.535 26.552 80.565 1.00 1.00 N ATOM 5110 C2A NPD A 315 28.864 25.708 79.594 1.00 1.00 C ATOM 5111 H2A NPD A 315 29.875 25.322 79.625 1.00 1.00 H ATOM 5112 N3A NPD A 315 28.106 25.268 78.592 1.00 1.00 N ATOM 5113 C4A NPD A 315 26.849 25.803 78.624 1.00 1.00 C
Generating initial protein and ligand input using tleap
Now that we have these input .pdb files ready - we can create initial AMBER input for both the protein at the same time by using the following tleap input file.
initial_protein_and_ligand.tleap:
# tleap input to generate initial aldose reductase parameters # for the protein and PAC ligand separately # Set appropriate Born radii values for igb=2 implicit solvent set default PBradii mbondi2 # Load the AMBER ff99SB and GAFF forcefields source leaprc.ff99SB source leaprc.gaff # Load the AMBER parameters for the NADP+ cofactor loadamberprep nadp+_new.prep loadamberparams nad.frcmod # Load the AMBER parameters for the PAC ligand loadamberprep PAC.prepin loadamberparams PAC.frcmod # Load 2INE_protein.pdb into the 'protein' object protein=loadpdb 2INE_protein.pdb # Check there are no missing parameters for the protein check protein # Load 2INE_ligand.pdb into the 'ligand' object ligand=loadpdb 2INE_ligand.pdb # Check there are no missing parameters for the ligand check ligand # Save a .pdb file for the protein savepdb protein 2INE_protein_tleap.pdb # Save a .pdb file for the ligand savepdb ligand 2INE_ligand_tleap.pdb # Save initial AMBER topology and coordinate files for the protein saveamberparm protein 2INE_protein.prmtop_presym 2INE_protein_initial.inpcrd # Save initial AMBER topology and coordinate files for the ligand saveamberparm ligand 2INE_ligand.prmtop 2INE_ligand_initial.inpcrd # Quit tleap quit
Note that the ligand atom numbering has been fixed in 2INE_ligand_tleap.pdb.
Symmetrising the protein topology file
As for the complex, we need to symmetrise the topology file we just created for the protein to ensure that permutational isomers have the same energy. To do this, we need to edit the 2INE_protein.prmtop_presym file as we did above. The edits required are identical to those for the complex which only has the additional PAC entry in the list of residues. The Python script should be run again as follows:
~/svn/SCRIPTS/AMBER/symmetrise_prmtop/perm-prmtop.ff03.py 2INE_protein.prmtop_presym 2INE_protein.prmtop
Make sure you then revert your edits in the final 2INE_protein.prmtop file as above. Happily this is not a problem for the ligand.
Defining the rigid region and local rigid bodies for the protein
It is essential for the FSA procedure that we use exactly the same rigid body reference structure for the protein as we do for the complex. As we based the protein input on the complex structure that we used as minimisedstartingpoint.pdb to generate the rigid body reference coordinates, we can use the same rbodyconfig and file for both. This is possible because:
- we did not rigidify the ligand that missing in the protein
- the atom numbering does not change going from the complex to the protein as the ligand is the final 'residue' so removing it does not affect the numbering of the rest of the system
Unfortunately, this is not true of the coordsinirigid and coords.inpcrd files from that directory as we have changed the number of atoms by removing the ligand. We need to alter both to take this into account:
cp coordsinirigid coordsinirigid_protein cp coords.inpcrd coords.inpcrd_protein
Editing coordsinirigid_protein is simple. The ligand consists of 18 atoms, so we simply delete the final 18 lines of the file. Afterwards, the end of the file should look like this:
coordsinirigid_protein:
... 29.87472760000000 25.32162100000000 79.62487950000001 28.10573830000000 25.26778500000000 78.59233290000000 26.84940740000000 25.80287030000000 78.62350440000000
We need to think a bit more about editing coords.inpcrd_protein'. First, the second line of the file is the number of atoms in the system. This needs to be reduced by 18 to 5113. We also need to remove the coordinates of the PAC ligand atoms as for coordsinirigid_protein, but in this format - each line contains the coordinates for two atoms. By counting backwards from the bottom, we can see that we need to remove 9.5 lines, leaving the bottom of the file looking like this:
coords.inpcrd_protein:
... 28.5353297 26.5519001 80.5646514 28.8636981 25.7076199 79.5940559 29.8747276 25.3216210 79.6248795 28.1057383 25.2677850 78.5923329 26.8494074 25.8028703 78.6235044
We can also use the same atomgroups file for the protein as long as we remove the three groups that we added for the PAC ligand above, creating atomgroups_protein.
Relaxing the initial rigidified protein geometry
We are going to be relaxing the protein as we did the complex above using OPTIM. Although we are using the same input coordinates (and rigid body definitions) as for the minimised complex, because we have removed the ligand - the structure will no longer be a minimum - the binding pocket is likely to close up somewhat.
The odata (OPTIM data) file we will use is the same as that we used to relax the rigidified complex:
odata
RIGIDINIT DUMPSTRUCTURES BFGSMIN 1.0D-6 BFGSSTEPS 10000 UPDATES 1000 MAXERISE 1.0D-4 AMBER9 coords.inpcrd inpcrd
This will ensure that the large rigid 'lump' that we defined for the complex will remain exactly the same in the protein - something essential in the FSA method.
For the input files, we use the modified complex input and rigid body reference coordinates, the protein topology file we symmetrised above and the same min.in and rbodyconfig as for the complex.
cp 2INE_protein.prmtop coords.prmtop cp coords.inpcrd_protein coords.inpcrd cp coordsinirigid_protein coordsinirigid
After running OPTIM you should hopefully get something like this:
mylbfgs> Energy and RMS force= -12049.39633 0.9854679625E-06 after 1242 steps, step: 0.12266E-06 mylbfgs> Final energy is -12049.396332313645 rotcon> Rotational constants (in cm-1) and principal moments of inertia: 0.00001 0.00001 0.00001 7093417.75475 8704649.93680 10185179.88004 symmetry> The full molecular point group is C1 . symmetry> The largest Abelian subgroup of the full molecular point group is C1 . symmetry> Distance tolerance= 0.00010 Inertia tolerance= 0.00010 symmetry> Order of full point group= 1 geopt> **** CONVERGED **** Elapsed time= 2186.26 OPTIM> # of energy calls= 0 time= 0.00 %= 0.0 OPTIM> # of energy+gradient calls= 1247 time= 2136.04 %= 97.7 OPTIM> # of energy+gradient+Hessian calls= 0 time= 0.00 %= 0.0
As for the complex, the points.final.rst file is what we will use as the starting point for the protein for all work that follows.
Creating GROUPROTATION input for the ligand
Because the atom numbering has changed for the ligand, we cannot simply use the PAC groups from the complex atomgroups file for the ligand by itself. It is quite simple to recreate these groups by visualising 2INE_ligand_tleap.pdb which contains the correct numbering. The result is as follows:
atomgroups_ligand:
GROUP PACRING 5 6 10 1.0 0.025 9 10 11 12 13 14 15 16 17 18 GROUP PACCHAIN1 8 5 6 1.0 0.025 6 7 4 3 2 1 GROUP PACCHAIN2 5 3 3 1.0 0.025 4 2 1
Relaxing the initial ligand geometry
Finally - we are going to be relaxing the ligand. This is almost not worth doing, but for completeness, it would be nice to start from a minimum for the ligand also. Because it is so small, this will hardly take any time at all. We also do not require the RIGIDINIT keyword this time as we always treat the ligand fully atomistically.
odata
DUMPSTRUCTURES BFGSMIN 1.0D-6 BFGSSTEPS 10000 UPDATES 1000 MAXERISE 1.0D-4 AMBER9 coords.inpcrd inpcrd
We use the ligand topology and input coordinates files generated from tleap above and the same min.in as for the complex and protein.
cp 2INE_ligand.prmtop coords.prmtop cp 2INE_ligand_initial.inpcrd coords.inpcrd
After running OPTIM you should hopefully get something like this:
mylbfgs> Energy and RMS force= -43.52358741 0.8462436792E-06 after 125 steps, step: 0.54651E-06 mylbfgs> Final energy is -43.523587405453071 rotcon> Rotational constants (in cm-1) and principal moments of inertia: 0.08469 0.08907 0.41399 145.41239 675.89654 710.84086 symmetry> The full molecular point group is C1 . symmetry> The largest Abelian subgroup of the full molecular point group is C1 . symmetry> Distance tolerance= 0.00010 Inertia tolerance= 0.00010 symmetry> Order of full point group= 1 geopt> **** CONVERGED **** Elapsed time= 0.02 OPTIM> # of energy calls= 0 time= 0.00 %= 0.0 OPTIM> # of energy+gradient calls= 130 time= 0.01 %= 50.0 OPTIM> # of energy+gradient+Hessian calls= 0 time= 0.00 %= 0.0
As for the complex and protein, the points.final.rst file is what we will use as the starting point for the ligand for all work that follows.
Fantastic! We are now ready to start sampling minima to use in the FSA analysis. For reference, you can download all the files discussed above in the first of the FSA tarballs here.
Sampling relevant minima using GMIN
Thanks to the work Rosie has put in, we can now leverage the GPUs we have, enabling us to efficiently sample large systems like aldose reductase. You can see a table comparing the minimisation speeds for a range of system sizes for CPU and GPU GMIN on the right. For really small systems, using CUDAGMIN is actually slower than a CPU version, simply because of the overhead time moving data to and from the GPUs. As a result, we're going to use both A12GMIN and CUDAGMIN for this work. You can see how to compile both on the group cmake page here.
We will be using GMIN to sample low lying minima for the complex, protein and ligand. Although you might want to consider using some enhanced sampling techniques such as basin-hopping parallel tempering, BHPT - or the HBONDMATRIX binding mode sampling method for production FSA (as we did for the paper) - for the purposes of this tutorial, we will be running shortish simple basin-hopping runs for each. You can find more information on each in the GMIN documentation here. You should also consider running longer GMIN runs and keep adding minima until the FSA binding free energy converges for your system.
Choosing appropriate non-bonded cutoff and convergence criteria
As for any sampling, before we begin we need to determine three important parameters to balance getting the maximum 'bang for our buck' in terms of exploration/unit time - with accuracy or interactions and ensuring we are actually finding true minima. This is largely controlled by the AMBER parameter cut and the two GMIN keywords TIGHTCONV and SLOPPYCONV:
- cut: the nun-bonded interaction cutoff distance in AMBER
- TIGHTCONV/BFGSMIN/BFGSCONV: the tight RMS force condition where minima are well defined (6 'zero' eigenvalues)
- SLOPPYCONV: the sloppy RMS force condition where minima are not significantly re-ordered when minimised further
These need to be fixed in the order above as changing cut affects the underlying potential and we need to ensure we are actually finding minima before we set how sloppily we will sample during the run. Because we are going to be using CUDAGMIN, we cannot use a non-bonded interaction cutoff as it is not implemented in the AMBER CUDA code, PMEMD - hence we set cut = 999.99 in min.in. These parameters must match for the complex, protein and ligand.
How to choose SLOPPYCONV and TIGHTCONV to achieve the above is discussed informally on the wiki here for the influenza Neuraminidase glycoprotein monomer. For this tutorial, we will be using SLOPPYCONV = 5.0D-3 and TIGHTCONV = 1.0D-3 as in the FSA paper here. This might not be suitable for your system so please check before you spend a lot of time running GMIN!
The GMIN data file for the complex and protein
IN PROGRESS --Csw34 (talk) 17:04, 31 July 2014 (BST)
Can't use the GPUs just yet, waiting on Rosie to finish debugging it, should be done soon!
- use CUDAGMIN for the complex and protein
- use A12GMIN for the ligand as CUDAGMIN would be slower due to overhead - see table
I will use writing this section as an opportunity to test HBONDMATRIX with CUDAGMIN. I could also code up the use of the new output from the hbond_matrix.sh script that I put in for Cheng:
- hbond_all.mat: the matrix for all hydrogen-bonds
- hbond_bb.mat: the matrix for backbone hydrogen-bonds involving O and H atoms (protein backbone)
- hbond_sc.mat: the matrix for sidechain hydrogen-bonds involving everything except O and H atoms
We used BHPT in the original FSA paper, but CUDAGMIN is over 100x faster than A9GMIN which was used in the original work, so I don't think we need it :P