Ligand binding-mode searches with HBONDMATRIX
Introduction
HBONDMATRIX is a new (AMB)GMIN keyword which changes the way GMIN function to improve the sampling of ligand binding-modes in large protein systems. This is often necessary as for large systems, there is so much 'noise' in the total energy, that it is not a sufficient order parameter to effectively distinguish ligand binding modes.
In this tutorial, we will discuss using this keyword for the influenza NA protein, complexed with oseltamivir (Tamiflu), an inhibitor. You can find a compressed directory containing the input (and expected output) discussed in this tutorial here. WARNING: the hbond_matrix.sh script in this archive is OUT OF DATE! Please always use the latest version in the svn repository! This is input for only 5 basin-hopping steps which is far too few. As it takes about 48 hours to do 100 steps for this system, 5 is a more realistic number to test!
What you need
In order to use HBONDMATRIX you must:
- be using AMBER (AMBGMIN)
- have the hbond_matrix.sh script in your working directory (you can find a copy in svn/SCRIPTS/AMBER/hbond_matrix)
- have ptraj in your $PATH (try typing 'ptraj' and see if it runs!)
- have a file containing a list of the residues you are interested in in your working directory (hbond_residues.dat in this example)
For influenza NA, this is a simple matter as it is well known which residues in the binding pocket are catalytic, and which support the structure of the pocket, the framework residues:
Standard number | Type | 2HU4 number | 2HU4 type |
118 | Catalytic | 36 | ARG |
119 | Framework | 37 | GLU |
151 | Catalytic | 69 | ASP |
152 | Catalytic | 70 | ARG |
156 | Framework | 74 | ARG |
178 | Framework | 97 | TRP |
179 | Framework | 98 | SER |
198 | Framework | 117 | ASP |
222 | Framework | 141 | ILE |
224 | Catalytic | 143 | ARG |
227 | Framework | 146 | GLU |
274 | Framework | 193 | HIS |
276 | Catalytic | 195 | GLU |
277 | Framework | 196 | GLU |
292 | Catalytic | 211 | ARG |
294 | Framework | 213 | ASN |
371 | Catalytic | 286 | ARG |
406 | Catalytic | 320 | TYR |
425 | Framework | 343 | GLU |
It therefore makes sense to use these residues + the ligand itself (residue 386) as our residues of interest when looking at binding modes.
hbond_residues.dat
36 37 69 70 74 97 98 117 141 143 146 193 195 196 211 213 286 320 343 386
If your system does not have such an obvious set of residues to use, I suggest you choose them radially from the binding pocket, and be quite generous. You want around two shells of residues on all sides of the pocket.
- have a file containing the ptraj format hydrogen-bond donor/acceptor atom masks for your system (donors_acceptors.dat in this example)
This file should be in the AMBER 'mask' syntax used by the program ptraj as described in the AMBER Tools manual here:
donors_acceptors.dat
#-- DONORS from standard amino acid sidechains donor mask :GLN@OE1 donor mask :GLN@NE2 donor mask :ASN@OD1 donor mask :ASN@ND2 donor mask :TYR@OH donor mask :ASP@OD1 donor mask :ASP@OD2 donor mask :GLU@OE1 donor mask :GLU@OE2 donor mask :SER@OG donor mask :THR@OG1 donor mask :HIS@ND1 donor mask :HIE@ND1 donor mask :HID@NE2 #-- ACCEPTORS from standard amino acid sidechains acceptor mask :ASN@ND2 :ASN@HD21 acceptor mask :ASN@ND2 :ASN@HD22 acceptor mask :TYR@OH :TYR@HH acceptor mask :GLN@NE2 :GLN@HE21 acceptor mask :GLN@NE2 :GLN@HE22 acceptor mask :TRP@NE1 :TRP@HE1 acceptor mask :LYS@NZ :LYS@HZ1 acceptor mask :LYS@NZ :LYS@HZ2 acceptor mask :LYS@NZ :LYS@HZ3 acceptor mask :SER@OG :SER@HG acceptor mask :THR@OG1 :THR@HG1 acceptor mask :ARG@NH2 :ARG@HH21 acceptor mask :ARG@NH2 :ARG@HH22 acceptor mask :ARG@NH1 :ARG@HH11 acceptor mask :ARG@NH1 :ARG@HH12 acceptor mask :ARG@NE :ARG@HE acceptor mask :HIS@NE2 :HIS@HE2 acceptor mask :HIE@NE2 :HIE@HE2 acceptor mask :HID@ND1 :HID@HD1 acceptor mask :HIP@ND1,NE2 :HIP@HE2,HD1 #-- Backbone donors and acceptors # N-H for prolines do not exist so need to exclude :PRO@N from the acceptor mask donor mask '!:OTV & @O' acceptor mask '!(:PRO | :OTV) & @N' '!:OTV & @H' #Terminal residues have different atom names - OXT is only present in termini so can use always donor mask @OXT #This assumes there is only one protein segment i.e. if there was a terminal BEFORE the ligand #you would need to add more residue numbers e.g. 'acceptor mask :1,20@N :1,20@H1' etc acceptor mask :1@N :1@H1 acceptor mask :1@N :1@H2 acceptor mask :1@N :1@H3 ### LIGAND (OTV) ### #-- DONORS for OTV donor mask :OTV@O1A donor mask :OTV@O1B donor mask :OTV@N4 donor mask :OTV@O7 donor mask :OTV@O10 #-- ACCEPTORS for OTV acceptor mask :OTV@N4 :OTV@HAA acceptor mask :OTV@N4 :OTV@HAB acceptor mask :OTV@N5 :OTV@HAD
These files can be named whatever you like as they are specified as the second and first arguments of the HBONDMATRIX keyword respectively:
HBONDMATRIX donors_acceptors.dat hbond_residues.dat
The idea (how it works)
HBONDMATRIX introduces an order parameter to help GMIN distinguish ligand binding-modes. Each time a minimum is found, a hydrogen-bond matrix is constructed. This contains the number of hydrogen bonds detected between all pairs of residues specified as being interesting in hbond_residues.dat, and it can be thought of as providing a fingerprint for the binding mode. Here is an example of such a matrix for the NA system:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0
This is generated using the AMBER program ptraj, with input specified on the HBONDMATRIX keyword line in the GMIN data file:
HBONDMATRIX donors_acceptors.dat hbond_residues.dat
- When you first start AMBGMIN, it minimises your input structure to generate an initial quench (Qu 0). A hydrogen-bond matrix is generated, and used to create group 1 - the first binding-mode group. From GMIN_out:
Qu 0 E= -13293.6513505884 steps= 6 RMS= 0.43492E-02 Markov E= -13293.65135 t= 25.2 mc> Storing cis/trans information for initial structure Starting MC run of 100 steps Temperature will be multiplied by 1.00000000 at every step HBONDMATRIX> Group 1 created using initial quench structure
- A step is then taken (in this case, GROUPROTATION of the ligand), and the result minimised as normal.
- A hydrogen-bond matrix is generated for the new structure, and is compared to all existing groups.
- If it DOES match an existing group, the that quench is flagged as belonging the the matching group, and the structure is accepted using the standard metropolis test - but with the Markov energy set to that of the last accepted structure in the matching group. From GMIN_out:
Qu 1 E= -13293.6467051417 steps= 444 RMS= 0.48702E-02 Markov E= -13293.65135 t= 1176.2 HBONDMATRIX> Structure matches group 1 HBONDMATRIX> Group 1 now has population 2 HBONDMATRIX> Setting Markov energy to previous group 1 value of -13293.65135 HBONDMATRIX> New highest energy structure for group 1 found. E= -13293.64671
- If it DOES NOT match an existing group, it is used to form a new group and the step is automatically accepted into the Markov chain, regardless of the total energy. This is intended to promote exploration. From GMIN_out:
Qu 3 E= -13283.3809296421 steps= 794 RMS= 0.45825E-02 Markov E= -13293.64671 t= 1097.2 HBONDMATRIX> New group found, creating group 2 HBONDMATRIX> New group so step auto-accepted
Output files
Two information files are dumped by AMBGMIN when using HBONDMATRIX'. hbondgroupinfo contains information on each group found, while quenchtogroupref provides a list of which group each quenched structure was assigned to, and its energy.
hbondgroupinfo:
ID Size E(min) E(max) E(average) 1 3 -13293.65135 -13293.64671 -13293.64825 2 2 -13283.38290 -13283.38093 -13283.38191 3 1 -13283.97944 -13283.97944 -13283.97944
quenchtogroupref:
Quench Group ID Energy 0 1 -13293.65135 1 1 -13293.64671 2 1 -13293.64671 3 2 -13283.38093 4 2 -13283.38290 5 3 -13283.97944
AMBGMIN will also dump the lowest energy structure found for each group in .pdb and AMBER .rst format named min.groupX.pdb and min.groupX.rst. Note that these are only converged to SLOPPYCONV and have not gone through the final quenching process.
If you also have the DUMPQU keyword in your data file (as we do here), you will also see a set of allgroupX.pdb files. These contain ALL the structures in each group, cat'd together so that they can be easily opened in vmd or Pymol.
Optimising individual binding-modes
The HBONDMATRIX keyword is geared towards exploring as large a variety of binding modes as possible, but this does not mean that you will necessarily hit on the optimal minimum for each mode. It is likely that changes in the nearby protein sidechains would lower the energy for example.
You can use the HBONDMATRIX keyword to focus on a single binding mode by adding REJECT to the end of the line in the GMIN data file. This changes the way HBONDMATRIX operates, by automatically rejecting all steps which end up in a new group from that defined by the initial quenched structure.
I should also mention that you'd probably want to use a different set of moves to optimise a single binding mode, as if you used these GROUPROTATIONS for the ligand, almost every step would end up in a new mode, and so be rejected.
I have not yet tested this, but it should work just fine :)
WARNING: you will still get the group files being produced - including the min.groupX.* files - but these should be ignored. All you care about if optimising a single binding mode is the lowest.*.1.pdb and min*.1.rst files produced.
--Csw34 12:33, 1 May 2012 (UTC)