Ligand binding-mode searches with HBONDMATRIX

Introduction

HBONDMATRIX is a new (AMB)GMIN keyword which changes the way GMIN function to improve the sampling of ligand binding-modes in large protein systems. This is often necessary as for large systems, there is so much 'noise' in the total energy, that it is not a sufficient order parameter to effectively distinguish ligand binding modes.

In this tutorial, we will discuss using this keyword for the influenza NA protein, complexed with oseltamivir (Tamiflu), an inhibitor. You can find a compressed directory containing the input (and expected output) discussed in this tutorial here. WARNING: the hbond_matrix.sh script in this archive is OUT OF DATE! Please always use the latest version in the svn repository! This is input for only 5 basin-hopping steps which is far too few. As it takes about 48 hours to do 100 steps for this system, 5 is a more realistic number to test!

What you need

In order to use HBONDMATRIX you must:

be using AMBER (AMBGMIN)
have the hbond_matrix.sh script in your working directory (you can find a copy in svn/SCRIPTS/AMBER/hbond_matrix)
have ptraj in your $PATH (try typing 'ptraj' and see if it runs!)
have a file containing a list of the residues you are interested in in your working directory (hbond_residues.dat in this example)

For influenza NA, this is a simple matter as it is well known which residues in the binding pocket are catalytic, and which support the structure of the pocket, the framework residues:

Error creating thumbnail: Unable to save thumbnail to destination

Top view of NA with oseltamivir (OTV) bound. Catalytic = red, framework = blue, within 8A of OTV = yellow

Error creating thumbnail: Unable to save thumbnail to destination

Side view of NA with oseltamivir (OTV) bound. Catalytic = red, framework = blue, within 8A of OTV = yellow

Standard number	Type	2HU4 number	2HU4 type
118	Catalytic	36	ARG
119	Framework	37	GLU
151	Catalytic	69	ASP
152	Catalytic	70	ARG
156	Framework	74	ARG
178	Framework	97	TRP
179	Framework	98	SER
198	Framework	117	ASP
222	Framework	141	ILE
224	Catalytic	143	ARG
227	Framework	146	GLU
274	Framework	193	HIS
276	Catalytic	195	GLU
277	Framework	196	GLU
292	Catalytic	211	ARG
294	Framework	213	ASN
371	Catalytic	286	ARG
406	Catalytic	320	TYR
425	Framework	343	GLU

It therefore makes sense to use these residues + the ligand itself (residue 386) as our residues of interest when looking at binding modes.

hbond_residues.dat

If your system does not have such an obvious set of residues to use, I suggest you choose them radially from the binding pocket, and be quite generous. You want around two shells of residues on all sides of the pocket.

have a file containing the ptraj format hydrogen-bond donor/acceptor atom masks for your system (donors_acceptors.dat in this example)

This file should be in the AMBER 'mask' syntax used by the program ptraj as described in the AMBER Tools manual here:

donors_acceptors.dat

#-- DONORS from standard amino acid sidechains
donor mask :GLN@OE1
donor mask :GLN@NE2
donor mask :ASN@OD1
donor mask :ASN@ND2
donor mask :TYR@OH
donor mask :ASP@OD1
donor mask :ASP@OD2
donor mask :GLU@OE1
donor mask :GLU@OE2
donor mask :SER@OG
donor mask :THR@OG1
donor mask :HIS@ND1
donor mask :HIE@ND1
donor mask :HID@NE2

#-- ACCEPTORS from standard amino acid sidechains
acceptor mask  :ASN@ND2 :ASN@HD21
acceptor mask  :ASN@ND2 :ASN@HD22
acceptor mask  :TYR@OH  :TYR@HH
acceptor mask  :GLN@NE2 :GLN@HE21
acceptor mask  :GLN@NE2 :GLN@HE22
acceptor mask  :TRP@NE1 :TRP@HE1
acceptor mask  :LYS@NZ  :LYS@HZ1
acceptor mask  :LYS@NZ  :LYS@HZ2
acceptor mask  :LYS@NZ  :LYS@HZ3
acceptor mask  :SER@OG  :SER@HG
acceptor mask  :THR@OG1 :THR@HG1
acceptor mask  :ARG@NH2 :ARG@HH21
acceptor mask  :ARG@NH2 :ARG@HH22
acceptor mask  :ARG@NH1 :ARG@HH11
acceptor mask  :ARG@NH1 :ARG@HH12
acceptor mask  :ARG@NE  :ARG@HE
acceptor mask  :HIS@NE2 :HIS@HE2
acceptor mask  :HIE@NE2 :HIE@HE2
acceptor mask  :HID@ND1 :HID@HD1
acceptor mask  :HIP@ND1,NE2 :HIP@HE2,HD1

#-- Backbone donors and acceptors
#   N-H for prolines do not exist so need to exclude :PRO@N from the acceptor mask
donor mask '!:OTV & @O'
acceptor mask  '!(:PRO | :OTV) & @N' '!:OTV & @H'

#Terminal residues have different atom names - OXT is only present in termini so can use always
donor mask @OXT
#This assumes there is only one protein segment i.e. if there was a terminal BEFORE the ligand
#you would need to add more residue numbers e.g. 'acceptor mask :1,20@N :1,20@H1' etc
acceptor mask :1@N :1@H1
acceptor mask :1@N :1@H2
acceptor mask :1@N :1@H3


### LIGAND (OTV) ###
#-- DONORS for OTV
donor mask :OTV@O1A
donor mask :OTV@O1B
donor mask :OTV@N4
donor mask :OTV@O7
donor mask :OTV@O10

#-- ACCEPTORS for OTV
acceptor mask :OTV@N4 :OTV@HAA
acceptor mask :OTV@N4 :OTV@HAB
acceptor mask :OTV@N5 :OTV@HAD

These files can be named whatever you like as they are specified as the second and first arguments of the HBONDMATRIX keyword respectively:

HBONDMATRIX donors_acceptors.dat hbond_residues.dat

The idea (how it works)

HBONDMATRIX introduces an order parameter to help GMIN distinguish ligand binding-modes. Each time a minimum is found, a hydrogen-bond matrix is constructed. This contains the number of hydrogen bonds detected between all pairs of residues specified as being interesting in hbond_residues.dat, and it can be thought of as providing a fingerprint for the binding mode. Here is an example of such a matrix for the NA system:

  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  1
  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  1
  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  1  0  0  0  2  0  0  0  0  0  0  0
  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  2
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2
  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  1  1  0  1  0  0  0  0  0  0  0  0  0  0  2  0  2  0  0  0

This is generated using the AMBER program ptraj, with input specified on the HBONDMATRIX keyword line in the GMIN data file:

HBONDMATRIX donors_acceptors.dat hbond_residues.dat

When you first start AMBGMIN, it minimises your input structure to generate an initial quench (Qu 0). A hydrogen-bond matrix is generated, and used to create group 1 - the first binding-mode group. From GMIN_out:

Qu          0 E=   -13293.6513505884 steps=    6 RMS= 0.43492E-02 Markov E=    -13293.65135     t=       25.2
 mc> Storing cis/trans information for initial structure
Starting MC run of        100 steps
Temperature will be multiplied by      1.00000000 at every step
 HBONDMATRIX> Group 1 created using initial quench structure

A step is then taken (in this case, GROUPROTATION of the ligand), and the result minimised as normal.
A hydrogen-bond matrix is generated for the new structure, and is compared to all existing groups.

If it DOES match an existing group, the that quench is flagged as belonging the the matching group, and the structure is accepted using the standard metropolis test - but with the Markov energy set to that of the last accepted structure in the matching group. From GMIN_out:

Qu          1 E=   -13293.6467051417 steps=  444 RMS= 0.48702E-02 Markov E=    -13293.65135     t=     1176.2
 HBONDMATRIX> Structure matches group    1
 HBONDMATRIX> Group    1 now has population     2
 HBONDMATRIX> Setting Markov energy to previous group   1 value of     -13293.65135
 HBONDMATRIX> New highest energy structure for group   1 found. E=    -13293.64671

If it DOES NOT match an existing group, it is used to form a new group and the step is automatically accepted into the Markov chain, regardless of the total energy. This is intended to promote exploration. From GMIN_out:

Qu          3 E=   -13283.3809296421 steps=  794 RMS= 0.45825E-02 Markov E=    -13293.64671     t=     1097.2
 HBONDMATRIX> New group found, creating group    2
 HBONDMATRIX> New group so step auto-accepted

Output files

Two information files are dumped by AMBGMIN when using HBONDMATRIX'. hbondgroupinfo contains information on each group found, while quenchtogroupref provides a list of which group each quenched structure was assigned to, and its energy.

hbondgroupinfo:

ID  Size      E(min)              E(max)            E(average)
   1   3    -13293.65135        -13293.64671        -13293.64825
   2   2    -13283.38290        -13283.38093        -13283.38191
   3   1    -13283.97944        -13283.97944        -13283.97944

quenchtogroupref:

Quench  Group ID  Energy
     0     1    -13293.65135
     1     1    -13293.64671
     2     1    -13293.64671
     3     2    -13283.38093
     4     2    -13283.38290
     5     3    -13283.97944

AMBGMIN will also dump the lowest energy structure found for each group in .pdb and AMBER .rst format named min.groupX.pdb and min.groupX.rst. Note that these are only converged to SLOPPYCONV and have not gone through the final quenching process.

If you also have the DUMPQU keyword in your data file (as we do here), you will also see a set of allgroupX.pdb files. These contain ALL the structures in each group, cat'd together so that they can be easily opened in vmd or Pymol.

Optimising individual binding-modes

The HBONDMATRIX keyword is geared towards exploring as large a variety of binding modes as possible, but this does not mean that you will necessarily hit on the optimal minimum for each mode. It is likely that changes in the nearby protein sidechains would lower the energy for example.

You can use the HBONDMATRIX keyword to focus on a single binding mode by adding REJECT to the end of the line in the GMIN data file. This changes the way HBONDMATRIX operates, by automatically rejecting all steps which end up in a new group from that defined by the initial quenched structure.

I should also mention that you'd probably want to use a different set of moves to optimise a single binding mode, as if you used these GROUPROTATIONS for the ligand, almost every step would end up in a new mode, and so be rejected.

I have not yet tested this, but it should work just fine :)

WARNING: you will still get the group files being produced - including the min.groupX.* files - but these should be ignored. All you care about if optimising a single binding mode is the lowest.*.1.pdb and min*.1.rst files produced.

--Csw34 12:33, 1 May 2012 (UTC)

Ligand binding-mode searches with HBONDMATRIX

Contents

Introduction

What you need

The idea (how it works)

Output files

Optimising individual binding-modes

Navigation menu

Ligand binding-mode searches with HBONDMATRIX

Introduction

What you need

The idea (how it works)

Output files

Optimising individual binding-modes

Navigation menu

Search