Rigid body input files for proteins using genrigid-input.py
Overview
The use of rigid bodies requires two files: the rbodyconfig file (definition of rigid bodies) and the coordsinirigid file. The genrigid-input.py creates those files based on a pdb input and returns additionally the file rigid.log. The script also needs a coords.inpcrd file to achieve a better precision for the coordinates. Additionally, group rotations input can be generated. The script has been updated to work for RNA, DNA and proteins as well as mixed systems for AMBER14 atom types and residue names.
There are four parts to the script:
- 1. The residues are sorted into all atomistic (fully flexible), locally rigid and fully rigid regions. The definition of rigid bodies is then done according to this assignment.
- 2. The fully rigidified parts of the molecules are put into rigid bodies.
- 3. An additional local rigidification scheme is defined.
- 4. Creation of the atomgroups file for GROUPROTATION moves in GMIN.
If no residue is set to be fully rigid or locally rigid these parts of the code are skipped. Similarly, only the selected RNA, DNA or protein parts of the script are considered.
Following is a short example using a system with 11 residues and one of them is a custom made residue.
Starting the script
The script is started in the following way:
python genrigid_input.py my_protein.pdb coords.inpcrd
This will load the file my_protein.pdb and coords.inpcrd using the default tolerance 0.01. As the tolerance is only needed for user defined rigid bodies and not for the standard selection this will be the best option for most (see the section on user defined rigid bodies for more information). Alternatively, a changed tolerance can be entered as follows and with the keyword 'pymol' a visualisation is loaded (pymol needs to be installed and then the tolerance input needs to be used as well.
python genrigid_input.py my_protein.pdb coords.inpcrd 0.005 pymol
You can also just simply run the script and there will be a prompt to enter a pdb and a coordinate file.
PDB file: my_protein.pdb Coords Input: coords.inpcrd
Now you will need to decide what parts of the system will be considered. This only applies to the locally rigid region and the grouprotations generated.
Use script for peptides (1), RNA (2), DNA (3), or mixed (4)?
Once python started you should see the following text:
Molecule was loaded Number of residues: 11 Number of atoms: 201
Check at this stage that the number of residues and atoms is correct. If there is a problem, check that your pdb file is a standard pdb file and that pdb and coords.inpcrd are indeed for one molecule. (N.B. it does not matter if the coordinates in your pdb and the coords.inpcrd match as the coordinates in the pdb file are ignored.)
Unknown residues and ligands
If your system contains ligands, DNA, RNA or residues with unknown identifiers, the following will show up next:
There are ligands or unknown residues. 6 - xyx 7 - xxx Shall some or all of the above be treated as normal residues, all other residues will be treated as fully atomistic? (y/n)
In this case, we have residue number 6 of type xyx and number 7 of type xxx. Our choice is now to either ignore this fact and treat them as normal residues or to set them as fully atomistic straight away. This choice can be changed later on if needed, but it is best to make choices about the treatment of the ligands now and stick with them to keep track. Entering "n" adds all of the detected residues to the fully atomistic residues. If "y" is selected, the following dialogue appears.
Shall some or all of the above be treated as normal residues, all other residues will be treated as fully atomistic? (y/n) Enter the residue numbers separated by spaces (enter a for all residues): 7 The following residues will be treated atomistic: 6 ------------- Shall some or all of the above be treated as normal residues, all other residues will be treated as fully atomistic? (y/n) Enter the residue numbers separated by spaces (enter a for all residues): a The following residues will be treated atomistic:
In the first case, we select residue 7 to be treated normally. Residue 6 will hence be all atomistic. In the second case, a is selected for all and hence both residues will be treated normally.
Selection of fully flexible, locally rigid and fully rigid regions
Now it is time to select residues for fully atomistic and locally rigid regions.
Selection of atomistic and locally rigidified residues 1 - choose sphere/ellipsoid of unfrozen atoms and add localised rigidification 2 - choose a range of resdiues or single residues to unfreeze or locally rigidify Method:
Spherical and ellipsoidal volumes
Selecting method 1 allows the definition of a spherical or ellipsoidal volume that will be fully flexible with a surrounding shell of locally rigidified residues. This for example is useful for binding pockets. Firstly, we have to choose the centre of the volume.
Using the mass-weighted centre of a residue as centre
Mass weighted centre of residue (1) or atom in a selected residue (2)? 1 Residue: 6 Mass weighted centre: (19.32302371, 28.69824029, 37.07328727)
This simply calculates the mass-weighted centre of a residue. You can choose any residue in the molecule, however only H, D, C, N ,O and S are defined. So if your system contains other atoms, you will need to change the list of masses in the script to account for this.
Using a selected atom
This simply centres the selected volume at the coordinates of one atom.
Mass weighted centre of residue (1) or atom in a selected residue (2)? 2 Display atoms in residue? (y/n)
If you know what atom you want to choose, just enter n; otherwise use y to display all atoms within a chosen residue:
Display atoms in residue? (y/n) y Residue number: (enter x to leave) 6 Residue: 6 xyx Atoms: 99 N 18.453 27.971 38.566 : : : 113 O 22.105 28.141 38.602 Residue number: (enter x to leave) x
Then simply enter the atom number:
Choose atom: 106
Radius for the atomistic region
Next we have to decide on the size of the sphere or ellipsoid. The value entered is in Angstrom and a residue is considered to belong to the volume if one of its atoms is inside the defined volume.
The two choices are entering one radius for a sphere, or three for an ellipsoid:
Radius of atomistic region (enter three values for an ellipsoid and one for a sphere): 4.5 3 3.5
Local rigidification
Next we define the locally rigid shell. Only residues can be selected that have not been previously selected to be fully atomistic.
Add localised rigidification (y/n)? y Radius of locally rigid region (enter three values for an ellipsoid and one for a sphere): 6 7.7 5 The following residues are not frozen (atomistic): [4, 6, 7] The following residues are selected to be locally rigidified: [5] The following residues are selected to be fully rigidified: [1, 2, 3, 8, 9, 10, 11]
Selection of a range of residues
Choosing method 2 allows the input of ranges or a selection of residues.
Method: 2 Define atomistic residues (y/n)? y Enter residues separated by spaces, use r followed by two residues to define a range: r 5 7 11 Locally rigidify (y/n)? y Enter residues separated by spaces, use r followed by two residues to define a range: r 3 11 The following residues are not frozen (atomistic): [5, 6, 7, 11] The following residues are selected to be locally rigidified: [3, 4, 8, 9, 10] The following residues are selected to be fully rigidified: [1, 2]
In this example, we decide that residues 5 to 7 and residue 11 should be fully atomistic and residues 3, 4 and 8 to 10 should be locally rigidified. The syntax for ranges is simply r <start> <finish>. Residues that are already assigned will not be reassigned in this step. However, it is possible to remove residues from a previous made selection.
At the end of either method follows the prompt:
Enter more residues (1) or remove residues from previous selection (2)? Any other key to exit.
Entering 1 brings you back to method selection and you can enter more residues following the same procedure as before (it does not matter which method you choose as each cycle is independent of the previous one and the only information stored is the selection made). If you are happy with the sorting of the residues into categories, just hit enter. If you made a mistake, don't worry, just select two.
Removing parts of a selection
From the previous selection we would like to remove residue 11 from the fully atomistic selection and residues 9 and 10 from the locally rigidified residues:
Enter more residues (1) or remove residues from previous selection (2)? Any other key to exit. 2 Remove from the atomistic residues (y/n)? y Enter residues separated by spaces, use r followed by two residues to define a range: 11 Locally rigidify (y/n)? y Enter residues by spaces, use r followed by two residues to define a range: r 3 11 The following residues are not frozen (atomistic): [5, 6, 7] The following residues are selected to be locally rigidified: [3, 4, 8] The following residues are selected to be fully rigidified: [1, 2, 9, 10, 11]
We can now reassign these residues, if we wish so. For this example we leave them as it stands. Our final selection is therefore three fully atomistic residues: 5 to 7; and three locally rigidified residues: 3, 4 and 8.
Definition of the fully rigid bodies
Moving on, we now have to sort the fully rigid residues into rigid bodies. We can either choose to merge them all into one rigid body (option 1) or to define groups that are somewhat more sensible, for example avoiding that we glue chains together and trap one configuration.
Grouping of the rigid body 1-Group all parts of the rigid system as one body 2-Define groups Grouping method: 2 The following residues are frozen: [1, 2, 9, 10, 11] How many groups will be defined? 2 Residues for rigid body: 1 2 Residues for rigid body r 9 11 All residues are assigned. The following rigid bodies have been specified: Rigid body 1 [1, 2] Rigid body 2 [9, 10, 11]
The 5 residues are sorted into two rigid bodies, one at each end of the protein. If you use this method, all residues need to be assigned and the number of rigid bodies given, needs to match the number of rigid bodies entered, otherwise you will be prompted to re-enter your rigid bodies until these criteria are matched.
Definition of the locally rigid bodies
After the fully rigid region, we are now defining rigid bodies for the local rigidification.
Firstly, there is a number of pre-defined options that you can choose from. These are rigid bodies that conform with how the code works and should not cause any problems. However, if you rather want to define your own rigidification scheme, you might want to ignore the suggested rigid bodies as you otherwise might have clashes between the rigid bodies.
Grouping of the local rigid bodies Group proline as rigid body C-O-N-CD-CG-CB-CA(1) or as N-CD-CG-CB-CA(2)? 2 Group arginine as rigid body NE-HE-CZ-NH1-HH11-HH12-NH2-HH21-HH22 (y/n)? y Group histidine (HIS, HIE, HID, HIP) as rigid body CG-ND1-CE1-NE2-CD2 (y/n)? y Group lysine as rigid body NZ-HZ1-HZ2-HZ3 (y/n)? y Group aspartic acid as rigid body CB-CG-OD1-OD2 (y/n)? y Group asparagine as rigid body CB-CG-OD1-ND2-HD21-HD22 (y/n)? y Group glutamic acid as rigid body CG-CD-OE1-OE2 (y/n)? y Group glutamine as rigid body CG-CD-OE1-NE2-HE21-HE22 (y/n)? y Group phenylalanine as rigid body CG-CD1-HD1-CE1-HE1-CZ-HZ-CE2-HE2-CD2-HD2 (y/n)? y Group tyrosine as rigid body CG-CD1-HD1-CE1-HE1-CZ-OH-CE2-HE2-CD2-HD2 (y/n)? y Group tryptophan as either: 1 rigid body:(CG-CD1-HD1-NE1-HE1-CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2) or 2 rigid bodies:(CG-CD1-HD1-NE1-HE1)-(CE2-CZ2-HZ2-CH2-HH2-CZ3-HZ3-CE3-HE3-CD2)? 2
This selection does not depend on whether you actually have these residues present, as it will only be applied to matching residues. N.B. there are two questions that need to be answered with 1, 2 or n instead of y or n!
Next up are the peptide bonds. To ensure that there are no conflicts between rigid bodies the following rules are used:
- PRO is excluded from the peptide bonding list and if wished needs to be added separately
- if two neighbouring residues are termini they are not considered as bonded (this would be an issue for dipeptides!)
- residues that are next to a residue that belongs to the fully rigid region are NOT included, i.e. the peptide bond between them will not be a rigid body
- residues that are next to a residue that belongs to the fully atomistic region ARE included, i.e. the peptide bond between them will be a rigid body
Shall the peptide bonds be treated as rigid bodies (excl. PRO) (y/n)? y
If we have RNA (or DNA) the following should be expected:
Group phosphate as (O3'-POO-O5') (1) or POO (2)? 1 Rigidify sugar (y/n)? n Group guanine base as one rigid body (1) or leave contacts free (2)? 1 Group adenine base as one rigid body (1) or leave contacts free (2)? 1 Group cystosine base as one rigid body (1) or leave contacts free (2)? 1 Group uracil base as one rigid body (1) or leave contacts free (2)? 1
If this is all we wish to do, we enter n for:
Enter additional user-defined rigid bodies (y/n)?
This part of the program terminates with:
Output files will be written now.
User defined rigid bodies
Defining rigid bodies yourself you have to follow the following rules:
- only residues in the locally rigid selection can be used for this
- no atom can be used more than once
- no rigid body consist of less than 3 atoms
- no rigid body can be linear.
The test for linearity is based on the tolerance and an angle criterion: one of the inter-atomistic atoms (of all possible triples within the rigid body) needs to be within 0.0 + tolerance and 180.0 - tolerance.
Enter additional user-defined rigid bodies (y/n)? y 1 - select residues by name 2 - select one or more residues by id Method (1/2):
Selection by name
Method (1/2): 1 Residue name (three letter code, n to exit): LEU The following atoms are already in local rigid bodies: 125 N LEU 8 126 H LEU 8 142 C LEU 8 143 O LEU 8 The residue LEU contains the following atoms: N H CA HA CB HB2 HB3 CG HG CD1 HD11 HD12 HD13 CD2 HD21 HD22 HD23 C O Enter a sequence of atom names separated by spaces: CG HG CD1 CD2 HD11 HD12 HD13 HD21 HD22 HD23 Enter more rigid bodies (y/n)?
We decided to form a rigid body of the LEU side chain, however we only found one matching residue and some of the atoms are already in a rigid body from the peptide bonding. We have to avoid these atoms, otherwise the rigid body will not be accepted. If we have more than one residue matching the name, the rigid body will be formed for everyone of them, as long as there is no conflict (a conflict in one residue will not change the rigid bodies for all others).
Selection by ID
Selection by Id can be used in various ways, but especially to fit experimental constraints (e.g. intramolecular bonds of known fixed length).
Method (1/2): 2 Residue numbers (n to exit): 3 4 The following atoms are already in local rigid bodies: 61 N PRO 4 62 CD PRO 4 65 CG PRO 4 68 CB PRO 4 71 CA PRO 4 73 C PRO 4 74 O PRO 4 The residues [3, 4] contain the following atoms that are not assigned to rigid bodies: 42 - N, 43 - H, 44 - CA .... Enter a sequence of atom numbers separated by spaces: 59 60 72 Enter more rigid bodies (y/n)?
Entering y allows to to again select a method and then define more rigid bodies. Otherwise you terminate the script and get your output files.
Group rotation file
The next question after the rigid body script is finished is the following:
Create group rotation file (y/n)?
If you want to skip the rigidification and just create the group rotation file, you will need to simply make all residue all atomistic.
The first input is to select a uniform selection probability:
Uniform selection probability (suggested: 0.025): 0.5 This is a large probability (values recommended: 0.01 to 0.1). Continue with the selected value (y/n)? n Uniform selection probability (suggested: 0.025): 0.0004 This is a small probability (values recommended: 0.01 to 0.1). Continue with the selected value (y/n)? n Uniform selection probability (suggested: 0.025): 0.02
The script accepts strange inputs in this case, but asks you for a confimation if your selected probability is outside a certain range. The next step is to decide what part of the molecule should be used for the group rotations. Clearly, the fully rigidified region cannot be chosen in this case.
Create the group rotation file for the all atom region (1), the locally rigidified region (2), or both regions (3)? 3
The following will then be questions for the input of the groups that should be rotated and once again the possibility to create user defied groups.
Running parmed.py to exclude interactions within the rigid bodies
Make sure you have the following files in your working directory: coords.prmtop and parmed_in. Run the following code and the new file coords.prmtop.ni is a topology file that includes all internal rigid body interactions:
parmed.py coords.prmtop source parmed_in : [output about added exclusions] : go
To run parmed.py with AmberTools15 or newer, you will need to have the following in your .bashrc:
export AMBERHOME=/home/kr366/amber14 export PATH=$PATH:$AMBERHOME/bin # Add location of Amber Python modules to default Python search path if [ -z "$PYTHONPATH" ]; then export PYTHONPATH="${AMBERHOME}/lib/python2.7/site-packages" else export PYTHONPATH="${AMBERHOME}/lib/python2.7/site-packages:${PYTHONPATH}" fi if [ -z "${LD_LIBRARY_PATH}" ]; then export LD_LIBRARY_PATH="${AMBERHOME}/lib" else export LD_LIBRARY_PATH="${AMBERHOME}/lib:${LD_LIBRARY_PATH}" fi
Important information for changes to the script
If you want to change how the script works, add your own functionality or extend the script to for example RNA and DNA there are a few things to be aware of:
The script stores all information in two dictionaries: res_dic and atom_dic. The first one has all the residues as keys and its values are the atoms belonging to this residue. This is used in most parts of the script and should NOT be changed. The second dictionary is containing the information about each atom sorted by the atom id. If you want to pass on more information, please attached it to the end of the used information, as otherwise you will have to update the indices at many instances within the script.
The rigid bodies formed from the fully rigid residues are in groups and they are stored by residue ids. The locally rigid scheme is stored in groups_local and here the atom id is used.
The list atoms_used contains all the atoms already assigned to a rigid body. For every new rigid body this list needs to be updated. This prevents that an atom is used in more than one rigid body.
Additional functionality should be added within the class defined at the start, that utilises the two dictionaries and hence allows easy access to all available information and also has already a wide range of defined methods that are tested.