CamCASP/Notes/7
CamCASP => Notes => Using the batch scripts
The ${CAMCASP}/bin/ folder contains a few batch scripts for running sets of jobs at various dimer geometries.
Obtaining the geometry file
The batch scripts all use a geometry file that contains the translation and rotation parameters needed to construct the dimer. This file is constructed by CamCASP as follows.
- Create a standard CamCASP command file for the dimer using CLUSTER.
- Edit it to look like:
TITLE benzene1 and benzene2 : GRID TITLE Basis sadlej and type MC+ TITLE Midbonds 3S2P1D and type weighted MEMORY 2048 MB SET Global_data CamCASP-path /home/am592/SITUS Units Bohr cm-1 Scf-code Dalton XC-func PBE0 Overwrite yes END MOLECULE benzene1 at 0.0 0.0 0.0 Charge 0 Echo No Hessian format SAPT2006 Skip MOS <----------*** Basis Main Spherical Units Bohr Format GAMESS C1 6.0 2.64623243 0.00000000 0.00000000 TYPE C1 #include-camcasp basis/gamess_us/sadlej/C --- ... ... END MOLECULE benzene2 at 0.0 0.0 0.0 Charge 0 Echo No Hessian format SAPT2006 Skip MOS <----------*** Basis Main ... ... END Begin Energy-scan Probe benzene1 with benzene2 Scan Nothing Units Bohr Random Points 40 dRmin -1.5 dRmax 1.2 Radial 2 End Energy-file PREFIX random End FINISH
The new bits are the ENERGY-SCAN commands and those indicated by '***'. The Skip MOs command is needed to get CamCASP to go on without reading in the molecular orbitals. The ENERGY-SCAN module probes the first molecule (benzene1) with the second (benzene2). The order is important. But all we need are the geometries, so we Scan Nothing. The results of the energy scan will be in file random.dat. This file looks like:
TITLE Energy scan of MOL-A by MOL-B TITLE benzene1 and benzene2 : GRID TITLE Basis sadlej and type MC+ TITLE Midbonds 3S2P1D and type weighted MOL-A benzene1 MOL-B benzene2 POINTS 40 ENERGY-UNITS CM-1 LENGTH-UNITS BOHR ANGLE-UNITS DEGREES LABELS INDEX Rx Ry Rz alpha Nx Ny Nz E(1)elst E(2)ind ... 1 0.145892E+02 0.000000E+00 0.000000E+00 0.180000E+03 0.100000E+01 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 ... 2 0.134629E+02 0.000000E+00 0.000000E+00 0.180000E+03 0.100000E+01 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 ... ... ... 40 -0.665925E+01 -0.444957E+01 0.827950E+01 0.141980E+03 0.515613E+00 0.712421E+00 -0.476025E+00 0.000000E+00 0.000000E+00 ... END END-FILE
Yes, it has many columns and all we need are the first few that contain the geometry.
- Now we need to make small changed to random.dat in order to get it in a form suitable for the batch scripts. Do the following. Copy it to random.geom (could be any name really, but let's use this). Now edit random.geom so that it looks like:
! TITLE Energy scan of MOL-A by MOL-B ! TITLE benzene1 and benzene2 : GRID ! TITLE Basis sadlej and type MC+ ! TITLE Midbonds 3S2P1D and type weighted ! MOL-A benzene1 ! MOL-B benzene2 ! POINTS 40 ! ENERGY-UNITS CM-1 ! LENGTH-UNITS BOHR ! ANGLE-UNITS DEGREES ! LABELS INDEX Rx Ry Rz alpha Nx Ny Nz 1 0.145892E+02 0.000000E+00 0.000000E+00 0.180000E+03 0.100000E+01 0.000000E+00 0.000000E+00 2 0.134629E+02 0.000000E+00 0.000000E+00 0.180000E+03 0.100000E+01 0.000000E+00 0.000000E+00 ... ... 39 -0.587535E+01 -0.392578E+01 0.730487E+01 0.141980E+03 0.515613E+00 0.712421E+00 -0.476025E+00 40 -0.665925E+01 -0.444957E+01 0.827950E+01 0.141980E+03 0.515613E+00 0.712421E+00 -0.476025E+00 ! END ! END-FILE
We have commented out all text lines that tell us which system this file is for. And we have removed the energy columns that come after Nz.
This file is now ready to be read by the batch scripts.
NOTES:
- It is often convenient to split this file into multiple parts so as to run the batch jobs in 'parallel' on multiple nodes. If you do so, the INDEX column comes in handy as it is used by the batch scripts to decide on the job number.
SAPT(DFT) jobs with batch_SAPT.pl
You cannot always perform an energy scan using the ENERGY-SCAN module in CamCASP as this module requires you to use the MC type of basis. If you need to use the MC+, DC or DC+ types of bases, then you will need to use the batch_SAPT.pl script. What this script does is to use a template CLUSTER command file and a file containing the translation and rotation parameters that define the dimer in order to run a batch of SAPT(DFT) (and also SAPT) calculations in an automatic way.
Here are the options:
$ batch_SAPT.pl Usage: batch_SAPT.pl [-j] job [-geom name] [-clt name] [ -b dc | mc | mc+ | dc+ ] -ipa IP(A) -ipb IP(B) [-q queue] [-M memory in MB]
You need to supply:
- The job name. For example, this might be benzene2.
- The name of the geometry file. The default name will be jobname.geom, so if this file is called benzene2.geom, then there is no need to specify it explicitly.
- The name of the CLUSTER command file. See below. As with the geometry file, there is a default value of jobname.clt.
- The type of basis to be used. (Is this now obsolete?)
- The ionisation potentials of the two monomers.
- The name of the queue. As always, there are special 'queues': bg says run the job in the background. The other special 'queue' none may not be usable.
- You can also specify the memory to be used. This memory (in MB) will be passed to DALTON. The memory given to CamCASP should be in the CLUSTER file.
The template CLUSTER file
What the batch_SAPT.pl script does is use a template CLUSTER file to create the various DALTON and CamCASP input files at each dimer geometry. Here's what such a file looks like:
Title Benzene dimer : Sadlej MC+ ! Set Global-data CamCASP /home/ajm/CamCASP Units Bohr Degree End Molecule benzene1 Units Bohr C1 6.0 2.6462324347 0.0000000000 0.0000000000 C2 6.0 1.3231162173 2.2917045128 0.0000000000 C3 6.0 -1.3231162173 2.2917045128 0.0000000000 C4 6.0 -2.6462324347 0.0000000000 0.0000000000 C5 6.0 -1.3231162173 -2.2917045128 0.0000000000 C6 6.0 1.3231162173 -2.2917045128 0.0000000000 H1 1.0 4.7082995437 0.0000000000 0.0000000000 H2 1.0 2.3541497718 4.0775070135 0.0000000000 H3 1.0 -2.3541497718 4.0775070135 0.0000000000 H4 1.0 -4.7082995437 0.0000000000 0.0000000000 H5 1.0 -2.3541497718 -4.0775070135 0.0000000000 H6 1.0 2.3541497718 -4.0775070135 0.0000000000 End Molecule benzene2 Units Bohr C1 6.0 2.6462324347 0.0000000000 0.0000000000 C2 6.0 1.3231162173 2.2917045128 0.0000000000 C3 6.0 -1.3231162173 2.2917045128 0.0000000000 C4 6.0 -2.6462324347 0.0000000000 0.0000000000 C5 6.0 -1.3231162173 -2.2917045128 0.0000000000 C6 6.0 1.3231162173 -2.2917045128 0.0000000000 H1 1.0 4.7082995437 0.0000000000 0.0000000000 H2 1.0 2.3541497718 4.0775070135 0.0000000000 H3 1.0 -2.3541497718 4.0775070135 0.0000000000 H4 1.0 -4.7082995437 0.0000000000 0.0000000000 H5 1.0 -2.3541497718 -4.0775070135 0.0000000000 H6 1.0 2.3541497718 -4.0775070135 0.0000000000 End Rotate benzene2 by alpha about Nx Ny Nz Place benzene2 at Rx Ry Rz Files Molecules benzene1 and benzene2 File-prefix JOB Basis Sadlej Type MC+ Midbond 3s2p1d Type Weighted Aux-basis aTZ Interface files Memory 2 GB End Finish
Most of the file is as described in the Users' Guide. But there are important changes. We assume that the second molecule, benzene2 in the example, is to be rotated and translated. So we have included commands to do this:
Rotate benzene2 by alpha about Nx Ny Nz Place benzene2 at Rx Ry Rz
But the actual rotations and translations are not specified. Instead we have used tags alpha,Nx,Ny,Nz for the rotation angle (degrees) and axis, and tags Rx Ry Rz for the point (in Bohr) at which the benzene2 molecule will be placed. The units used can be changed but they must be consistent with those used to create the geometry file. The batch_SAPT.pl script will replace these tags by actual values.
The other change is the job name: This has been specified as
File-prefix JOB
and will be replaced by the actual job name by the batch_SAPT.pl script. The actual job name will be the job name given to this script and an additional '_###' numerical index taken from the geometry file. This helps to keep the runs distinct from each other.
So you can now run the job using
batch_SAPT.pl benzene2 -clt benzene2.clt -geom random.geom -b mc+ -ipa 0.3397 -ipb 0.3397 -q intel.q -M 3000
where my geometries are in file random.geom.
As jobs finish, they are copied to your main directory (from where the command was issued) to sub-folders benzene2_001, benzene2_002, etc. The actual indices used will depend on the geometry indices in the geometry file. This is handy as, should you split the geometry file into pieces to run on multiple nodes of your machine, the file names used will not clash.
You will need to search through all these directories for the results. This can be done using the batch_search.pl script.
Searching using batch_search.pl
Once you have the result of batch_SAPT.pl ready you will want to search for energies. This is most conveniently done using the batch_search.pl script that is a wrapper for the search.pl script. It accepts the flags:
$ batch_search.pl Usage: batch_search.pl [-j] job [-geom name] [-keys file] [-d digits] [-camcasp] If the keys file is not present, it is created with the default SAPT(DFT) key. To get the energies in the order used by the ENERGY-SCAN module in CamCASP use key 'saptdft-camcasp' and six digits with '-d 6'.
The -camcasp option writes out the results in almost the same format used by module ENERGY-SCAN, so it can be then read back into CamCASP using the ENERGY-SCAN module.
So, to search for SAPT(DFT) interaction energies calculated in benzene dimer example above we would use a keys file containing the line:
$ more keys saptdft-camcasp $ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3 File benzene2_27/OUT/benzene2_27.out not found File benzene2_28/OUT/benzene2_28.out not found INDEX Rx Ry Rz alpha Nx Ny Nz elst ind2C disp2C exch exind2C exdisp2C 0 1 2 3 4 5 6 7 8 9 10 11 12 13 1 14.589 0.000 0.000 180.000 1.000 0.000 0.000 0.852 ----- ----- 0.536 ----- ----- 2 13.463 0.000 0.000 180.000 1.000 0.000 0.000 0.953 ----- ----- 4.269 ----- ----- 3 -9.173 0.000 5.296 180.000 -0.707 0.000 -0.707 -1.606 ----- ----- 8.368 ----- ----- ... ... 40 -6.659 -4.450 8.280 141.980 0.516 0.712 -0.476 ----- ----- ----- ----- ----- -----
NOTES:
- If a jobs failed the script will say that it cannot file a file.
- Energies not calculated in the scan are written as '-----'
Now if you had used the -camcasp flag you would get the following:
$ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3 -camcasp File benzene2_27/OUT/benzene2_27.out not found File benzene2_28/OUT/benzene2_28.out not found TITLE MOL-A MOL-B POINTS 40 Energy-units kJ/mol LENGTH-UNITS ANGLE-UNITS LABELS INDEX Rx Ry Rz alpha Nx Ny Nz E(1)elst E(2)ind ... 1 1.459E+01 0.000E+00 0.000E+00 1.800E+02 1.000E+00 0.000E+00 0.000E+00 8.521E-01 0.0E+00 ... 2 1.346E+01 0.000E+00 0.000E+00 1.800E+02 1.000E+00 0.000E+00 0.000E+00 9.528E-01 0.0E+00 ... ... ... 40 -6.659E+00 -4.450E+00 8.280E+00 1.420E+02 5.156E-01 7.124E-01 -4.760E-01 0.0E+00 0.0E+00 ... END END-FILE
which is just the format used by the ENERGY-SCAN module, so this output can be read using that module. But you will need to fill up the molecule names and units used.
ORIENT jobs using batch_ORIENT.pl
$ batch_ORIENT.pl Usage: batch_ORIENT.pl [-j] job [-geom name] [-ornt name] [-pot name] [ -iter iterations ] [ -rank rank ] [ -damp beta ] [-d digits]