Difference between revisions of "CamCASP/Notes/7"

From CUC3
Jump to navigation Jump to search
import>Am592
import>Am592
Line 211: Line 211:
   
 
You will need to search through all these directories for the results. This can be done using the '''batch_search.pl''' script.
 
You will need to search through all these directories for the results. This can be done using the '''batch_search.pl''' script.
  +
  +
==Searching using batch_search.pl==
  +
Once you have the result of batch_SAPT.pl ready you will want to search for energies. This is most conveniently done using the batch_search.pl script that is a wrapper for the search.pl script. It accepts the flags:
  +
<pre>
  +
$ batch_search.pl
  +
  +
Usage:
  +
batch_search.pl [-j] job [-geom name] [-keys file] [-d digits]
  +
[-camcasp]
  +
  +
If the keys file is not present, it is created with the default SAPT(DFT) key.
  +
  +
To get the energies in the order used by the ENERGY-SCAN module in CamCASP use
  +
key 'saptdft-camcasp' and six digits with '-d 6'.
  +
</pre>
  +
The ''-camcasp'' option writes out the results in almost the same format used by module '''ENERGY-SCAN''', so it can be then read back into CamCASP using the '''ENERGY-SCAN''' module.
  +
  +
So, to search for SAPT(DFT) interaction energies calculated in benzene dimer example above we would use a keys file containing the line:
  +
<pre>
  +
$ more keys
  +
saptdft-camcasp
  +
$ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3
  +
File benzene2_27/OUT/benzene2_27.out not found
  +
File benzene2_28/OUT/benzene2_28.out not found
  +
INDEX Rx Ry Rz alpha Nx Ny Nz elst ind2C disp2C exch exind2C exdisp2C
  +
0 1 2 3 4 5 6 7 8 9 10 11 12 13
  +
1 14.589 0.000 0.000 180.000 1.000 0.000 0.000 0.852 ----- ----- 0.536 ----- -----
  +
2 13.463 0.000 0.000 180.000 1.000 0.000 0.000 0.953 ----- ----- 4.269 ----- -----
  +
3 -9.173 0.000 5.296 180.000 -0.707 0.000 -0.707 -1.606 ----- ----- 8.368 ----- -----
  +
...
  +
...
  +
40 -6.659 -4.450 8.280 141.980 0.516 0.712 -0.476 ----- ----- ----- ----- ----- -----
  +
</pre>
  +
NOTES:
  +
* If a jobs failed the script will say that it cannot file a file.
  +
* Energies not calculated in the scan are written as '-----'
  +
  +
Now if you had used the ''-camcasp'' flag you would get the following:
  +
<pre>
  +
$ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3 -camcasp
  +
File benzene2_27/OUT/benzene2_27.out not found
  +
File benzene2_28/OUT/benzene2_28.out not found
  +
TITLE
  +
MOL-A
  +
MOL-B
  +
POINTS 40
  +
Energy-units kJ/mol
  +
LENGTH-UNITS
  +
ANGLE-UNITS
  +
LABELS INDEX Rx Ry Rz alpha Nx Ny Nz E(1)elst E(2)ind ...
  +
1 1.459E+01 0.000E+00 0.000E+00 1.800E+02 1.000E+00 0.000E+00 0.000E+00 8.521E-01 0.0E+00 ...
  +
2 1.346E+01 0.000E+00 0.000E+00 1.800E+02 1.000E+00 0.000E+00 0.000E+00 9.528E-01 0.0E+00 ...
  +
...
  +
...
  +
40 -6.659E+00 -4.450E+00 8.280E+00 1.420E+02 5.156E-01 7.124E-01 -4.760E-01 0.0E+00 0.0E+00 ...
  +
END
  +
END-FILE
  +
</pre>
  +
which is just the format used by the '''ENERGY-SCAN''' module, so this output can be read using that module. But you will need to fill up the molecule names and units used.
  +
  +
==ORIENT jobs using batch_ORIENT.pl==

Revision as of 18:29, 17 March 2009

CamCASP => Notes => Using the batch scripts

The ${CAMCASP}/bin/ folder contains a few batch scripts for running sets of jobs at various dimer geometries.

Obtaining the geometry file

The batch scripts all use a geometry file that contains the translation and rotation parameters needed to construct the dimer. This file is constructed by CamCASP as follows.

  • Create a standard CamCASP command file for the dimer using CLUSTER.
  • Edit it to look like:
TITLE benzene1 and benzene2  : GRID
TITLE Basis sadlej and type MC+
TITLE Midbonds 3S2P1D and type weighted

MEMORY       2048 MB

SET Global_data
  CamCASP-path /home/am592/SITUS
  Units Bohr cm-1
  Scf-code Dalton
  XC-func PBE0
  Overwrite yes
END

MOLECULE benzene1 at 0.0 0.0 0.0
   Charge    0
   Echo No
   Hessian format SAPT2006
   Skip MOS                       <----------***
   Basis Main
      Spherical
      Units Bohr
      Format GAMESS
      C1    6.0        2.64623243       0.00000000       0.00000000  TYPE C1
        #include-camcasp basis/gamess_us/sadlej/C
      ---
      ...
      ...
END
MOLECULE benzene2 at 0.0 0.0 0.0
   Charge    0
   Echo No
   Hessian format SAPT2006
   Skip MOS                         <----------***
   Basis Main
   ...
   ...
END

Begin Energy-scan
  Probe benzene1 with benzene2
  Scan Nothing
  Units Bohr
  Random
    Points 40
    dRmin -1.5
    dRmax  1.2
    Radial 2
  End
  Energy-file PREFIX random
End

FINISH

The new bits are the ENERGY-SCAN commands and those indicated by '***'. The Skip MOs command is needed to get CamCASP to go on without reading in the molecular orbitals. The ENERGY-SCAN module probes the first molecule (benzene1) with the second (benzene2). The order is important. But all we need are the geometries, so we Scan Nothing. The results of the energy scan will be in file random.dat. This file looks like:

TITLE  Energy scan of MOL-A by MOL-B
TITLE  benzene1 and benzene2 : GRID
TITLE  Basis sadlej and type MC+
TITLE  Midbonds 3S2P1D and type weighted
MOL-A  benzene1
MOL-B  benzene2
POINTS      40
ENERGY-UNITS  CM-1
LENGTH-UNITS  BOHR
ANGLE-UNITS  DEGREES
LABELS    INDEX         Rx            Ry            Rz            alpha         Nx            Ny            Nz             E(1)elst       E(2)ind   ...
       1   0.145892E+02   0.000000E+00   0.000000E+00   0.180000E+03   0.100000E+01   0.000000E+00   0.000000E+00   0.000000E+00   0.000000E+00  ...
       2   0.134629E+02   0.000000E+00   0.000000E+00   0.180000E+03   0.100000E+01   0.000000E+00   0.000000E+00   0.000000E+00   0.000000E+00   ...
...
...
      40  -0.665925E+01  -0.444957E+01   0.827950E+01   0.141980E+03   0.515613E+00   0.712421E+00  -0.476025E+00   0.000000E+00   0.000000E+00 ...
END
END-FILE

Yes, it has many columns and all we need are the first few that contain the geometry.

  • Now we need to make small changed to random.dat in order to get it in a form suitable for the batch scripts. Do the following. Copy it to random.geom (could be any name really, but let's use this). Now edit random.geom so that it looks like:
! TITLE  Energy scan of MOL-A by MOL-B
! TITLE  benzene1 and benzene2 : GRID
! TITLE  Basis sadlej and type MC+
! TITLE  Midbonds 3S2P1D and type weighted
! MOL-A  benzene1
! MOL-B  benzene2
! POINTS      40
! ENERGY-UNITS  CM-1
! LENGTH-UNITS  BOHR
! ANGLE-UNITS  DEGREES
! LABELS    INDEX         Rx            Ry            Rz            alpha         Nx            Ny            Nz
       1   0.145892E+02   0.000000E+00   0.000000E+00   0.180000E+03   0.100000E+01   0.000000E+00   0.000000E+00
       2   0.134629E+02   0.000000E+00   0.000000E+00   0.180000E+03   0.100000E+01   0.000000E+00   0.000000E+00
...
...
      39  -0.587535E+01  -0.392578E+01   0.730487E+01   0.141980E+03   0.515613E+00   0.712421E+00  -0.476025E+00
      40  -0.665925E+01  -0.444957E+01   0.827950E+01   0.141980E+03   0.515613E+00   0.712421E+00  -0.476025E+00
! END
! END-FILE

We have commented out all text lines that tell us which system this file is for. And we have removed the energy columns that come after Nz.

This file is now ready to be read by the batch scripts.

NOTES:

  • It is often convenient to split this file into multiple parts so as to run the batch jobs in 'parallel' on multiple nodes. If you do so, the INDEX column comes in handy as it is used by the batch scripts to decide on the job number.

SAPT(DFT) jobs with batch_SAPT.pl

You cannot always perform an energy scan using the ENERGY-SCAN module in CamCASP as this module requires you to use the MC type of basis. If you need to use the MC+, DC or DC+ types of bases, then you will need to use the batch_SAPT.pl script. What this script does is to use a template CLUSTER command file and a file containing the translation and rotation parameters that define the dimer in order to run a batch of SAPT(DFT) (and also SAPT) calculations in an automatic way.

Here are the options:

$ batch_SAPT.pl 
Usage:
    batch_SAPT.pl [-j] job [-geom name] [-clt name] [ -b dc | mc | mc+ | dc+ ]
        -ipa IP(A) -ipb IP(B) [-q queue] [-M memory in MB]

You need to supply:

  • The job name. For example, this might be benzene2.
  • The name of the geometry file. The default name will be jobname.geom, so if this file is called benzene2.geom, then there is no need to specify it explicitly.
  • The name of the CLUSTER command file. See below. As with the geometry file, there is a default value of jobname.clt.
  • The type of basis to be used. (Is this now obsolete?)
  • The ionisation potentials of the two monomers.
  • The name of the queue. As always, there are special 'queues': bg says run the job in the background. The other special 'queue' none may not be usable.
  • You can also specify the memory to be used. This memory (in MB) will be passed to DALTON. The memory given to CamCASP should be in the CLUSTER file.

The template CLUSTER file

What the batch_SAPT.pl script does is use a template CLUSTER file to create the various DALTON and CamCASP input files at each dimer geometry. Here's what such a file looks like:

Title Benzene dimer : Sadlej MC+
!
Set Global-data
  CamCASP /home/ajm/CamCASP
  Units Bohr Degree
End

Molecule benzene1
  Units Bohr
  C1   6.0    2.6462324347    0.0000000000    0.0000000000
  C2   6.0    1.3231162173    2.2917045128    0.0000000000
  C3   6.0   -1.3231162173    2.2917045128    0.0000000000
  C4   6.0   -2.6462324347    0.0000000000    0.0000000000
  C5   6.0   -1.3231162173   -2.2917045128    0.0000000000
  C6   6.0    1.3231162173   -2.2917045128    0.0000000000
  H1   1.0    4.7082995437    0.0000000000    0.0000000000
  H2   1.0    2.3541497718    4.0775070135    0.0000000000
  H3   1.0   -2.3541497718    4.0775070135    0.0000000000
  H4   1.0   -4.7082995437    0.0000000000    0.0000000000
  H5   1.0   -2.3541497718   -4.0775070135    0.0000000000
  H6   1.0    2.3541497718   -4.0775070135    0.0000000000
End
Molecule benzene2
  Units Bohr
  C1   6.0    2.6462324347    0.0000000000    0.0000000000
  C2   6.0    1.3231162173    2.2917045128    0.0000000000
  C3   6.0   -1.3231162173    2.2917045128    0.0000000000
  C4   6.0   -2.6462324347    0.0000000000    0.0000000000
  C5   6.0   -1.3231162173   -2.2917045128    0.0000000000
  C6   6.0    1.3231162173   -2.2917045128    0.0000000000
  H1   1.0    4.7082995437    0.0000000000    0.0000000000
  H2   1.0    2.3541497718    4.0775070135    0.0000000000
  H3   1.0   -2.3541497718    4.0775070135    0.0000000000
  H4   1.0   -4.7082995437    0.0000000000    0.0000000000
  H5   1.0   -2.3541497718   -4.0775070135    0.0000000000
  H6   1.0    2.3541497718   -4.0775070135    0.0000000000
End

Rotate benzene2 by alpha about Nx Ny Nz
Place benzene2 at Rx Ry Rz

Files
  Molecules benzene1 and benzene2
  File-prefix JOB
  Basis Sadlej
  Type MC+
  Midbond 3s2p1d Type Weighted
  Aux-basis aTZ
  Interface files
  Memory 2 GB
End

Finish

Most of the file is as described in the Users' Guide. But there are important changes. We assume that the second molecule, benzene2 in the example, is to be rotated and translated. So we have included commands to do this:

Rotate benzene2 by alpha about Nx Ny Nz
Place benzene2 at Rx Ry Rz

But the actual rotations and translations are not specified. Instead we have used tags alpha,Nx,Ny,Nz for the rotation angle (degrees) and axis, and tags Rx Ry Rz for the point (in Bohr) at which the benzene2 molecule will be placed. The units used can be changed but they must be consistent with those used to create the geometry file. The batch_SAPT.pl script will replace these tags by actual values.

The other change is the job name: This has been specified as

File-prefix JOB

and will be replaced by the actual job name by the batch_SAPT.pl script. The actual job name will be the job name given to this script and an additional '_###' numerical index taken from the geometry file. This helps to keep the runs distinct from each other.

So you can now run the job using

batch_SAPT.pl benzene2 -clt benzene2.clt -geom random.geom -b mc+ -ipa 0.3397 -ipb 0.3397 -q intel.q -M 3000

where my geometries are in file random.geom.

As jobs finish, they are copied to your main directory (from where the command was issued) to sub-folders benzene2_001, benzene2_002, etc. The actual indices used will depend on the geometry indices in the geometry file. This is handy as, should you split the geometry file into pieces to run on multiple nodes of your machine, the file names used will not clash.

You will need to search through all these directories for the results. This can be done using the batch_search.pl script.

Searching using batch_search.pl

Once you have the result of batch_SAPT.pl ready you will want to search for energies. This is most conveniently done using the batch_search.pl script that is a wrapper for the search.pl script. It accepts the flags:

$ batch_search.pl 

Usage:
    batch_search.pl [-j] job [-geom name] [-keys file] [-d digits]
       [-camcasp]

If the keys file is not present, it is created with the default SAPT(DFT) key.

To get the energies in the order used by the ENERGY-SCAN module in CamCASP use
key 'saptdft-camcasp' and six digits with '-d 6'.

The -camcasp option writes out the results in almost the same format used by module ENERGY-SCAN, so it can be then read back into CamCASP using the ENERGY-SCAN module.

So, to search for SAPT(DFT) interaction energies calculated in benzene dimer example above we would use a keys file containing the line:

$ more keys
saptdft-camcasp
$ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3
File benzene2_27/OUT/benzene2_27.out not found
File benzene2_28/OUT/benzene2_28.out not found
    INDEX        Rx        Ry        Rz     alpha        Nx        Ny        Nz      elst     ind2C    disp2C      exch   exind2C  exdisp2C 
        0         1         2         3         4         5         6         7         8         9        10        11        12        13 
        1     14.589      0.000      0.000    180.000      1.000      0.000      0.000      0.852    -----      -----        0.536    -----      -----   
        2     13.463      0.000      0.000    180.000      1.000      0.000      0.000      0.953    -----      -----        4.269    -----      -----   
        3     -9.173      0.000      5.296    180.000     -0.707      0.000     -0.707     -1.606    -----      -----        8.368    -----      -----  
...
...
       40     -6.659     -4.450      8.280    141.980      0.516      0.712     -0.476    -----      -----      -----      -----      -----      -----   

NOTES:

  • If a jobs failed the script will say that it cannot file a file.
  • Energies not calculated in the scan are written as '-----'

Now if you had used the -camcasp flag you would get the following:

$ batch_search.pl benzene2 -geom random-grid_points.dat -keys keys -d 3 -camcasp
File benzene2_27/OUT/benzene2_27.out not found
File benzene2_28/OUT/benzene2_28.out not found
TITLE 
MOL-A 
MOL-B 
POINTS 40 
Energy-units kJ/mol
LENGTH-UNITS 
ANGLE-UNITS 
LABELS         INDEX            Rx            Ry            Rz         alpha            Nx            Ny            Nz      E(1)elst       E(2)ind ...
         1      1.459E+01     0.000E+00     0.000E+00     1.800E+02     1.000E+00     0.000E+00     0.000E+00     8.521E-01       0.0E+00    ...
         2      1.346E+01     0.000E+00     0.000E+00     1.800E+02     1.000E+00     0.000E+00     0.000E+00     9.528E-01       0.0E+00    ...
...
...
        40     -6.659E+00    -4.450E+00     8.280E+00     1.420E+02     5.156E-01     7.124E-01    -4.760E-01       0.0E+00       0.0E+00   ...
END
END-FILE

which is just the format used by the ENERGY-SCAN module, so this output can be read using that module. But you will need to fill up the molecule names and units used.

ORIENT jobs using batch_ORIENT.pl