CamCASP/Programming/9

From CUC3
Jump to navigation Jump to search

CamCASP => Programming => Coulomb Integrals

Integral calculation efficiency

We calculate 2 and 3-centre Coulomb integrals using gamint.F taken from the GAMESS(US) code and modified to work as a stand-alone integral package by Wojtek Cencek. For many years I have been complaining that this code is too slow and have been puzzled by timings reported by Rafal Podeszwa who uses the same code in the SAPT suite, but with integrals taking a small fraction of the time for a typical job. The problem is even worse when I use CamCASP in an ENERGY-SCAN calculation as here the density-fitting step is repeated a large number of times (thousands) while DALTON is used only once. So the large amount of time spent in integral calculation dominates the total calculation (around 95% of the time is spent here).

So what's the problem? Well, I got some timings from Rafal (for coronene) and they looked really very good and an order of magnitude better than anything I have seen. So how could this be explained? While I have modified gamint.F, the changes were minor and could not have resulted such an extreme slow-down. After inserting a lot of timing statements in the relevant bits of CamCASP, I was able to pinpoint the problem, solve it, and get a 10 times speed-up of the integral code in CamCASP.

The problem lay in subroutine my_gamint in gaming_frontend.F90 which is a wrapper for the call to gamint. This wrapper makes the call and unpacks the integrals using the index arrays supplied by gamint. I had initialized a few arrays to 0. They looked innocent. But the initialization was the cause for the slow-down. Here's the relevant bit of code:

  !Initializations
! ghondo = 0.0_dp
! ddij = 0.0_dp
  !
  call gamint(l_i,l_j,l_k,l_l,&
       &      iexch,qq4x,done,ghondo,ddij,pople,&
       &      iandj,kandl,same,nofi,nofj,nofk,nofl,&
       &      ijind,klind,nbai,nbaj,nbak,nbal)
  !
  num = 0
! integrals = 0.0_dp
! iindx = 0
! jindx = 0
! kindx = 0
! lindx = 0
  if (.not.done) return
  ij = 0
  jmax = nofj
  do iii = 1, nofi
    if (iandj) jmax = iii   !i.e., lower triangle only
    do jjj = 1, jmax
      ij = ij + 1
      kl = 0
      lmax = nofl
      do kkk = 1, nofk
        if (kandl) lmax = kkk  !i.e., lower triangle only
        do lll = 1, lmax
          kl = kl + 1
          nn = ijind(ij) + klind(kl)
          num = num + 1
          integrals(num) = ghondo(nn)
          iindx(num) = nbai(iii)
          jindx(num) = nbaj(jjj)
          kindx(num) = nbak(kkk)
          lindx(num) = nbal(lll)
        enddo
      enddo
    enddo
  enddo

I have commented out the offending code. The initialization were not needed. The speed-up is enormous! Here are some timings:

System: Anthrapyralene (C20H12, aDZ basis) Run on Tati, pgf90:

New code:
 make_T_AO_mono                           1                6.58

Old code:
 make_T_AO_mono                           1               65.06

This speed-up occurs for all systems I've looked at.

Why does it happen? The arrays are not large:

parameters.F90:
integer, parameter :: par_max_l = 5  !max number of symmetries for integrals
                                     !max ang mom is l=4, thus max_l = 4 + 1
                                     !(for s-functions)
                                     !NOTE: All basis sets must be defined to be
                                     !have max_l = par_max_l even if they do not
                                     !contain symmetries up to G.
integer, parameter :: par_max_rank = par_max_l - 1
!Maximum number of Cartesian components corresponding to the maximum rank:
integer, parameter :: par_max_cart_comp = (par_max_rank+1)*(par_max_rank+2)/2

module integral_parameters:
integer, parameter :: maxang = par_max_cart_comp

module tmp_gamint_array:
use integral_parameters, only : maxang
implicit none
logical :: done
integer, parameter :: maxg = maxang**4
real(dp), dimension(maxg) :: integrals
integer, dimension(maxg) :: iindx,jindx,kindx,lindx

module gamint_frontend
integer, parameter :: maxg = maxang**4
integer, parameter :: mxgsh = 30
integer, parameter :: dim_ddij = 16*mxgsh*mxgsh
integer, parameter :: maxang2 = maxang*maxang
...
  real(dp), dimension(maxg) :: ghondo
  real(dp), dimension(dim_ddij) :: ddij

So we have

maxang   = 15
maxg     = 50625
dim_ddij = 14400

So these array have sizes:

ghondo      396 KB
ddij        113 KB
integrals   396 KB
iindx x 4   791 KB
==================
           1696 KB

This is not a lot. True, the initialization is done 10s or 100s of millions of times, but why does it take 10 times the evaluation of the very complex integral calculation??? Surely this is a simple operation in comparison! Well, I do not understand it.