CamCASP/Programming/9
CamCASP => Programming => Coulomb Integrals
We calculate 2 and 3-centre Coulomb integrals using gamint.F taken from the GAMESS(US) code and modified to work as a stand-alone integral package by Wojtek Cencek. For many years I have been complaining that this code is too slow and have been puzzled by timings reported by Rafal Podeszwa who uses the same code in the SAPT suite, but with integrals taking a small fraction of the time for a typical job. The problem is even worse when I use CamCASP in an ENERGY-SCAN calculation as here the density-fitting step is repeated a large number of times (thousands) while DALTON is used only once. So the large amount of time spent in integral calculation dominates the total calculation (around 95% of the time is spent here).
So what's the problem? Well, I got some timings from Rafal (for coronene) and they looked really very good and an order of magnitude better than anything I have seen. So how could this be explained? While I have modified gamint.F, the changes were minor and could not have resulted such an extreme slow-down. After inserting a lot of timing statements in the relevant bits of CamCASP, I was able to pinpoint the problem, solve it, and get a 10 times speed-up of the integral code in CamCASP.
The problem lay in subroutine my_gamint in gaming_frontend.F90 which is a wrapper for the call to gamint. This wrapper makes the call and unpacks the integrals using the index arrays supplied by gamint. I had initialized a few arrays to 0. They looked innocent. But the initialization was the cause for the slow-down. Here's the relevant bit of code:
!Initializations ! ghondo = 0.0_dp ! ddij = 0.0_dp ! call gamint(l_i,l_j,l_k,l_l,& & iexch,qq4x,done,ghondo,ddij,pople,& & iandj,kandl,same,nofi,nofj,nofk,nofl,& & ijind,klind,nbai,nbaj,nbak,nbal) ! num = 0 ! integrals = 0.0_dp ! iindx = 0 ! jindx = 0 ! kindx = 0 ! lindx = 0 if (.not.done) return ij = 0 jmax = nofj do iii = 1, nofi if (iandj) jmax = iii !i.e., lower triangle only do jjj = 1, jmax ij = ij + 1 kl = 0 lmax = nofl do kkk = 1, nofk if (kandl) lmax = kkk !i.e., lower triangle only do lll = 1, lmax kl = kl + 1 nn = ijind(ij) + klind(kl) num = num + 1 integrals(num) = ghondo(nn) iindx(num) = nbai(iii) jindx(num) = nbaj(jjj) kindx(num) = nbak(kkk) lindx(num) = nbal(lll) enddo enddo enddo enddo
I have commented out the offending code. The initialization were not needed. The speed-up is enormous! Here are some timings:
System: Anthrapyralene (C20H12, aDZ basis) Run on Tati, pgf90:
New code: make_T_AO_mono 1 6.58 Old code: make_T_AO_mono 1 65.06
This speed-up occurs for all systems I've looked at.
Why does it happen? The arrays are not large:
parameters.F90: integer, parameter :: par_max_l = 5 !max number of symmetries for integrals !max ang mom is l=4, thus max_l = 4 + 1 !(for s-functions) !NOTE: All basis sets must be defined to be !have max_l = par_max_l even if they do not !contain symmetries up to G. integer, parameter :: par_max_rank = par_max_l - 1 !Maximum number of Cartesian components corresponding to the maximum rank: integer, parameter :: par_max_cart_comp = (par_max_rank+1)*(par_max_rank+2)/2 module integral_parameters: integer, parameter :: maxang = par_max_cart_comp module tmp_gamint_array: use integral_parameters, only : maxang implicit none logical :: done integer, parameter :: maxg = maxang**4 real(dp), dimension(maxg) :: integrals integer, dimension(maxg) :: iindx,jindx,kindx,lindx module gamint_frontend integer, parameter :: maxg = maxang**4 integer, parameter :: mxgsh = 30 integer, parameter :: dim_ddij = 16*mxgsh*mxgsh integer, parameter :: maxang2 = maxang*maxang ... real(dp), dimension(maxg) :: ghondo real(dp), dimension(dim_ddij) :: ddij