CamCASP/Bugs/9

From CUC3
Revision as of 16:41, 17 November 2011 by import>Am592 (→‎Conclusions & Resolution)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

CamCASP => Bugs => Errors in Second-order Energies

Background

Way back in 2010, while developing a potential for molecule XVI (part of the 2010 Blind test), we had encountered large errors in the second-order energies for dimers of this molecule. Sally had alerted us to this (she was developing the potential with my help). She had realised there was a problem as there were significant outliers in the total energies and dispersion energies when compared with our fits. Closer investigation revealed that the contributions to the dispersion energy at imaginary frequency were erratic, and in some cases, even changed sign (they should all be of one sign and decay as <math> 1/(w^2+w_0^2)</math>), where <math>w_0</math> is a characteristic frequency). At that point we brushed aside the problem as there wasn't time to investigate it. But we knew that it was related to the new propagator module that allows us to calculate the FDDS entirely within CamCASP using the ALDAX+CHF model.

Recently, I have encountered the problem yet again, this time with the pyridine dimer. As before, the calculations involve the new propagator module with the ALDAX+CHF model. Unfortunately, this seems to be a large-system problem. Calculations on smaller systems or large systems with smaller (less-diffuse) basis sets have not exhibited the problem. Nor has there been a problem with calculations using the ALDAX propagator.

Here I present a summary of my investigations with a few pyridine dimer systems.

Details

Unless stated, the calculations use the parameters defined here.

CamCASP:

 trunk  rev 22218
 

DALTON

 Version 2 with patches from Rafal. 
 Also includes changes suggested by Anthony to get it to compile with ifort. 
 See CamCASP/Compilation  for details.
 Tati: compiled with Goto2 library
 mmp12: compiled with ATLAS

Default run parameters:

 Integral switch = 1
 PBE0
 ALDAX+CHF
 Standard integration grids

Machines used:

 Tati:
    ifort Version 11.1
    BLAS/LAPACK: Goto2 
 mmp12:
    ifort Version 12.0.5
    BLAS/LAPACK: ATLAS 3.8.4 with LAPACK additions from Netlib
       Compiled with Gfortran 4.5 or 4.6 (probably 4.5)
    Gfortran: Versions 4.5.4 and 4.6.1

Example 1: Pyridine <math>D_{2h}</math> dimer

Pyr2-double-Hbonded.png

Click here for a generic CLUSTER file


Sadlej/MC+ PBE0/AC

Second-order energies

mmp12:
     ALDA+CHF :  -1628.32   -2316.07
     ALDAX+CHF:  -1546.89   -2324.97
     ALDAX    :  -1460.61   -2065.68
    ---------------------------------
tati*:
     ALDAX+CHF:  -311.77    -1138.50
    ---------------------------------
    * Original run: PBE0/AC Integral switch = 0 ifort 11.1
      CamCASP: Compiled >= Aug 05 2011  Rev. 22175
  • The calculations on mmp12 are sensible.
    • E2disp: Very good agreement between ALDA+CHF and ALDAX+CHF (0.4%). Errors large for ALDAX, but this is to be expected as we have used PBE0 for MOs, so we must have 25% of CHF.
    • E2ind: This time ALDAX+CHF is not a good approximation. It turns out that the total induction energy from ALDA+CHF and ALDAX+CHF is not too bad: -549.63 cm-1 from ALDA+CHF and -522.14 cm-1 from ALDAX+CHF. A -5% error with ALDAX+CHF. I had overlooked this in my tests of ALDAX+CHF.
  • The results on Tati are clearly nonsense. Here are some details of that run:
DATADIR: CamCASP/tests/bugs/kernel-bugs/bug-1-aldx-chf/pyr2/pyr2-sadlej-mcP-original/

File: $DATADIR/OUT/pot1721-Min13-s0.95-aldax-chf.out

  INDUCTION CALCULATION

  Parameters same. Skipping DF for pyr_1
OV DF finished for pyr_1
 E^{2}_{ind}(UC) =     -1634.523269 CM-1
     E^{2}_{ind}(UC)(A) =      -817.260620 CM-1
     E^{2}_{ind}(UC)(B) =      -817.262648 CM-1
 E^{2}_{ind} =      -311.774657 CM-1
     E^{2}_{ind}(A) =      -194.440126 CM-1
     E^{2}_{ind}(B) =      -117.334532 CM-1
  Density-fitted propagator was used.
  Propagator type : cks
 ---------------------------------

 E^{2}_{ind,exch} =       207.525965 CM-1
     E^{2}_{ind,exch}(A) =       129.424747 CM-1
     E^{2}_{ind,exch}(B) =        78.101219 CM-1
 E^{2}_{ind,exch}(UC) =      1087.984745 CM-1
     E^{2}_{ind,exch}(UC)(A) =       543.991362 CM-1
     E^{2}_{ind,exch}(UC)(B) =       543.993383 CM-1
 ---------------------------------

 ---------------------------------
 Quad points and XI at each point:
 i     OMEGA             XI
 ---------------------------------
    1      0.006610   -0.903720E-02
    2      0.036175    0.100213E+00
    3      0.095447   -0.197757E-01
    4      0.197644   -0.154010E-01
    5      0.370417   -0.103565E-01
    6      0.674915   -0.495012E-02
    7      1.264899   -0.114417E-02
    8      2.619245   -0.200620E-03
    9      6.910886   -0.698310E-05
   10     37.823762   -0.109340E-07
 ---------------------------------
 E^{2}_{disp} =     -1138.500450 CM-1
 E^{2}_{disp}(UC) =     -3550.241639 CM-1
     Density-fitted propagator was used.
     Propagator type : cks
 ---------------------------------

 E^{2}_{disp,exch} =       180.486265 CM-1
 E^{2}_{disp,exch}(UC) =       562.819148 CM-1
 ---------------------------------

It is worth looking at the details of this calculation:

  • XI(w) should be monotonic with w. It is not.
  • The induction energy in the coupled and un-coupled approximations are typically similar for systems like pyridine with a moderately sized HOMO-LUMO gap, but they are dramatically off.
  • Symmetry has broken. This is a <math>D_{2h}</math> symmetry dimer. E2ind(A) and E2ind(B) should be the same. The uncoupled energies are, but not the coupled.

PUZZLE 1: The last observation is puzzling. Both the coupled and uncoupled expressions for the induction involve virtual orbitals. why has only the coupled calculation broken symmetry?

First-order energies

Here are first-order energies from runs on the two machines. They should be identical. Same inputs/settings. Only compilers and libraries differ.

          ==========================
                   E1elst    E1exch
          --------------------------
          mmp12   -2792.27   4812.59
          Tati    -2798.89   4947.78
          ==========================

How do we understand this? Look at the DALTON DFT energies...

DALTON energies

                  MonA      MonB
          --------------------------
          mmp12  -248.0367  -248.0367 (yes, this is a symmetric config)
          tati   -249.3680  -249.3680 (D2h symm so monomers are identical)
          ==========================

This is amazing. The DFT solutions on these two machines are really very different. And these results are reproducible. Possible causes:

  • Basis too diffuse/large. Uses mid-bonds. Perhaps this leads to linear-dependencies and oddities in the MOs. But how does this explain PUZZLE 1?
  • DALTON parameters (in the *.dal files) could be the cause of the problem. We should allow orbital-space truncation by reducing .CMOMAX and increasing .AO DELETE.
  • Compile on Tati with the new version of ifort and the ATLAS library.

The first two points should have effected runs on both machines. Better try different compilers and libraries.

Cross-runs: MOs from Tati with CamCASP on mmp12

As a cross-check I ran CamCASP on mmp12 using MOs from Tati. If there was a problem with the MOs from Tati I should have obtained the same CamCASP results on mmp12 as on Tati. This is the case:

Results in
           /bug-1-aldx-chf/pyr2/pyr2-sadlej-mcP-original/
          Two runs: 2GB and 4GB memory.
                E2ind    E2disp
          ----------------------------
          2GB   -307.57  -1050.56
          4GB   -307.57  -1050.56
          ----------------------------

I ran this job with two memory settings to ensure that this was not a large-job-small-memory problem as we have encountered on some occasions.

Example 2: Pyridine symmetry-broken dimer

As mentioned above, the inconsistencies seen with DALTON could have arisen from linear dependencies in the basis functions. But I also considered the option that the problems could have arisen because of the high symmetry of the dimer. There were reasons for this:

  • The runs that failed we those that involved high-symmetry dimers. Either D2h or C2v.
  • Both were run on Tati.
  • mmp12 had only asymmetric dimers.
  • There have been cases (Ar, Cl-) where DALTON has produced odd results (Ar was fine with NWChem).

As will be seen, the problem probably has nothing to do with symmetry.

The asymmetric dimer was a modification of the D2h structure used above. Here is the important bit from the CLUSTER file:

! Destroying D2h symm by changing pyr_1 angle from 87.662 to...
Rotate pyr_1 by 80.662 about 0.903185 0.406195 0.138788
Place pyr_1 at 3.05962985 0.49181025 -3.1654741
Rotate pyr_2 by 102.564 about -0.924575 -0.123188 0.360537
Place pyr_2 at 1.23588445 0.07067905 6.99504475

Sadlej MC+ PBE0/noAC

Summary of results:

        PBE0/noAC Sadlej/MC+  Integral switch = 1
        ==========================================================
                              Tati                mmp12
                     ---------------------   ---------------------
                     ALDA+CHF    ALDAX+CHF   ALDA+CHF    ALDAX+CHF
        Case            (T1)       (T2)        (M1)        (M2)
        ==========================================================
        Units: a.u.
        E_A               -249.3678               -248.0366
          Gap_A              0.2394                  0.2410
          NucRep_A         207.1565                207.1565
        E_B               -249.3679               -248.0367
          Gap_B              0.2394                  0.2410
          NucRep_B         207.1565                207.1565
        ----------------------------------------------------------
        Units: cm-1
        E1elst      -2765.39    -2765.39    -2792.00   -2792.00
        E1exch       5010.05     5010.05     4966.78    4966.78
        E2ind,pol   -1663.49    -2443.05    -1699.21   -1668.20
        E2ind,ex     1121.74     1642.95     1165.83    1144.55
        E2disp,pol  -2361.21    -2437.08    -2328.53   -2327.98
        E2disp,ex     377.82      389.96      375.12     375.03
        ----------------------------------------------------------
        E2int        -280.48     -602.56     -312.02    -301.81
        %diff         +10.1%      -93.1%        0.0%      +3.3%
        ==========================================================

  • DFT energies: The differences are very disturbing. The nuclear

repulsion energies are identical - as they should be for a symmetric dimer. But DFT energies differ by more than 1.3 a.u. This is absurdly large. There certainly is a problem with DALTON. Possibly compiler-induced. Needs to be re-compiled with other compilers.

  • ALDA+CHF: The differences in T1 and M1 are significant, but

not excessive. These could well be due to the differences in the DALTON DFT calculations.

PUZZLE 2: This is puzzle 2. If, as I stated above, there is some sort of numerical instability in DALTON that leads to a corruption of the MOs, then why are ALDA+CHF results believable?

  • ALDAX+CHF: Here's where we see a large difference in energies.

E2ind,pol in T2 is wrong. Have a look at the breakup from the T2 calculation:

         E^{2}_{ind} =   -2443.0533     CM-1
             E^{2}_{ind}(A) =   -1618.2230     CM-1
             E^{2}_{ind}(B) =   -824.83034     CM-1
         E^{2}_{ind}(UC) =   -1630.2294     CM-1
             E^{2}_{ind}(UC)(A) =   -810.98495     CM-1
             E^{2}_{ind}(UC)(B) =   -819.24447     CM-1

        Compare this with M2:
         E^{2}_{ind} =   -1668.2009     CM-1
             E^{2}_{ind}(A) =   -825.01692     CM-1
             E^{2}_{ind}(B) =   -843.18401     CM-1
         E^{2}_{ind}(UC) =   -1659.0578     CM-1
             E^{2}_{ind}(UC)(A) =   -824.93893     CM-1
             E^{2}_{ind}(UC)(B) =   -834.11883     CM-1

What could have caused the problem with E^{2}_{ind}(A) ?

Cross-check: MOs from Tati with CamCASP on mmp12

Here's the cross-check: MOs and Hessians from Tati with CamCASP on mmp12:

        =======================================
                      ALDA+CHF    ALDAX+CHF
        Case           (MT-1)     (MT-2)
        ---------------------------------------
        Units: cm-1
        E1elst        -2765.39    -2765.39
        E1exch         5010.05     5010.05
        E2ind,pol     -1663.49    -2116.15
          A            -823.85    -1291.32
          B            -839.63     -824.82
        E2ind,ex       1121.74     1424.32
        E2disp,pol    -2361.21    -2408.50
        E2disp,ex       377.82      385.38
        =======================================

Observations:

  • Run MT-1: (CamCASP on mmp12 using orbitals/hessians from Tati)

This run should give the same results as T1 and, it does. MT-1 agrees perfectly with T1.

  • Run MT-2: This should give the same results as T2.

The first-order energies agree with those in T2 but not the second-order ones. E^{2}_{ind}(A) is quite different: -1618.22 in T2 compared with -1291.32 in MT-2. How do we explain this?

PUZZLE 3 The last observation is the third puzzle. I feel that this puzzle is related to the first. Sure, the MOs from DALTON are odd in an as-yet unknown manner. But the fact that CamCASP doesn't give the same results could mean

  1. Numerical instability in CamCASP. Probably while solving the FDDS equations.
  2. Problem associated with ifort 11.x
  3. Problem associated with Goto2


DALTON

Is the problem with DALTON? If so, is it a compiler/library problem?

Example 2

The broken symmetry pyridine dimer.

Sadlej/MC+ PBE0/noAC

=============================================================================
             Tati : gfortran 4.4.5   mmp12 : ifort 12.0.5  Mixed : MOs Tati
                    ATLAS 3.8.2              ATLAS 3.8.4        CamCASP mmp12
=============================================================================
Units: a.u.
E_A               -248.0366               -248.0366
  Gap_A              0.2410                  0.2410
  NucRep_A         207.1565                207.1565
E_B               -248.0367               -248.0367
  Gap_B              0.2410                  0.2410
  NucRep_B         207.1565                207.1565
-----------------------------------------------------------------------------
Units: cm-1                                                    (mixed)
             ALDA+CHF   ALDAX+CHF    ALDA+CHF  ALDAX+CHF      ALDAX+CHF
                1          2            3          4              5
             ---------------------   ---------------------  -------------
E1elst      -2792.01    -2792.01    -2792.01   -2792.00       -2792.01
E1exch       4966.79     4966.79     4966.79    4966.78        4966.79
E2ind,pol   -1699.22    -1668.23    -1699.22   -1668.20       -1668.20
E2ind,ex     1165.83     1144.58     1165.83    1144.55        1144.56
E2disp,pol  -2328.53    -2317.36    -2328.53   -2327.98       -2328.83
E2disp,ex     375.12      373.32      375.12     375.03         375.17
-----------------------------------------------------------------------------
E2int        -312.03     -292.92     -312.03    -301.81        -302.53
%diff          0.0%        6.1%        0.0%       3.3%           3.0%
                                                                 0.2% from
                                                                 col 4.
=============================================================================
  • DFT : Tati(gfortran/ATLAS) == mmp12(ifort12.x/ATLAS)
  • ALDA+CHF : Tati(gfortran/ATLAS) == mmp12(ifort12.x/ATLAS)
  • ALDAX+CHF: Tati ~ mmp12. There are differences in E2disp,pol. This needs to be resolved.
  • ALDAX+CHF: MIXED: MOs from Tati used in CamCASP on mmp12. Much better agreement with Tati and mmp12. Difference only 0.2%. It still needs to be explained, but it is possible that the residual difference is due to
  1. small differences in MOs (can it be possible given the excellent agreement in ALDA+CHF on the two machines?)
  2. an instability in the LU decomposition used to solve the FDDS equations.
  3. IFORT-related numerical precision error in calculation of the Kernel Integrals.

I think it must be one or both of the latter two.

Conclusions & Resolution

Firstly, the errors we have encountered in second-order energies for systems like pyridine and XVI are not caused by any bug in CamCASP. The problem seems to lie with DALTON and particular choices of compilers. This is good news as it provides a clear way to get on with work: use different compilers! I suggest we include two of these pyridine examples as part of our test suite: one with the ALDA+CHF kernel and another with the ALDAX+CHF.

Secondly, from the example in the DALTON section above, we can conclude the following:

  1. DALTON: Compilers can do bad things to both DALTON:
    1. We see no problem with the IFORT 12.0.5 compiler.
    2. Gfortran 4.4.5 also seems OK.
    3. MOs with these two compilers agree to a number of significant figures.
  2. CamCASP: Less sensitive to the compiler used, but there is a suggestion of a numerical instability in the DF-FDDS module that could have something to do with LU decomposition. This needs to be investigated.

Compilers we should not use:

  1. IFORT 11.x

BLAS/LAPACK:

  1. Both ATLAS and Goto2 give the same results. This is encouraging.

Misc:

  1. ICC compiled DALTON (DFT code has C-code in it) seems to result in a significantly slower code compared with GCC. This was a real surprise and may need to be looked into more carefully.
  2. Gfortran 4.6.1 (on mmp12) compiled CamCASP was able to run the pyridine dimer job successfully. So the file-record error has vanished or we have fixed the bug (I had fixed some undefined variables in record_handler.F90 a few weeks ago). But the code takes longer than IFORT on the Kernel integral calculations. May have to do with the large number of automatically generated arrays associated with that module. This could be fixed.

To Be Done

  • Include SVD as an option in calculating the DF-FDDS.
  • Re-run CamCASP with gfortran. Check to see if we get less variation in second-order dispersion. Hard to know what to compare with.

Runs/tests in progress

  • Tati: ifort 11.x with ATLAS : Done
  • Tati: gfortran with ATLAS (CamCASP still ifort) : done