Difference between revisions of "CamCASP/Bugs/9"
import>Am592 |
import>Am592 |
||
(4 intermediate revisions by the same user not shown) | |||
Line 326: | Line 326: | ||
# small differences in MOs (can it be possible given the excellent agreement in ALDA+CHF on the two machines?) |
# small differences in MOs (can it be possible given the excellent agreement in ALDA+CHF on the two machines?) |
||
# an instability in the LU decomposition used to solve the FDDS equations. |
# an instability in the LU decomposition used to solve the FDDS equations. |
||
+ | # IFORT-related numerical precision error in calculation of the Kernel Integrals. |
||
− | I think it must be the latter. |
+ | I think it must be one or both of the latter two. |
=Conclusions & Resolution= |
=Conclusions & Resolution= |
||
+ | Firstly, the errors we have encountered in second-order energies for systems like pyridine and XVI are not caused by any bug in CamCASP. The problem seems to lie with DALTON and particular choices of compilers. This is good news as it provides a clear way to get on with work: use different compilers! I suggest we include two of these pyridine examples as part of our test suite: one with the ALDA+CHF kernel and another with the ALDAX+CHF. |
||
+ | |||
+ | Secondly, from the example in the DALTON section above, we can conclude the following: |
||
# '''DALTON''': Compilers can do bad things to both DALTON: |
# '''DALTON''': Compilers can do bad things to both DALTON: |
||
## We see no problem with the IFORT 12.0.5 compiler. |
## We see no problem with the IFORT 12.0.5 compiler. |
||
Line 345: | Line 349: | ||
# ICC compiled DALTON (DFT code has C-code in it) seems to result in a significantly slower code compared with GCC. This was a real surprise and may need to be looked into more carefully. |
# ICC compiled DALTON (DFT code has C-code in it) seems to result in a significantly slower code compared with GCC. This was a real surprise and may need to be looked into more carefully. |
||
# Gfortran 4.6.1 (on mmp12) compiled CamCASP was able to run the pyridine dimer job successfully. So the file-record error has vanished or we have fixed the bug (I had fixed some undefined variables in record_handler.F90 a few weeks ago). But the code takes longer than IFORT on the Kernel integral calculations. May have to do with the large number of automatically generated arrays associated with that module. This could be fixed. |
# Gfortran 4.6.1 (on mmp12) compiled CamCASP was able to run the pyridine dimer job successfully. So the file-record error has vanished or we have fixed the bug (I had fixed some undefined variables in record_handler.F90 a few weeks ago). But the code takes longer than IFORT on the Kernel integral calculations. May have to do with the large number of automatically generated arrays associated with that module. This could be fixed. |
||
+ | |||
+ | =To Be Done= |
||
+ | * Include SVD as an option in calculating the DF-FDDS. |
||
+ | * Re-run CamCASP with gfortran. Check to see if we get less variation in second-order dispersion. Hard to know what to compare with. |
||
=Runs/tests in progress= |
=Runs/tests in progress= |
||
* Tati: ifort 11.x with ATLAS : Done |
* Tati: ifort 11.x with ATLAS : Done |
||
− | * Tati: gfortran with ATLAS (CamCASP still ifort) |
+ | * Tati: gfortran with ATLAS (CamCASP still ifort) : done |
Latest revision as of 16:41, 17 November 2011
CamCASP => Bugs => Errors in Second-order Energies
Background
Way back in 2010, while developing a potential for molecule XVI (part of the 2010 Blind test), we had encountered large errors in the second-order energies for dimers of this molecule. Sally had alerted us to this (she was developing the potential with my help). She had realised there was a problem as there were significant outliers in the total energies and dispersion energies when compared with our fits. Closer investigation revealed that the contributions to the dispersion energy at imaginary frequency were erratic, and in some cases, even changed sign (they should all be of one sign and decay as <math> 1/(w^2+w_0^2)</math>), where <math>w_0</math> is a characteristic frequency). At that point we brushed aside the problem as there wasn't time to investigate it. But we knew that it was related to the new propagator module that allows us to calculate the FDDS entirely within CamCASP using the ALDAX+CHF model.
Recently, I have encountered the problem yet again, this time with the pyridine dimer. As before, the calculations involve the new propagator module with the ALDAX+CHF model. Unfortunately, this seems to be a large-system problem. Calculations on smaller systems or large systems with smaller (less-diffuse) basis sets have not exhibited the problem. Nor has there been a problem with calculations using the ALDAX propagator.
Here I present a summary of my investigations with a few pyridine dimer systems.
Details
Unless stated, the calculations use the parameters defined here.
CamCASP:
trunk rev 22218
DALTON
Version 2 with patches from Rafal. Also includes changes suggested by Anthony to get it to compile with ifort. See CamCASP/Compilation for details. Tati: compiled with Goto2 library mmp12: compiled with ATLAS
Default run parameters:
Integral switch = 1 PBE0 ALDAX+CHF Standard integration grids
Machines used:
Tati: ifort Version 11.1 BLAS/LAPACK: Goto2 mmp12: ifort Version 12.0.5 BLAS/LAPACK: ATLAS 3.8.4 with LAPACK additions from Netlib Compiled with Gfortran 4.5 or 4.6 (probably 4.5) Gfortran: Versions 4.5.4 and 4.6.1
Example 1: Pyridine <math>D_{2h}</math> dimer
Click here for a generic CLUSTER file
Sadlej/MC+ PBE0/AC
Second-order energies
mmp12: ALDA+CHF : -1628.32 -2316.07 ALDAX+CHF: -1546.89 -2324.97 ALDAX : -1460.61 -2065.68 --------------------------------- tati*: ALDAX+CHF: -311.77 -1138.50 --------------------------------- * Original run: PBE0/AC Integral switch = 0 ifort 11.1 CamCASP: Compiled >= Aug 05 2011 Rev. 22175
- The calculations on mmp12 are sensible.
- E2disp: Very good agreement between ALDA+CHF and ALDAX+CHF (0.4%). Errors large for ALDAX, but this is to be expected as we have used PBE0 for MOs, so we must have 25% of CHF.
- E2ind: This time ALDAX+CHF is not a good approximation. It turns out that the total induction energy from ALDA+CHF and ALDAX+CHF is not too bad: -549.63 cm-1 from ALDA+CHF and -522.14 cm-1 from ALDAX+CHF. A -5% error with ALDAX+CHF. I had overlooked this in my tests of ALDAX+CHF.
- The results on Tati are clearly nonsense. Here are some details of that run:
DATADIR: CamCASP/tests/bugs/kernel-bugs/bug-1-aldx-chf/pyr2/pyr2-sadlej-mcP-original/ File: $DATADIR/OUT/pot1721-Min13-s0.95-aldax-chf.out INDUCTION CALCULATION Parameters same. Skipping DF for pyr_1 OV DF finished for pyr_1 E^{2}_{ind}(UC) = -1634.523269 CM-1 E^{2}_{ind}(UC)(A) = -817.260620 CM-1 E^{2}_{ind}(UC)(B) = -817.262648 CM-1 E^{2}_{ind} = -311.774657 CM-1 E^{2}_{ind}(A) = -194.440126 CM-1 E^{2}_{ind}(B) = -117.334532 CM-1 Density-fitted propagator was used. Propagator type : cks --------------------------------- E^{2}_{ind,exch} = 207.525965 CM-1 E^{2}_{ind,exch}(A) = 129.424747 CM-1 E^{2}_{ind,exch}(B) = 78.101219 CM-1 E^{2}_{ind,exch}(UC) = 1087.984745 CM-1 E^{2}_{ind,exch}(UC)(A) = 543.991362 CM-1 E^{2}_{ind,exch}(UC)(B) = 543.993383 CM-1 --------------------------------- --------------------------------- Quad points and XI at each point: i OMEGA XI --------------------------------- 1 0.006610 -0.903720E-02 2 0.036175 0.100213E+00 3 0.095447 -0.197757E-01 4 0.197644 -0.154010E-01 5 0.370417 -0.103565E-01 6 0.674915 -0.495012E-02 7 1.264899 -0.114417E-02 8 2.619245 -0.200620E-03 9 6.910886 -0.698310E-05 10 37.823762 -0.109340E-07 --------------------------------- E^{2}_{disp} = -1138.500450 CM-1 E^{2}_{disp}(UC) = -3550.241639 CM-1 Density-fitted propagator was used. Propagator type : cks --------------------------------- E^{2}_{disp,exch} = 180.486265 CM-1 E^{2}_{disp,exch}(UC) = 562.819148 CM-1 ---------------------------------
It is worth looking at the details of this calculation:
- XI(w) should be monotonic with w. It is not.
- The induction energy in the coupled and un-coupled approximations are typically similar for systems like pyridine with a moderately sized HOMO-LUMO gap, but they are dramatically off.
- Symmetry has broken. This is a <math>D_{2h}</math> symmetry dimer. E2ind(A) and E2ind(B) should be the same. The uncoupled energies are, but not the coupled.
PUZZLE 1: The last observation is puzzling. Both the coupled and uncoupled expressions for the induction involve virtual orbitals. why has only the coupled calculation broken symmetry?
First-order energies
Here are first-order energies from runs on the two machines. They should be identical. Same inputs/settings. Only compilers and libraries differ.
========================== E1elst E1exch -------------------------- mmp12 -2792.27 4812.59 Tati -2798.89 4947.78 ==========================
How do we understand this? Look at the DALTON DFT energies...
DALTON energies
MonA MonB -------------------------- mmp12 -248.0367 -248.0367 (yes, this is a symmetric config) tati -249.3680 -249.3680 (D2h symm so monomers are identical) ==========================
This is amazing. The DFT solutions on these two machines are really very different. And these results are reproducible. Possible causes:
- Basis too diffuse/large. Uses mid-bonds. Perhaps this leads to linear-dependencies and oddities in the MOs. But how does this explain PUZZLE 1?
- DALTON parameters (in the *.dal files) could be the cause of the problem. We should allow orbital-space truncation by reducing .CMOMAX and increasing .AO DELETE.
- Compile on Tati with the new version of ifort and the ATLAS library.
The first two points should have effected runs on both machines. Better try different compilers and libraries.
Cross-runs: MOs from Tati with CamCASP on mmp12
As a cross-check I ran CamCASP on mmp12 using MOs from Tati. If there was a problem with the MOs from Tati I should have obtained the same CamCASP results on mmp12 as on Tati. This is the case:
Results in /bug-1-aldx-chf/pyr2/pyr2-sadlej-mcP-original/ Two runs: 2GB and 4GB memory. E2ind E2disp ---------------------------- 2GB -307.57 -1050.56 4GB -307.57 -1050.56 ----------------------------
I ran this job with two memory settings to ensure that this was not a large-job-small-memory problem as we have encountered on some occasions.
Example 2: Pyridine symmetry-broken dimer
As mentioned above, the inconsistencies seen with DALTON could have arisen from linear dependencies in the basis functions. But I also considered the option that the problems could have arisen because of the high symmetry of the dimer. There were reasons for this:
- The runs that failed we those that involved high-symmetry dimers. Either D2h or C2v.
- Both were run on Tati.
- mmp12 had only asymmetric dimers.
- There have been cases (Ar, Cl-) where DALTON has produced odd results (Ar was fine with NWChem).
As will be seen, the problem probably has nothing to do with symmetry.
The asymmetric dimer was a modification of the D2h structure used above. Here is the important bit from the CLUSTER file:
! Destroying D2h symm by changing pyr_1 angle from 87.662 to... Rotate pyr_1 by 80.662 about 0.903185 0.406195 0.138788 Place pyr_1 at 3.05962985 0.49181025 -3.1654741 Rotate pyr_2 by 102.564 about -0.924575 -0.123188 0.360537 Place pyr_2 at 1.23588445 0.07067905 6.99504475
Sadlej MC+ PBE0/noAC
Summary of results:
PBE0/noAC Sadlej/MC+ Integral switch = 1 ========================================================== Tati mmp12 --------------------- --------------------- ALDA+CHF ALDAX+CHF ALDA+CHF ALDAX+CHF Case (T1) (T2) (M1) (M2) ========================================================== Units: a.u. E_A -249.3678 -248.0366 Gap_A 0.2394 0.2410 NucRep_A 207.1565 207.1565 E_B -249.3679 -248.0367 Gap_B 0.2394 0.2410 NucRep_B 207.1565 207.1565 ---------------------------------------------------------- Units: cm-1 E1elst -2765.39 -2765.39 -2792.00 -2792.00 E1exch 5010.05 5010.05 4966.78 4966.78 E2ind,pol -1663.49 -2443.05 -1699.21 -1668.20 E2ind,ex 1121.74 1642.95 1165.83 1144.55 E2disp,pol -2361.21 -2437.08 -2328.53 -2327.98 E2disp,ex 377.82 389.96 375.12 375.03 ---------------------------------------------------------- E2int -280.48 -602.56 -312.02 -301.81 %diff +10.1% -93.1% 0.0% +3.3% ==========================================================
- DFT energies: The differences are very disturbing. The nuclear
repulsion energies are identical - as they should be for a symmetric dimer. But DFT energies differ by more than 1.3 a.u. This is absurdly large. There certainly is a problem with DALTON. Possibly compiler-induced. Needs to be re-compiled with other compilers.
- ALDA+CHF: The differences in T1 and M1 are significant, but
not excessive. These could well be due to the differences in the DALTON DFT calculations.
PUZZLE 2: This is puzzle 2. If, as I stated above, there is some sort of numerical instability in DALTON that leads to a corruption of the MOs, then why are ALDA+CHF results believable?
- ALDAX+CHF: Here's where we see a large difference in energies.
E2ind,pol in T2 is wrong. Have a look at the breakup from the T2 calculation:
E^{2}_{ind} = -2443.0533 CM-1 E^{2}_{ind}(A) = -1618.2230 CM-1 E^{2}_{ind}(B) = -824.83034 CM-1 E^{2}_{ind}(UC) = -1630.2294 CM-1 E^{2}_{ind}(UC)(A) = -810.98495 CM-1 E^{2}_{ind}(UC)(B) = -819.24447 CM-1 Compare this with M2: E^{2}_{ind} = -1668.2009 CM-1 E^{2}_{ind}(A) = -825.01692 CM-1 E^{2}_{ind}(B) = -843.18401 CM-1 E^{2}_{ind}(UC) = -1659.0578 CM-1 E^{2}_{ind}(UC)(A) = -824.93893 CM-1 E^{2}_{ind}(UC)(B) = -834.11883 CM-1
What could have caused the problem with E^{2}_{ind}(A) ?
Cross-check: MOs from Tati with CamCASP on mmp12
Here's the cross-check: MOs and Hessians from Tati with CamCASP on mmp12:
======================================= ALDA+CHF ALDAX+CHF Case (MT-1) (MT-2) --------------------------------------- Units: cm-1 E1elst -2765.39 -2765.39 E1exch 5010.05 5010.05 E2ind,pol -1663.49 -2116.15 A -823.85 -1291.32 B -839.63 -824.82 E2ind,ex 1121.74 1424.32 E2disp,pol -2361.21 -2408.50 E2disp,ex 377.82 385.38 =======================================
Observations:
- Run MT-1: (CamCASP on mmp12 using orbitals/hessians from Tati)
This run should give the same results as T1 and, it does. MT-1 agrees perfectly with T1.
- Run MT-2: This should give the same results as T2.
The first-order energies agree with those in T2 but not the second-order ones. E^{2}_{ind}(A) is quite different: -1618.22 in T2 compared with -1291.32 in MT-2. How do we explain this?
PUZZLE 3 The last observation is the third puzzle. I feel that this puzzle is related to the first. Sure, the MOs from DALTON are odd in an as-yet unknown manner. But the fact that CamCASP doesn't give the same results could mean
- Numerical instability in CamCASP. Probably while solving the FDDS equations.
- Problem associated with ifort 11.x
- Problem associated with Goto2
DALTON
Is the problem with DALTON? If so, is it a compiler/library problem?
Example 2
The broken symmetry pyridine dimer.
Sadlej/MC+ PBE0/noAC
============================================================================= Tati : gfortran 4.4.5 mmp12 : ifort 12.0.5 Mixed : MOs Tati ATLAS 3.8.2 ATLAS 3.8.4 CamCASP mmp12 ============================================================================= Units: a.u. E_A -248.0366 -248.0366 Gap_A 0.2410 0.2410 NucRep_A 207.1565 207.1565 E_B -248.0367 -248.0367 Gap_B 0.2410 0.2410 NucRep_B 207.1565 207.1565 ----------------------------------------------------------------------------- Units: cm-1 (mixed) ALDA+CHF ALDAX+CHF ALDA+CHF ALDAX+CHF ALDAX+CHF 1 2 3 4 5 --------------------- --------------------- ------------- E1elst -2792.01 -2792.01 -2792.01 -2792.00 -2792.01 E1exch 4966.79 4966.79 4966.79 4966.78 4966.79 E2ind,pol -1699.22 -1668.23 -1699.22 -1668.20 -1668.20 E2ind,ex 1165.83 1144.58 1165.83 1144.55 1144.56 E2disp,pol -2328.53 -2317.36 -2328.53 -2327.98 -2328.83 E2disp,ex 375.12 373.32 375.12 375.03 375.17 ----------------------------------------------------------------------------- E2int -312.03 -292.92 -312.03 -301.81 -302.53 %diff 0.0% 6.1% 0.0% 3.3% 3.0% 0.2% from col 4. =============================================================================
- DFT : Tati(gfortran/ATLAS) == mmp12(ifort12.x/ATLAS)
- ALDA+CHF : Tati(gfortran/ATLAS) == mmp12(ifort12.x/ATLAS)
- ALDAX+CHF: Tati ~ mmp12. There are differences in E2disp,pol. This needs to be resolved.
- ALDAX+CHF: MIXED: MOs from Tati used in CamCASP on mmp12. Much better agreement with Tati and mmp12. Difference only 0.2%. It still needs to be explained, but it is possible that the residual difference is due to
- small differences in MOs (can it be possible given the excellent agreement in ALDA+CHF on the two machines?)
- an instability in the LU decomposition used to solve the FDDS equations.
- IFORT-related numerical precision error in calculation of the Kernel Integrals.
I think it must be one or both of the latter two.
Conclusions & Resolution
Firstly, the errors we have encountered in second-order energies for systems like pyridine and XVI are not caused by any bug in CamCASP. The problem seems to lie with DALTON and particular choices of compilers. This is good news as it provides a clear way to get on with work: use different compilers! I suggest we include two of these pyridine examples as part of our test suite: one with the ALDA+CHF kernel and another with the ALDAX+CHF.
Secondly, from the example in the DALTON section above, we can conclude the following:
- DALTON: Compilers can do bad things to both DALTON:
- We see no problem with the IFORT 12.0.5 compiler.
- Gfortran 4.4.5 also seems OK.
- MOs with these two compilers agree to a number of significant figures.
- CamCASP: Less sensitive to the compiler used, but there is a suggestion of a numerical instability in the DF-FDDS module that could have something to do with LU decomposition. This needs to be investigated.
Compilers we should not use:
- IFORT 11.x
BLAS/LAPACK:
- Both ATLAS and Goto2 give the same results. This is encouraging.
Misc:
- ICC compiled DALTON (DFT code has C-code in it) seems to result in a significantly slower code compared with GCC. This was a real surprise and may need to be looked into more carefully.
- Gfortran 4.6.1 (on mmp12) compiled CamCASP was able to run the pyridine dimer job successfully. So the file-record error has vanished or we have fixed the bug (I had fixed some undefined variables in record_handler.F90 a few weeks ago). But the code takes longer than IFORT on the Kernel integral calculations. May have to do with the large number of automatically generated arrays associated with that module. This could be fixed.
To Be Done
- Include SVD as an option in calculating the DF-FDDS.
- Re-run CamCASP with gfortran. Check to see if we get less variation in second-order dispersion. Hard to know what to compare with.
Runs/tests in progress
- Tati: ifort 11.x with ATLAS : Done
- Tati: gfortran with ATLAS (CamCASP still ifort) : done