CamCASP/Compilation

From CUC3
Revision as of 14:05, 17 November 2011 by import>Am592


Compilers to Avoid

Compilers that Seem OK

  • Gfortran 4.6.1: Earlier versions caused a direct-access file error in CamCASP when files exceeded a certain (undetermined) size.
  • Ifort 12.0.5: Produces good code for both DALTON and CamCASP.

Libraries to Avoid

  • MKL: Causes memory leaks that adversely affect the rest of CamCASP.

Libraries that Seem OK

  • ATLAS: No flaw found in this library so far. Recommended. It needs to be built with all LAPACK routines; see the notes on the ATLAS page for details of how to do this.
  • Goto2 BLAS/LAPACK: No problems. Slightly faster than ATLAS.


Notes on compiling CamCASP

Ifort

  • CamCASP 5.5-dev (revision 21014) now compiles with ifort 11.1. This includes all codes in the distribution. CamCASP itself runs almost twice as fast as the corresponding binary from pgf90. I used the Goto2 LAPACK/BLAS. MKL should probably not be used - it was always unreliable.--alston 14:23, 13 July 2010 (BST)
    • ifort 11.1: Worked fine with water dimer (Sadlej MC+) but segfaulted with benzene dimer (Sadlej MC+). -heap-arrays fixed the problem, but neither Norbert nor I could figure out why. The segfault occurred during an array copy operation in df_Smat.F90:
if (AB%contiguous_aux_A.and.AB%contiguous_aux_B) then
  !This is the algorithm for contiguous mappings.
  stA  = AB%MapAuxA2AB(1)
  endA = AB%MapAuxA2AB(nauxA)
  stB  = AB%MapAuxB2AB(1)
  endB = AB%MapAuxB2AB(nauxB)
  !
  if (fill_A_A) S_A_A%matrix(1:nauxA,1:nauxA) = S%matrix(stA:endA,stA:endA)

The last line caused the segfault. Both types were in memory and had the right dimensions. The copy could be done explicitly through loops, and both arrays could be accessed and set to 0.0, but S_A_A could not be filled using the whole-array operation shown above. Could this be a compiler problem? -check bounds showed nothing.--alston 16:00, 13 July 2010 (BST)

  • -heap-arrays 2048 seems to be a good choice. Timings for benzene dimer: 76 min real time, where gfortran takes more than 110 min. This is a substantial reduction. User time is 68 min, so 8 min is spent in system time (disk!).--alston 11:04, 14 July 2010 (BST)
  • IMPORTANT: There are small differences in second-order induction and dispersion energies (pol) with ifort. The second-order exchange energies (UC) are the same as those with gfortran and pgf90. What could be different? Looks like the Propagator is not quite the same with ifort. But why? In all cases, the Goto2 library is used for solving the LR-DFT equations.--alston 11:04, 14 July 2010 (BST)
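A possible explanation for the -heap-arrays fix, offered as an assumption rather than a confirmed diagnosis: ifort places automatic arrays and compiler-generated temporaries on the stack by default, and -heap-arrays moves those above the given size (in kB) to the heap. If the whole-array copy needed a temporary, a large auxiliary basis could exceed the shell's stack limit. The sketch below only inspects that limit and estimates the size of a double-precision temporary. The function names and the 2000 x 2000 dimension are made up for illustration.

```shell
#!/bin/sh
# Print the soft stack limit of this shell in kB, or "unlimited".
# (ulimit -s is supported by bash and dash, though not required by POSIX.)
stack_limit_kb() {
    ulimit -s
}

# Succeed if n_elements double-precision values (8 bytes each) fit
# under limit_kb. The size estimate rounds down to whole kB.
fits_on_stack() {
    n_elements=$1
    limit_kb=$2
    [ "$limit_kb" = "unlimited" ] && return 0
    need_kb=$(( n_elements * 8 / 1024 ))
    [ "$need_kb" -le "$limit_kb" ]
}

# Hypothetical example: a 2000 x 2000 temporary (about 31 MB).
limit=$(stack_limit_kb)
if fits_on_stack $((2000 * 2000)) "$limit"; then
    echo "a 2000x2000 temporary fits (stack limit: $limit)"
else
    echo "a 2000x2000 temporary does not fit (stack limit: $limit)"
fi
```

If the limit turns out to be the cause, raising it in the calling shell would be an alternative to -heap-arrays worth testing.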

PGF90

By and large, this compiler works with CamCASP.

10.x

Cannot compile gamint.F and dma.F90 without reducing optimization level to -O2. With -O3 we get an assembler error message:

pgf90 -O3  -DPGF90 -DCADPAC -DGAMESS -DSAPT2002 -DSIGNED_INTEGER -Mpreprocess -fastsse  
           -mcmodel=medium -c /home/am592/CamCASP/5.5-dev/src/gamint.F -o gamint.o
/tmp/pgf904O4eOUrbnf_T.s: Assembler messages:
/tmp/pgf904O4eOUrbnf_T.s:9543: Error: no such instruction: `pmaxsd (%rcx),%xmm0'
/tmp/pgf904O4eOUrbnf_T.s:9545: Error: no such instruction: `pmaxsd 16(%rcx),%xmm0'
make[2]: *** [gamint.o] Error 2

A similar message is printed with dma.F90. I've seen this error with 10.1-4.
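The messages above come from the system assembler rather than from pgf90 itself, and pmaxsd is an SSE4.1 instruction. One plausible reading (an assumption, not something these notes confirm) is that the installed assembler predates SSE4.1 support. The sketch below feeds a single instruction to an assembler and reports whether it is accepted. The function name can_assemble is hypothetical, and the assembler command defaults to as.

```shell
#!/bin/sh
# Succeed if the assembler accepts one line of AT&T-syntax assembly.
# $1: the instruction; remaining arguments: assembler command (default: as).
can_assemble() {
    insn=$1
    shift
    [ $# -gt 0 ] || set -- as
    src=$(mktemp) || return 2
    obj=$(mktemp) || { rm -f "$src"; return 2; }
    printf '%s\n' "$insn" > "$src"
    # Assemble into a scratch object file and discard all messages.
    "$@" -o "$obj" "$src" >/dev/null 2>&1
    status=$?
    rm -f "$src" "$obj"
    return "$status"
}

# Try the instruction from the error message with the default assembler.
if can_assemble 'pmaxsd (%rcx),%xmm0'; then
    echo "assembler accepts pmaxsd"
else
    echo "assembler rejects pmaxsd (or no assembler was found)"
fi
```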

  • Update: The culprit seems to have been the -fastsse flag and not -O3. With -O2 (and no -fastsse), integrals are calculated significantly more slowly (by around 90% compared with a gfortran binary).
  • Comparisons with various compiler versions (8.0.4 and the 10.x series) suggest that they all produce binaries that are slower than a Gfortran-optimized binary, especially for the integrals from gamint.F (though I am inferring this: we do not time calls to gamint.F because there are so many of them).
  • Curiously, the kernel integrals are evaluated in roughly the same time with gfortran and pgf90 (10.x) binaries (except for pgf90 8.0.4, which was significantly slower here).
  • Comparison of PGI 10.4 and Gfortran 4.4.4
Both linked to Goto BLAS
System: water dimer aTZ
Times in seconds.
 Subroutine              Number of calls    Gfortran   PGI 10.4
   main_parser                         1      171.06     299.17
   matrix_write                     5639       14.01      57.22
   matmult_types                    5357       15.91      32.67
   df_parser                           4       72.85     162.73
   df_dimer                           21       33.95      80.38
   df_int_for_df                     132       70.97     160.68
   make_kernel_integral               22       77.01      70.44

Gfortran's largest gains over PGI are in disk I/O and floating-point operations (df integrals). Curiously, Gfortran is about 10% slower for the kernel integrals.
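The comparison is easier to read as one ratio per routine. The sketch below recomputes PGI time divided by Gfortran time from the figures in the table above; the function name timing_ratios is made up for illustration, and the data are copied from the table rather than measured again.

```shell
#!/bin/sh
# Print "routine  ratio", where ratio = PGI 10.4 time / Gfortran time.
# Columns of the embedded data: routine, calls, Gfortran (s), PGI 10.4 (s).
timing_ratios() {
    awk '{ printf "%-22s %.2f\n", $1, $4 / $3 }' <<'EOF'
main_parser             1   171.06  299.17
matrix_write         5639    14.01   57.22
matmult_types        5357    15.91   32.67
df_parser               4    72.85  162.73
df_dimer               21    33.95   80.38
df_int_for_df         132    70.97  160.68
make_kernel_integral   22    77.01   70.44
EOF
}

timing_ratios
```

A ratio above 1 means the PGI binary was slower for that routine; only make_kernel_integral comes out below 1.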

Gfortran

There seem to be problems with gfortran versions earlier than 4.3. I think it is because the preprocessor is not automatically used (even if -cpp is used, because only with version 4.4 is this option recognised). It may be possible to get around this, but for now, I've found it easier to download my own copy of GCC 4.4. It may still have bugs (at one stage it couldn't find include files when compiling Dalton) but the current version (20090219) seems OK.

GCC 4.4 will reside in its own directory. Do not install it alongside the other GCC compilers, as it is still under development. Because it will reside in a non-standard location, you need to set some environment variables to get the compilation going. I use the following scripts.

1. gcc_4.4.env

#!/bin/bash

gcc44="tmp/gcc-trunk"
export LD_RUN_PATH=${HOME}/$gcc44/lib64
echo $LD_RUN_PATH
export LD_LIBRARY_PATH=${HOME}/$gcc44/lib64:$LD_LIBRARY_PATH
export PATH=${HOME}/$gcc44/bin:$PATH

2. makeall

#!/bin/bash
source ./gcc_4.4.env
echo $LD_RUN_PATH
echo $LD_LIBRARY_PATH
gfortran -v

mymake='make COMPILER=gfortran MACHINE=tati '

${mymake} clean

${mymake} all

exit

These are really simple scripts. After compilation (when makeall exits) the default GCC environment settings are restored, so your experimental version of GCC doesn't affect anything else.
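The reason the defaults come back is that makeall runs as a child process, so the variables set by gcc_4.4.env disappear with it. This assumes makeall is executed (./makeall) and not sourced. A minimal sketch of the same idea using a subshell follows; the function name with_compiler_path and the directory /opt/example-gcc/bin are hypothetical.

```shell
#!/bin/sh
# Run a command with a compiler directory prepended to PATH.
# The parentheses start a subshell, so the caller's PATH is untouched.
with_compiler_path() {
    dir=$1
    shift
    ( PATH="$dir:$PATH"; export PATH; "$@" )
}

# Inside the call PATH starts with the extra directory; outside it does not.
with_compiler_path /opt/example-gcc/bin sh -c 'echo "inside:  $PATH"'
echo "outside: $PATH"
```

Sourcing gcc_4.4.env directly in an interactive shell, by contrast, would leave the experimental compiler on PATH until that shell is closed.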

Users should be aware that GCC and GFortran are under active development, and that incompatible changes sometimes occur. In particular, internal changes have sometimes led to linking problems, which could only be resolved by recompiling the ATLAS/lapack library with the latest version of the compiler.

Issues

  • Newer versions of gcc (4.4.x onwards - I am not sure, perhaps even 4.3.x) result in a problem writing direct-access files with more than 16384 records. The exact number seems to fluctuate a little, but it is very close to 2^14. The file gets mangled on either write or read. No other compiler seems to exhibit the problem, nor did gfortran in mid-2009, when I used an experimental version of gfortran 4.4 or 4.3.

The problem is quite possibly in record_handler.F90, since a simple program that writes very large direct-access files works without a problem. Here is a copy of the code.

Compiling on Mac OS X

1. Install Fink and FinkCommander. (See http://finkcommander.sourceforge.net.) You can build all the programs from the command line -- no need for Xcode.

2. Make sure you get the latest version of gfortran, from http://gcc.gnu.org/wiki/GFortranBinaries.

3. You need the full ATLAS lapack library.

4. The CamCASP programs should compile without any problems using the scripts above. Make sure that the -static flag is not specified, though.

5. See below for information on compiling Dalton.

6. Building SAPT is troublesome, because file-names in Mac OS X are case-insensitive, so the shell script called SAPT in the SAPT bin directory gets confused with the SAPT executable called sapt. This can be worked around by renaming the SAPT script, e.g. as SAPT.sh, and commenting out the lines (around line 348) in Compall that change the first line of SAPT. (It doesn't need to be changed for Mac OS.) This must be done before attempting to run Compall -- otherwise the SAPT script will get overwritten by the sapt binary. Line 329 of Compall also needs to be changed to provide for the gfortran option. The current version of CamCASP doesn't use either SAPT or sapt unless 3rd-order energies are wanted.

7. Building Orient is fairly straightforward except for the link options.

  • Omit the -static option.
  • The ATLAS libraries need to be invoked by giving the full pathname of each library file in the gfortran link step -- otherwise the Mac version of ATLAS is scanned, leaving many unresolved references. That is, instead of "-llapack", specify e.g. "/usr/local/ATLAS/OSX_CORE2SSE2/lib/liblapack.a", and similarly for cblas, f77blas and atlas.
  • The OpenGL libraries are provided as standard in Mac OS X. To include them, specify "-Wl,-framework,OpenGL -Wl,-framework,GLUT" in the link step.
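The full-pathname link arguments for ATLAS can be generated, which avoids typing each library by hand. This is a minimal sketch: the function name atlas_link_args is hypothetical, the directory is the example path quoted above, and the library order follows the usual -llapack -lcblas -lf77blas -latlas sequence.

```shell
#!/bin/sh
# Print the static ATLAS libraries as full pathnames, in link order.
# $1: directory holding liblapack.a, libcblas.a, libf77blas.a, libatlas.a
atlas_link_args() {
    libdir=${1%/}            # drop a trailing slash, if any
    args=
    for lib in lapack cblas f77blas atlas; do
        args="$args${args:+ }$libdir/lib$lib.a"
    done
    printf '%s\n' "$args"
}

atlas_link_args /usr/local/ATLAS/OSX_CORE2SSE2/lib
```

The output can be pasted into the gfortran link step in place of the -l options.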

Compiling SAPT2006

Gfortran

Here's the preamble of the Compall script I use on my Harpertown quad-core Intel with Gfortran:

G94=NO
G03=NO #/home/patkowsk/gaussian/g03 # specify path to the G98/G03 directory
GAUEXE=NO  # change if G98 is used instead of G03
EXTRADEFS='' # if Gaussian was compiled with -DI64 and/or -DPACK64, place
             # these definitions here as well
             # TWOGIGAMAX and/or TPDRVN can also be put here
GAMESS=NO
VERNO=00      # specify for Gamess
CADPAC=NO
DALTON=/home/am592/DALTON/dalton-2.0-2006/bin/dalton
ACES=NO
# molpro interface works only with fortran-90 compilers
MOLPRO=NO
# DIIS in CC code
DIIS=YES
SAPTDFT=YES

######### TARGET system ##########################################
#
# Curently available: sgi, ibm32, ibm64, alpha, g77, g77_64 (AMD64), 
# g77_32 (AMD64 in 32bit mode),  pgf77 (on AMD64 32-bit only), 
# pgf90 (works on AMD64), ifort (on AMD64 32-bit only), sunf90, hpux.
# gfortran works only partially (not recommended)
##################################################################

TARGET=gfortran

########### Location of the blas and lapack library you want to use #####
#
# usually ' -llapack -lblas ', but sometimes different, e.g., ' -ldxml ' on alpha,
# or ' -L/scratch -llapack -lcblas -lf77blas -latlas ' on our Athlons with LAPACK
# or ' -xlic_lib=sunperf ' on our strauss (SPARC).
# or ' -lessl '            works on brainerd (-lblas does not work
#                                                    for some reason)
#
##############################################################
BLAS=' -L/home/am592/ATLAS/Linux_Intel64/lib/ -llapack -lcblas -lf77blas -latlas '

BUILDLAPACK=NO # set to NO if you provide your own Lapack

...
...

When using Gfortran you will encounter another problem: Compall tries to build ATMOL1024 by default (as it finds the directory SAPT2006/atmol1024). However, there is no makefile for Gfortran in SAPT2006/atmol1024/, so move SAPT2006/atmol1024 to SAPT2006/atmol1024_not_used and Compall will bypass it.
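The rename can be scripted so that it is safe to repeat. In this minimal sketch the function name skip_atmol1024 is made up, and the SAPT2006 directory is assumed to be the current default location.

```shell
#!/bin/sh
# Move atmol1024 out of the way so that Compall does not try to build it.
# $1: path to the SAPT2006 directory (default: ./SAPT2006)
skip_atmol1024() {
    root=${1:-SAPT2006}
    # Only rename when the source exists and the target name is free.
    if [ -d "$root/atmol1024" ] && [ ! -e "$root/atmol1024_not_used" ]; then
        mv "$root/atmol1024" "$root/atmol1024_not_used"
    fi
}

skip_atmol1024 SAPT2006
```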

Compiling DALTON 2.0

Patching DALTON

Obtain the patch from SAPT2006. Our version of DALTON 2.0 includes the patches in the DALTON/patches/ sub-directory. There are instructions in the README file in that directory. In short, all you need to do is:

cd DALTON
patch -p0 < patches/patch-sapt-2006-1

Here patch-sapt-2006-1 is the latest patch at the time of writing.

Gfortran & iFort

For Dalton, the configure script must be edited to allow for the use of gfortran.

  • At around line 1091, after the g77 section for darwin, insert
                elif [ "$F77" = gfortran ]; then
                   cpp="-DVAR_G77 $cpp"
                   mcpu=""
                   copt="$mcpu -O3 -ffast-math -fexpensive-optimizations -funroll-loops"
                   copt2="$mcpu -O2 -ffast-math -fexpensive-optimizations -funroll-loops"
                   opt="$copt2 -std=legacy"
                   copt="$copt -std=c99 -DRESTRICT=restrict"
                   def='linux.x'
                   inc='-I../include'
                   cpp=${cpp}" -DIMPLICIT_NONE"
    
  • At around line 1010 insert, after the pgf77 section for linux,
              gfortran)
                 # cpp="-DVAR_PGF77 $cpp"
                 copt="-O3 -ffast-math -fexpensive-optimizations -std=c99 -DRESTRICT=restrict"
                 opt="-O3 -ffast-math -funroll-loops"
                 safe_opt="-O3 -ffast-math -funroll-loops"
                 ;;
    
  • Change line 744 to 'tab=$($ECHO "\t")'.
  • In line 200 or thereabouts, add "gfortran" to the linux and darwin complists.
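Since the line numbers quoted above are approximate, it may help to search configure for the neighbouring compiler sections before editing. A minimal sketch, in which the function name locate_in_configure is hypothetical:

```shell
#!/bin/sh
# List every line of a file that contains a fixed string, with line numbers.
# $1: text to look for; $2: file to search (default: configure)
locate_in_configure() {
    pattern=$1
    file=${2:-configure}
    grep -n -F -- "$pattern" "$file"
}

# Show where the g77 and pgf77 sections sit, if a configure script is present.
if [ -f configure ]; then
    locate_in_configure 'g77'
    locate_in_configure 'pgf77'
fi
```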

Some minor code changes are also needed.

  • In cc/cc_pckutil.F, line 100, change z'C0000000' to transfer(z'C0000000',1). A similar change is needed in lines 236 and 238.
  • In abacus/her2out.F, replace line 406 by
                PERCNT = real(100*N2WRIT,kind(PERCNT)) / real(NALL,kind(PERCNT))
    

Finally, the script bin/dalton may need to be edited to give the correct path to the executables. (Best written as, e.g., "$DALTON/bin/dalton.x".)

Portland

BLAS and LAPACK libraries

GOTO

I just experimented with the Goto libraries. A quick LU decomposition test showed Goto to be 15% faster than the ATLAS libraries. This is quite good. The library builds flawlessly (using gcc and gfortran) and even automatically downloads and builds LAPACK, so you get it all, and very quickly too. Linking is straightforward, but you need to link to -lpthread in addition to -lgoto2. For example:

-L/home/am592/ATLAS/GotoBLAS2/ -lgoto2 -lpthread

Since the Goto library is threaded, some of the observed speedup could well be due to the threading. So more careful tests are needed. In any case, this is something to look at seriously. --alston 16:48, 8 April 2010 (BST)
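One way to separate the threading effect from the rest is to repeat the test with the library limited to one thread. The sketch below assumes that GotoBLAS2 reads the GOTO_NUM_THREADS and OMP_NUM_THREADS environment variables, which should be checked against the library's own documentation. The function name run_single_threaded and the benchmark name ./lu_test are hypothetical.

```shell
#!/bin/sh
# Run a command with the BLAS thread count pinned to 1.
# env sets the variables for that one command only.
run_single_threaded() {
    env GOTO_NUM_THREADS=1 OMP_NUM_THREADS=1 "$@"
}

# Show what the child process sees. For a real comparison, replace the
# sh -c command with the benchmark, e.g. run_single_threaded ./lu_test
run_single_threaded sh -c 'echo "GOTO_NUM_THREADS=$GOTO_NUM_THREADS"'
```

Comparing this run against the ATLAS timing would show how much of the 15% survives without threads.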