Computing values only once

From Docswiki
Jump to navigation Jump to search

Profiling

  • Compile your code with pgf90 and the following flags:
  FFLAGS= -Mextend -g -traceback -pg
  • Run GMIN on your input. Note that gmon.out is produced.
  • Run the profiler. The output is long, so put it to a file.
  gprof ~/location_of_GMIN_binary/GMIN > my_output_file

The first part of the file says how long the program spent in each subroutine and how many calls were made to each subroutine.

Speeding up small loops

FOR loops should only be used when the instructions cannot be explicitly written. For example, the identity matrix should be coded as

 I3(:,:) = 0.D0
 I3(1,1) = 1.D0; I3(2,2) = 1.D0; I3(3,3) = 1.D0

rather than

 I3(:,:) = 0.D0
 FOR I = 0, 3
   I3(I,I) = 1.D0
 ENDDO

You would hope that the compiler would do this type of optimization for you, but I saw a nice speedup in my case, so I wrote this out myself.

Computing values only once

Usually a potential subroutine has this form:

 FOR OUTER LOOP OVER MOLECULES
   FOR INNER LOOP OVER MOLECULES
     FOR OUTER LOOP OVER SITES
        FOR INNER LOOP OVER SITES
           Compute potential contributions
        ENDDO
     ENDDO
   ENDDO
 ENDDO

If there are N molecules with n sites each, then there are typically some N values associated with each molecule (e.g., rotation matrices), some Nn values associated with each site (e.g., position of site with respect to molecular origin), and some values associated with each pair of sites.

Because the code visits the "Compute contributions" area some times, it is much more efficient if the values associated with molecules or sites are calculated in a different loop:

 FOR LOOP OVER MOLECULES
   Calculate molecule values
 
   FOR LOOP OVER SITES
     Calculate site values
   ENDDO
 ENDDO
 
 FOR OUTER LOOP OVER MOLECULES
   etc
 ENDDO


Putting array dimensions in the right order

For historical reasons, Fortran puts its array dimensions in the opposite order from what we would expect. For example, a 3x3 matrix A has its components stored as a 1D array in the memory in order

A[0][0] A[1][0] A[2][0] A[0][1] A[1][1] A[2][1] A[0][2] A[1][2] A[2][2]

so the more efficient way to loop is to put the second component in the outermost loop:

for j=0,2
 for i=0,2
   compute a[i][j]
 enddo
enddo

The effect can apparently be 2-3x faster, but I don't think that memory access is the limiting factor in our programs. For example, I used an array of shape matrices, one 3x3 shape matrix per site per molecule. I originally coded this as A[molecule][site][row][col], and switching to A[row][col][site][mol] made my code 0.2% faster, so I didn't even bother changing it.