Adk44: Created page with "== Profiling == * Compile your code with pgf90 and the following flags: FFLAGS= -Mextend -g -traceback -pg * Run GMIN on your input. Note that gmon.out is produced. * Run..."

2019-05-13T12:29:20Z

Created page with "== Profiling == * Compile your code with pgf90 and the following flags: FFLAGS= -Mextend -g -traceback -pg * Run GMIN on your input. Note that gmon.out is produced. * Run..."

New page

== Profiling ==
* Compile your code with pgf90 and the following flags:

FFLAGS= -Mextend -g -traceback -pg

* Run GMIN on your input. Note that gmon.out is produced.
* Run the profiler. The output is long, so put it to a file.

gprof ~/location_of_GMIN_binary/GMIN > my_output_file

The first part of the file says how long the program spent in each subroutine and how many calls were made to each subroutine.

== Speeding up small loops ==

FOR loops should only be used when the instructions cannot be explicitly written. For example, the identity matrix should be coded as

I3(:,:) = 0.D0
I3(1,1) = 1.D0; I3(2,2) = 1.D0; I3(3,3) = 1.D0

rather than

I3(:,:) = 0.D0
FOR I = 0, 3
I3(I,I) = 1.D0
ENDDO

You would hope that the compiler would do this type of optimization for you, but I saw a nice speedup in my case, so I wrote this out myself.

== Computing values only once ==

Usually a potential subroutine has this form:

FOR OUTER LOOP OVER MOLECULES
FOR INNER LOOP OVER MOLECULES
FOR OUTER LOOP OVER SITES
FOR INNER LOOP OVER SITES
Compute potential contributions
ENDDO
ENDDO
ENDDO
ENDDO

If there are ''N'' molecules with ''n'' sites each, then there are typically some ''N'' values associated with each molecule (e.g., rotation matrices), some ''Nn'' values associated with each site (e.g., position of site with respect to molecular origin), and some <math>\tfrac{1}{2}(Nn)(Nn) \sim (Nn)^2</math> values associated with each pair of sites.

Because the code visits the "Compute contributions" area some <math>O(N^2n^2)</math> times, it is much more efficient if the values associated with molecules or sites are calculated in a different loop:

FOR LOOP OVER MOLECULES
Calculate molecule values

FOR LOOP OVER SITES
Calculate site values
ENDDO
ENDDO

FOR OUTER LOOP OVER MOLECULES
etc
ENDDO

== Putting array dimensions in the right order ==

For historical reasons, Fortran puts its array dimensions in the opposite order from what we would expect. For example, a 3x3 matrix A has its components stored as a 1D array in the memory in order

A[0][0] A[1][0] A[2][0] A[0][1] A[1][1] A[2][1] A[0][2] A[1][2] A[2][2]

so the more efficient way to loop is to put the second component in the outermost loop:

for j=0,2
for i=0,2
compute a[i][j]
enddo
enddo

The effect can apparently be 2-3x faster, but I don't think that memory access is the limiting factor in our programs. For example, I used an array of shape matrices, one 3x3 shape matrix per site per molecule. I originally coded this as A[molecule][site][row][col], and switching to A[row][col][site][mol] made my code 0.2% faster, so I didn't even bother changing it.

Optimization tricks - Revision history

Adk44: Created page with "== Profiling == * Compile your code with pgf90 and the following flags: FFLAGS= -Mextend -g -traceback -pg * Run GMIN on your input. Note that gmon.out is produced. * Run..."