<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wikis.ch.cam.ac.uk/ro-walesdocs/wiki/index.php?action=history&amp;feed=atom&amp;title=Optimization_tricks</id>
	<title>Optimization tricks - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wikis.ch.cam.ac.uk/ro-walesdocs/wiki/index.php?action=history&amp;feed=atom&amp;title=Optimization_tricks"/>
	<link rel="alternate" type="text/html" href="https://wikis.ch.cam.ac.uk/ro-walesdocs/wiki/index.php?title=Optimization_tricks&amp;action=history"/>
	<updated>2026-04-12T04:36:43Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.7</generator>
	<entry>
		<id>https://wikis.ch.cam.ac.uk/ro-walesdocs/wiki/index.php?title=Optimization_tricks&amp;diff=1318&amp;oldid=prev</id>
		<title>Adk44: Created page with &quot;== Profiling == * Compile your code with pgf90 and the following flags:     FFLAGS= -Mextend -g -traceback -pg  * Run GMIN on your input. Note that gmon.out is produced. * Run...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wikis.ch.cam.ac.uk/ro-walesdocs/wiki/index.php?title=Optimization_tricks&amp;diff=1318&amp;oldid=prev"/>
		<updated>2019-05-13T12:29:20Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== Profiling == * Compile your code with pgf90 and the following flags:     FFLAGS= -Mextend -g -traceback -pg  * Run GMIN on your input. Note that gmon.out is produced. * Run...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Profiling ==&lt;br /&gt;
* Compile your code with pgf90 and the following flags:&lt;br /&gt;
&lt;br /&gt;
   FFLAGS= -Mextend -g -traceback -pg&lt;br /&gt;
&lt;br /&gt;
* Run GMIN on your input. Note that gmon.out is produced.&lt;br /&gt;
* Run the profiler. The output is long, so redirect it to a file.&lt;br /&gt;
&lt;br /&gt;
   gprof ~/location_of_GMIN_binary/GMIN &amp;gt; my_output_file&lt;br /&gt;
&lt;br /&gt;
The first part of the file (the flat profile) reports how much time the program spent in each subroutine and how many calls were made to each subroutine.&lt;br /&gt;
&lt;br /&gt;
== Speeding up small loops ==&lt;br /&gt;
&lt;br /&gt;
DO loops with a small, fixed number of iterations can be faster when the instructions are written out explicitly. For example, the 3x3 identity matrix should be coded as&lt;br /&gt;
&lt;br /&gt;
  I3(:,:) = 0.D0&lt;br /&gt;
  I3(1,1) = 1.D0; I3(2,2) = 1.D0; I3(3,3) = 1.D0&lt;br /&gt;
&lt;br /&gt;
rather than&lt;br /&gt;
&lt;br /&gt;
  I3(:,:) = 0.D0&lt;br /&gt;
  DO I = 1, 3&lt;br /&gt;
    I3(I,I) = 1.D0&lt;br /&gt;
  ENDDO&lt;br /&gt;
&lt;br /&gt;
You would hope that the compiler would do this type of optimization for you, but I saw a nice speedup in my case, so I wrote this out myself.&lt;br /&gt;
&lt;br /&gt;
== Computing values only once ==&lt;br /&gt;
&lt;br /&gt;
Usually a potential subroutine has this form:&lt;br /&gt;
&lt;br /&gt;
  FOR OUTER LOOP OVER MOLECULES&lt;br /&gt;
    FOR INNER LOOP OVER MOLECULES&lt;br /&gt;
      FOR OUTER LOOP OVER SITES&lt;br /&gt;
         FOR INNER LOOP OVER SITES&lt;br /&gt;
            Compute potential contributions&lt;br /&gt;
         ENDDO&lt;br /&gt;
      ENDDO&lt;br /&gt;
    ENDDO&lt;br /&gt;
  ENDDO&lt;br /&gt;
&lt;br /&gt;
If there are &amp;#039;&amp;#039;N&amp;#039;&amp;#039; molecules with &amp;#039;&amp;#039;n&amp;#039;&amp;#039; sites each, then there are typically of order &amp;#039;&amp;#039;N&amp;#039;&amp;#039; values associated with molecules (e.g., rotation matrices), of order &amp;#039;&amp;#039;Nn&amp;#039;&amp;#039; values associated with sites (e.g., the position of each site relative to the molecular origin), and some &amp;lt;math&amp;gt;\tfrac{1}{2}(Nn)(Nn) \sim (Nn)^2&amp;lt;/math&amp;gt; values associated with pairs of sites.&lt;br /&gt;
&lt;br /&gt;
Because the code visits the &amp;quot;Compute potential contributions&amp;quot; step some &amp;lt;math&amp;gt;O(N^2n^2)&amp;lt;/math&amp;gt; times, it is much more efficient to calculate the values associated with individual molecules or sites only once, in a separate loop that runs before the pair loops:&lt;br /&gt;
&lt;br /&gt;
  FOR LOOP OVER MOLECULES&lt;br /&gt;
    Calculate molecule values&lt;br /&gt;
  &lt;br /&gt;
    FOR LOOP OVER SITES&lt;br /&gt;
      Calculate site values&lt;br /&gt;
    ENDDO&lt;br /&gt;
  ENDDO&lt;br /&gt;
  &lt;br /&gt;
  FOR OUTER LOOP OVER MOLECULES&lt;br /&gt;
    etc&lt;br /&gt;
  ENDDO&lt;br /&gt;
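&lt;br /&gt;
As a minimal sketch of this restructuring (free-form Fortran 90; the program name, the array names, the made-up input data and the placeholder 1/r pair term are all assumptions for illustration and are not taken from GMIN), the rotation matrices and site positions are computed once in a first pass and are only read inside the pair loops:&lt;br /&gt;
&lt;br /&gt;
  PROGRAM HOIST_SKETCH&lt;br /&gt;
    IMPLICIT NONE&lt;br /&gt;
    INTEGER, PARAMETER :: NMOL = 4, NSITE = 3&lt;br /&gt;
    DOUBLE PRECISION :: ANGLE(NMOL), COM(3,NMOL), BODY(3,NSITE)&lt;br /&gt;
    DOUBLE PRECISION :: RMAT(3,3,NMOL), SITE(3,NSITE,NMOL)&lt;br /&gt;
    DOUBLE PRECISION :: DVEC(3), ENERGY&lt;br /&gt;
    INTEGER :: J1, J2, I1, I2&lt;br /&gt;
  &lt;br /&gt;
    ! Made-up input data, for illustration only&lt;br /&gt;
    DO J1 = 1, NMOL&lt;br /&gt;
      ANGLE(J1) = 0.1D0*J1&lt;br /&gt;
      COM(:,J1) = (/ 5.D0*J1, 0.D0, 0.D0 /)&lt;br /&gt;
    ENDDO&lt;br /&gt;
    DO I1 = 1, NSITE&lt;br /&gt;
      BODY(:,I1) = (/ 0.5D0*I1, 0.D0, 0.D0 /)&lt;br /&gt;
    ENDDO&lt;br /&gt;
  &lt;br /&gt;
    ! Pass 1: molecule values (rotation about z) and site values, each computed once&lt;br /&gt;
    DO J1 = 1, NMOL&lt;br /&gt;
      RMAT(:,:,J1) = 0.D0&lt;br /&gt;
      RMAT(1,1,J1) = COS(ANGLE(J1)); RMAT(1,2,J1) = -SIN(ANGLE(J1))&lt;br /&gt;
      RMAT(2,1,J1) = SIN(ANGLE(J1)); RMAT(2,2,J1) = COS(ANGLE(J1))&lt;br /&gt;
      RMAT(3,3,J1) = 1.D0&lt;br /&gt;
      DO I1 = 1, NSITE&lt;br /&gt;
        SITE(:,I1,J1) = COM(:,J1) + MATMUL(RMAT(:,:,J1), BODY(:,I1))&lt;br /&gt;
      ENDDO&lt;br /&gt;
    ENDDO&lt;br /&gt;
  &lt;br /&gt;
    ! Pass 2: the pair loops only read the stored values&lt;br /&gt;
    ENERGY = 0.D0&lt;br /&gt;
    DO J1 = 1, NMOL - 1&lt;br /&gt;
      DO J2 = J1 + 1, NMOL&lt;br /&gt;
        DO I1 = 1, NSITE&lt;br /&gt;
          DO I2 = 1, NSITE&lt;br /&gt;
            DVEC = SITE(:,I1,J1) - SITE(:,I2,J2)&lt;br /&gt;
            ENERGY = ENERGY + 1.D0/SQRT(DOT_PRODUCT(DVEC, DVEC))&lt;br /&gt;
          ENDDO&lt;br /&gt;
        ENDDO&lt;br /&gt;
      ENDDO&lt;br /&gt;
    ENDDO&lt;br /&gt;
    PRINT *, ENERGY&lt;br /&gt;
  END PROGRAM HOIST_SKETCH&lt;br /&gt;
&lt;br /&gt;
The extra storage is of order &amp;#039;&amp;#039;Nn&amp;#039;&amp;#039; numbers, which is small compared with the work saved in the innermost loop.&lt;br /&gt;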
&lt;br /&gt;
&lt;br /&gt;
== Putting array dimensions in the right order ==&lt;br /&gt;
&lt;br /&gt;
For historical reasons, Fortran stores arrays in column-major order, the opposite of C: the first index varies fastest in memory. For example, a 3x3 matrix A has its components stored as a 1D array in memory in the following order (written here in C-style notation, with indices starting from 0):&lt;br /&gt;
&lt;br /&gt;
 A[0][0] A[1][0] A[2][0] A[0][1] A[1][1] A[2][1] A[0][2] A[1][2] A[2][2]&lt;br /&gt;
&lt;br /&gt;
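so the more efficient way to loop is to put the second index in the outermost loop and the first index in the innermost loop, so that consecutive iterations access adjacent memory locations:&lt;br /&gt;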
&lt;br /&gt;
 for j=0,2&lt;br /&gt;
  for i=0,2&lt;br /&gt;
    compute a[i][j]&lt;br /&gt;
  enddo&lt;br /&gt;
 enddo&lt;br /&gt;
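&lt;br /&gt;
In actual Fortran syntax, where indices start from 1 by default, the same loop order could be sketched as follows (free-form Fortran 90; the program name, the variable names and the values assigned are assumptions for illustration only):&lt;br /&gt;
&lt;br /&gt;
  PROGRAM LOOP_ORDER_SKETCH&lt;br /&gt;
    IMPLICIT NONE&lt;br /&gt;
    INTEGER, PARAMETER :: N = 3&lt;br /&gt;
    DOUBLE PRECISION :: A(N,N)&lt;br /&gt;
    INTEGER :: I, J&lt;br /&gt;
  &lt;br /&gt;
    ! Memory order is A(1,1), A(2,1), A(3,1), A(1,2), ... so the&lt;br /&gt;
    ! inner loop runs over the first index and the outer loop over the second&lt;br /&gt;
    DO J = 1, N&lt;br /&gt;
      DO I = 1, N&lt;br /&gt;
        A(I,J) = 10.D0*I + J&lt;br /&gt;
      ENDDO&lt;br /&gt;
    ENDDO&lt;br /&gt;
    PRINT *, SUM(A)&lt;br /&gt;
  END PROGRAM LOOP_ORDER_SKETCH&lt;br /&gt;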
&lt;br /&gt;
The speedup can apparently be a factor of 2-3, but I don&amp;#039;t think that memory access is the limiting factor in our programs.  For example, I used an array of shape matrices, one 3x3 shape matrix per site per molecule.  I originally coded this as A[molecule][site][row][col], and switching to A[row][col][site][mol] made my code only 0.2% faster, so I didn&amp;#039;t bother changing it.&lt;/div&gt;</summary>
		<author><name>Adk44</name></author>
	</entry>
</feed>