Difference between revisions of "Fortran issues"

From CUC3
Jump to navigation Jump to search
import>Jss43
import>Jss43
Line 22: Line 22:
 
write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r2)
 
write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r2)
 
end program testkind
 
end program testkind
  +
--~~~~
 
 
Output:
 
Output:
   
Line 54: Line 54:
   
 
Or you can just avoid REAL. ;-)
 
Or you can just avoid REAL. ;-)
  +
  +
== ifort and automatic arrays ==
  +
  +
This information cost me 3 days of my life. How I cursed ifort... ;-) --[[User:jss43|james]] 13:07, 30 June 2008 (BST)
  +
  +
Automatic arrays (where the limits are explicitly provided in the declaration, e.g. X(10,10,10)) are allocated differently in ifort than dynamic arrays using allocatable. The former are allocated on the stack, the latter on the heap [http://ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html difference]. This also differs from how most other compilers work (see note below). The gotchas are that the default stack on linux is quite small (~8MB) and the allocation on the stack is cumulative: a automatic array is only deallocated once the routine goes out of scope (i.e. has been exited). Once the program goes over the limit, it will just segfault on stepping into a routine, even if the memory allocation for that routine is very small (1kb over the stack limit is enough!). This means that a code can segfault in seemingly random places for no reason, and is rather hard to track down.
  +
  +
So, how can we avoid this? There are a few possible workarounds:
  +
# Use another compiler.
  +
# Increase your stack size using ulimit -s.
  +
# Use allocatable arrays for everything.
  +
# Use ifort -heap-arrays to force allocation of automatic arrays to be done on the stack. This is probably the best option for legacy code, where changing all the declarations is too tedious.
  +
  +
Intel do it this way as (de)allocation is faster on the stack than the heap (though seriously, if memory (de)allocation is a bottleneck, you probably need to rethink parts of your code ;-)). The (lack of an) error message is incredibly unhelpful though, and tracking a segmentation fault to this issue is painful.
  +
  +
A minimum code that shows this issue is testarr.f90. This shows the proof of concept, but it should be emphasised that you could have a chain of calls from one subroutine to the next, which all contain some (individually small) automatic arrays, and which would eventually push you over the stack limit. This is quite easy to do without realising it. It is interesting compare the behaviour of ifort and, for example, gfortran:
  +
  +
<pre>
  +
jss43@keiko:~/src/test/testifort> echo 'Using gfortran' && gfortran testarr.f90 && ./a.out ; echo -e '\nUsing ifort' && ifort testarr.f90 && ./a.out
  +
Using gfortran
  +
If this line gets printed, then s1 passed.
  +
If this line gets printed, then s2 passed.
  +
  +
Using ifort
  +
If this line gets printed, then s1 passed.
  +
Segmentation fault
  +
</pre>
  +
  +
[http://www.intel.com/support/performancetools/fortran/sb/cs-007790.htm Intel information] on this issue.
  +
  +
=== Other compilers ===
  +
  +
* gfortran: testarr.f90 passes.
  +
* portland: testarr.f90 passes.
  +
* g95: testarr.f90 passes.
  +
* nag: testarr.f90 passes.
  +
* pathscale: testarr.f90 FAILS. Pathscale appears to do the same trick as ifort, but is much more helpful. Like ifort, it can be overcome with a compiler flag (see the manpage for more details).
  +
<pre>
  +
jss43@keiko:~/src/test/testifort> pathf95 testarr.f90 && ./a.out
  +
If this line gets printed, then s1 passed.
  +
  +
  +
**** Segmentation fault! Fault address: 0xff0ecda0
  +
  +
Fault address is 11989600 bytes below the nearest valid
  +
mapping boundary, which is at 0xffc5c000.
  +
  +
This is likely to have been caused by a stack overflow.
  +
Use your shell's ulimit or limit command to see if your
  +
stack size limit is too small.
  +
Aborted
  +
jss43@keiko:~/src/test/testifort> pathf95 -LANG:heap_allocation_threshold=0 testarr.f90 && ./a.out
  +
If this line gets printed, then s1 passed.
  +
If this line gets printed, then s2 passed.
  +
</pre>

Revision as of 12:07, 30 June 2008

I thought it would be useful to have a page for noting Fortran problems which aren't well documented elsewhere or are common gotchas... --james 16:23, 25 June 2008 (BST)

Real intrinsic

Care needs to be taken with using REAL(X), which is quite common in older codes. The default behaviour for REAL differs if X is complex rather than integer or real. Some googling reveals this is an oddity, but it is the specification. I have found that DREAL isn't always portable (some compilers have issues when the argument to DREAL is the value of a derived type, e.g. DREAL(X%v) fails at compile time but REAL(X%v) doesn't).

keiko:~/src/test> more testkind.f90 && ifort testkind.f90 && ./a.out 
program testkind
    implicit none
    integer,parameter :: r1=kind(0.0)
    integer,parameter :: r2=kind(0.d0)
    write (6,'(/a/)') 'Output:'
    write (6,*) real(1)
    write (6,*) real(1,r1)
    write (6,*) real(1,r2)      
    write (6,*) real(1.23456d0)
    write (6,*) real(1.23456d0,r1)
    write (6,*) real(1.23456d0,r2)      
    write (6,*) real(dcmplx(1.23456d0,0.2155660d0))
    write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r1)
    write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r2)      
end program testkind
--~~~~
Output:

   1.000000    
   1.000000    
   1.00000000000000     
   1.234560    
   1.234560    
   1.23456000000000     
   1.23456000000000     
   1.234560    
   1.23456000000000

But the -r8 flag can help. It's safer to specify the kind in the code, especially if other people compile your code (and hack the makefiles).

jss43@keiko:~/src/test> ifort -r8 testkind.f90 && ./a.out
Output:

   1.00000000000000     
   1.00000000000000     
   1.00000000000000     
   1.23456000000000     
   1.23456000000000     
   1.23456000000000     
   1.23456000000000     
   1.23456000000000     
   1.23456000000000  

Or you can just avoid REAL. ;-)

ifort and automatic arrays

This information cost me 3 days of my life. How I cursed ifort... ;-) --james 13:07, 30 June 2008 (BST)

Automatic arrays (where the limits are explicitly provided in the declaration, e.g. X(10,10,10)) are allocated differently in ifort than dynamic arrays using allocatable. The former are allocated on the stack, the latter on the heap difference. This also differs from how most other compilers work (see note below). The gotchas are that the default stack on linux is quite small (~8MB) and the allocation on the stack is cumulative: a automatic array is only deallocated once the routine goes out of scope (i.e. has been exited). Once the program goes over the limit, it will just segfault on stepping into a routine, even if the memory allocation for that routine is very small (1kb over the stack limit is enough!). This means that a code can segfault in seemingly random places for no reason, and is rather hard to track down.

So, how can we avoid this? There are a few possible workarounds:

  1. Use another compiler.
  2. Increase your stack size using ulimit -s.
  3. Use allocatable arrays for everything.
  4. Use ifort -heap-arrays to force allocation of automatic arrays to be done on the stack. This is probably the best option for legacy code, where changing all the declarations is too tedious.

Intel do it this way as (de)allocation is faster on the stack than the heap (though seriously, if memory (de)allocation is a bottleneck, you probably need to rethink parts of your code ;-)). The (lack of an) error message is incredibly unhelpful though, and tracking a segmentation fault to this issue is painful.

A minimum code that shows this issue is testarr.f90. This shows the proof of concept, but it should be emphasised that you could have a chain of calls from one subroutine to the next, which all contain some (individually small) automatic arrays, and which would eventually push you over the stack limit. This is quite easy to do without realising it. It is interesting compare the behaviour of ifort and, for example, gfortran:

jss43@keiko:~/src/test/testifort> echo 'Using gfortran' && gfortran testarr.f90 && ./a.out ; echo -e '\nUsing ifort' && ifort testarr.f90 && ./a.out
Using gfortran
 If this line gets printed, then s1 passed.
 If this line gets printed, then s2 passed.

Using ifort
 If this line gets printed, then s1 passed.
Segmentation fault

Intel information on this issue.

Other compilers

  • gfortran: testarr.f90 passes.
  • portland: testarr.f90 passes.
  • g95: testarr.f90 passes.
  • nag: testarr.f90 passes.
  • pathscale: testarr.f90 FAILS. Pathscale appears to do the same trick as ifort, but is much more helpful. Like ifort, it can be overcome with a compiler flag (see the manpage for more details).
jss43@keiko:~/src/test/testifort> pathf95 testarr.f90 && ./a.out 
 If this line gets printed, then s1 passed.


**** Segmentation fault!  Fault address: 0xff0ecda0

Fault address is 11989600 bytes below the nearest valid
mapping boundary, which is at 0xffc5c000.

This is likely to have been caused by a stack overflow.
Use your shell's ulimit or limit command to see if your
stack size limit is too small.
Aborted
jss43@keiko:~/src/test/testifort> pathf95 -LANG:heap_allocation_threshold=0 testarr.f90 && ./a.out 
 If this line gets printed, then s1 passed.
 If this line gets printed, then s2 passed.