Difference between revisions of "Fortran issues"
import>Jss43 m (→ifort and automatic arrays: Add in link to testarr source.) |
import>Jss43 |
||
(10 intermediate revisions by the same user not shown) | |||
Line 3: | Line 3: | ||
== Real intrinsic == |
== Real intrinsic == |
||
− | Care needs to be taken with using REAL(X), which is quite common in older codes. The default behaviour for REAL differs if X is complex rather than integer or real. |
+ | Care needs to be taken with using REAL(X), which is quite common in older codes. The default behaviour for REAL differs if X is complex rather than integer or real. This is an oddity compared to other intrinsic functions, which usually return the same kind as the argument, but it is the specification. I have found that DREAL isn't always portable (some compilers have issues when the argument to DREAL is the value of a derived type, e.g. DREAL(X%v) fails at compile time but REAL(X%v) doesn't). |
<pre> |
<pre> |
||
Line 54: | Line 54: | ||
Or you can just avoid REAL. ;-) |
Or you can just avoid REAL. ;-) |
||
+ | |||
+ | In a similar vein, the following declarations are not equivalent: |
||
+ | <pre> |
||
+ | real(kind(0.d0)), parameter :: PI=3.1415926535897931 |
||
+ | real(kind(0.d0)), parameter :: PI=3.1415926535897931_8 |
||
+ | </pre> |
||
+ | I have found some legacy code which used the former, but meant the latter... |
||
== ifort and automatic arrays == |
== ifort and automatic arrays == |
||
Line 59: | Line 66: | ||
This information cost me 3 days of my life. How I cursed ifort... ;-) --[[User:jss43|james]] 13:07, 30 June 2008 (BST) |
This information cost me 3 days of my life. How I cursed ifort... ;-) --[[User:jss43|james]] 13:07, 30 June 2008 (BST) |
||
− | Automatic arrays (where the limits are explicitly provided in the declaration, e.g. X(10,10,10)) are allocated differently in ifort than dynamic arrays using allocatable. The former are allocated on the stack, the latter on the heap [http://ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html |
+ | Automatic arrays (where the limits are explicitly provided in the declaration, e.g. X(10,10,10)) are allocated differently in ifort than dynamic arrays using allocatable. The former are allocated on the stack, the latter on the heap ([http://ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html differences]). This also differs from how most other compilers work (see note below). The gotchas are that the default stack on linux is quite small (~8MB) and the allocation on the stack is cumulative: a automatic array is only deallocated once the routine goes out of scope (i.e. has been exited) in a LIFO fashion. Once the program goes over the limit, it will just segfault on stepping into a routine, even if the memory allocation for that routine is very small (1kb over the stack limit is enough!). This means that a code can segfault in seemingly random places for no reason, and is rather hard to track down. |
So, how can we avoid this? There are a few possible workarounds: |
So, how can we avoid this? There are a few possible workarounds: |
||
Line 89: | Line 96: | ||
* portland: testarr.f90 passes. |
* portland: testarr.f90 passes. |
||
* g95: testarr.f90 passes. |
* g95: testarr.f90 passes. |
||
− | * nag: testarr.f90 passes. |
+ | * nag: testarr.f90 passes (but it gives a warning about the arrays which are not used). |
* pathscale: testarr.f90 FAILS. Pathscale appears to do the same trick as ifort, but is much more helpful. Like ifort, it can be overcome with a compiler flag (see the manpage for more details). |
* pathscale: testarr.f90 FAILS. Pathscale appears to do the same trick as ifort, but is much more helpful. Like ifort, it can be overcome with a compiler flag (see the manpage for more details). |
||
<pre> |
<pre> |
||
Line 109: | Line 116: | ||
If this line gets printed, then s2 passed. |
If this line gets printed, then s2 passed. |
||
</pre> |
</pre> |
||
+ | |||
+ | == Non-advancing io == |
||
+ | |||
+ | Non-advancing input/ouput can be specified by using advance='no' in the read/write statement. This feature was implemented in Fortran 95, and is very useful in writing out arrays. Many compilers also allow the use of $ in the format string to do the same thing, but this is a non-standard extension and should not be relied upon (in particular, pathscale does not have this implemented). |
||
+ | |||
+ | There is, however, a bug in the ifort compilers, which is shown by the following code: |
||
+ | |||
+ | <pre> |
||
+ | program test_advance |
||
+ | |||
+ | implicit none |
||
+ | integer :: ierr,noAutoDets,k |
||
+ | real(8) :: ACF(7) |
||
+ | |||
+ | noAutoDets=7 |
||
+ | ACF=0.d0 |
||
+ | |||
+ | write (6,*) "Get here!",NoAutoDets,ACF(:) |
||
+ | call flush(6) |
||
+ | do k=1,NoAutoDets |
||
+ | write(6,'(F20.10$)') ACF(k) |
||
+ | end do |
||
+ | write (6,*) |
||
+ | write (6,*) "Get here 2!" |
||
+ | |||
+ | do k=1,NoAutoDets |
||
+ | write(6,'(F20.10)',advance='no') ACF(k) |
||
+ | end do |
||
+ | write (6,*) |
||
+ | call flush(6) |
||
+ | |||
+ | end program test_advance |
||
+ | </pre> |
||
+ | |||
+ | This test program fails on the advance='no' statement with all versions of ifort prior to 10.1.017, with a variety of error messages which appear to be both platform and version specific and include hanging, streams of libc errors which force the terminal to be closed, memory errors (which is what we first saw and were "fun" to track down), segmentation faults and invalid statement errors. |
||
+ | |||
+ | == Installing newer versions of gfortran == |
||
+ | |||
+ | gfortran is rather strict in some areas, and so is very useful for development and testing work. It is not particularly fast though: for production work it is almost certainly better to use one of the commercial compilers. gfortran is free (as in both beer and speech), so is also good for working away from the sector. |
||
+ | |||
+ | gfortran 4.1 is installed on the sector 10.2 image (and is on the path: no modules are needed). Unfortunately 4.1 has some minor issues and couldn't compile our code, mainly due to a poor implementation of operator overloading. Later versions fix this and will be in the next sector image, but I'm not the most patient person. ;-) Fortunately it is incredibly easy to install a later version (I chose 4.2, as I already was using it on my laptop, but 4.3 and 4.4 (in development) are available) as GNU offer [http://gcc.gnu.org/wiki/GFortranBinaries precompiled binaries]. I'm not interested in speed, so didn't bother compiling my own. |
||
+ | |||
+ | I did: |
||
+ | <pre> |
||
+ | jss43@keiko:~/src/> wget -O - http://www.physik.fu-berlin.de/~tburnus/gcc-trunk/gcc-4.2-x86_64.tar.gz | tar xvzf - |
||
+ | </pre> |
||
+ | |||
+ | This downloads the gcc-4.2 suite (for a 64-bit machines) and extracts it to ~/src/gcc-4.2, which contains the binary and library subdirectories. We are now good to go! The gfortran binary lives in ~/src/gcc-4.2/bin/gfortran. |
||
+ | |||
+ | There are a few gotchas however: |
||
+ | * Use either -static or set an appropriate LD_LIBRARY_PATH as otherwise the gfortran.so library is not found at runtime (or the wrong library is used, e.g. the system installation). Either way, barfing happens. We can use [[environment modules]] to help us. |
||
+ | * The acml library provided by the module doesn't work: the gcc people changed some interfaces and you get undefined references. The provided atlas modules don't work on all machines (e.g. my workstation). [[Lapack compilation]] is pretty straightforward though. |
||
+ | |||
+ | The 10.3 image includes a later version of gfortran. These notes are still useful for later versions of gfortran, however. |
||
+ | |||
+ | == Cascade compilation == |
||
+ | |||
+ | Cascade compilation is an absolute pain. If you change: |
||
+ | <pre> |
||
+ | module m |
||
+ | contains |
||
+ | subroutine s(a) |
||
+ | integer :: a |
||
+ | write (6,*) a |
||
+ | end subroutine s |
||
+ | end module m |
||
+ | </pre> |
||
+ | to: |
||
+ | <pre> |
||
+ | module m |
||
+ | contains |
||
+ | subroutine s(a) |
||
+ | integer :: a |
||
+ | write (6,*) a**2 |
||
+ | end subroutine s |
||
+ | end module m |
||
+ | </pre> |
||
+ | then a standard makefile will recompile all source files USEing m, even though the interface has not changed (which is all the dependencies care about). If you change the guts of a "root" module that is USEd throughout your code, the recompilation can take some time (for us, it could amount to essentially a clean build). Happily, someone else has done the hard work to solve this: [http://www.nicdan.id.au/computers/compiling/recompile.html]. The Alavi Group settled on option 2. |
Latest revision as of 08:44, 13 October 2008
I thought it would be useful to have a page for noting Fortran problems which aren't well documented elsewhere or are common gotchas... --james 16:23, 25 June 2008 (BST)
Real intrinsic
Care needs to be taken with using REAL(X), which is quite common in older codes. The default behaviour for REAL differs if X is complex rather than integer or real. This is an oddity compared to other intrinsic functions, which usually return the same kind as the argument, but it is the specification. I have found that DREAL isn't always portable (some compilers have issues when the argument to DREAL is the value of a derived type, e.g. DREAL(X%v) fails at compile time but REAL(X%v) doesn't).
keiko:~/src/test> more testkind.f90 && ifort testkind.f90 && ./a.out program testkind implicit none integer,parameter :: r1=kind(0.0) integer,parameter :: r2=kind(0.d0) write (6,'(/a/)') 'Output:' write (6,*) real(1) write (6,*) real(1,r1) write (6,*) real(1,r2) write (6,*) real(1.23456d0) write (6,*) real(1.23456d0,r1) write (6,*) real(1.23456d0,r2) write (6,*) real(dcmplx(1.23456d0,0.2155660d0)) write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r1) write (6,*) real(dcmplx(1.23456d0,0.2155660d0),r2) end program testkind --~~~~ Output: 1.000000 1.000000 1.00000000000000 1.234560 1.234560 1.23456000000000 1.23456000000000 1.234560 1.23456000000000
But the -r8 flag can help. It's safer to specify the kind in the code, especially if other people compile your code (and hack the makefiles).
jss43@keiko:~/src/test> ifort -r8 testkind.f90 && ./a.out Output: 1.00000000000000 1.00000000000000 1.00000000000000 1.23456000000000 1.23456000000000 1.23456000000000 1.23456000000000 1.23456000000000 1.23456000000000
Or you can just avoid REAL. ;-)
In a similar vein, the following declarations are not equivalent:
real(kind(0.d0)), parameter :: PI=3.1415926535897931 real(kind(0.d0)), parameter :: PI=3.1415926535897931_8
I have found some legacy code which used the former, but meant the latter...
ifort and automatic arrays
This information cost me 3 days of my life. How I cursed ifort... ;-) --james 13:07, 30 June 2008 (BST)
Automatic arrays (where the limits are explicitly provided in the declaration, e.g. X(10,10,10)) are allocated differently in ifort than dynamic arrays using allocatable. The former are allocated on the stack, the latter on the heap (differences). This also differs from how most other compilers work (see note below). The gotchas are that the default stack on linux is quite small (~8MB) and the allocation on the stack is cumulative: a automatic array is only deallocated once the routine goes out of scope (i.e. has been exited) in a LIFO fashion. Once the program goes over the limit, it will just segfault on stepping into a routine, even if the memory allocation for that routine is very small (1kb over the stack limit is enough!). This means that a code can segfault in seemingly random places for no reason, and is rather hard to track down.
So, how can we avoid this? There are a few possible workarounds:
- Use another compiler.
- Increase your stack size using ulimit -s.
- Use allocatable arrays for everything.
- Use ifort -heap-arrays to force allocation of automatic arrays to be done on the stack. This is probably the best option for legacy code, where changing all the declarations is too tedious.
Intel do it this way as (de)allocation is faster on the stack than the heap (though seriously, if memory (de)allocation is a bottleneck, you probably need to rethink parts of your code ;-)). The (lack of an) error message is incredibly unhelpful though, and tracking a segmentation fault to this issue is painful.
A minimum code that shows this issue is testarr.f90. This shows the proof of concept, but it should be emphasised that you could have a chain of calls from one subroutine to the next, which all contain some (individually small) automatic arrays, and which would eventually push you over the stack limit. This is quite easy to do without realising it. It is interesting compare the behaviour of ifort and, for example, gfortran:
jss43@keiko:~/src/test/testifort> echo 'Using gfortran' && gfortran testarr.f90 && ./a.out ; echo -e '\nUsing ifort' && ifort testarr.f90 && ./a.out Using gfortran If this line gets printed, then s1 passed. If this line gets printed, then s2 passed. Using ifort If this line gets printed, then s1 passed. Segmentation fault
Intel information on this issue.
Other compilers
- gfortran: testarr.f90 passes.
- portland: testarr.f90 passes.
- g95: testarr.f90 passes.
- nag: testarr.f90 passes (but it gives a warning about the arrays which are not used).
- pathscale: testarr.f90 FAILS. Pathscale appears to do the same trick as ifort, but is much more helpful. Like ifort, it can be overcome with a compiler flag (see the manpage for more details).
jss43@keiko:~/src/test/testifort> pathf95 testarr.f90 && ./a.out If this line gets printed, then s1 passed. **** Segmentation fault! Fault address: 0xff0ecda0 Fault address is 11989600 bytes below the nearest valid mapping boundary, which is at 0xffc5c000. This is likely to have been caused by a stack overflow. Use your shell's ulimit or limit command to see if your stack size limit is too small. Aborted jss43@keiko:~/src/test/testifort> pathf95 -LANG:heap_allocation_threshold=0 testarr.f90 && ./a.out If this line gets printed, then s1 passed. If this line gets printed, then s2 passed.
Non-advancing io
Non-advancing input/ouput can be specified by using advance='no' in the read/write statement. This feature was implemented in Fortran 95, and is very useful in writing out arrays. Many compilers also allow the use of $ in the format string to do the same thing, but this is a non-standard extension and should not be relied upon (in particular, pathscale does not have this implemented).
There is, however, a bug in the ifort compilers, which is shown by the following code:
program test_advance implicit none integer :: ierr,noAutoDets,k real(8) :: ACF(7) noAutoDets=7 ACF=0.d0 write (6,*) "Get here!",NoAutoDets,ACF(:) call flush(6) do k=1,NoAutoDets write(6,'(F20.10$)') ACF(k) end do write (6,*) write (6,*) "Get here 2!" do k=1,NoAutoDets write(6,'(F20.10)',advance='no') ACF(k) end do write (6,*) call flush(6) end program test_advance
This test program fails on the advance='no' statement with all versions of ifort prior to 10.1.017, with a variety of error messages which appear to be both platform and version specific and include hanging, streams of libc errors which force the terminal to be closed, memory errors (which is what we first saw and were "fun" to track down), segmentation faults and invalid statement errors.
Installing newer versions of gfortran
gfortran is rather strict in some areas, and so is very useful for development and testing work. It is not particularly fast though: for production work it is almost certainly better to use one of the commercial compilers. gfortran is free (as in both beer and speech), so is also good for working away from the sector.
gfortran 4.1 is installed on the sector 10.2 image (and is on the path: no modules are needed). Unfortunately 4.1 has some minor issues and couldn't compile our code, mainly due to a poor implementation of operator overloading. Later versions fix this and will be in the next sector image, but I'm not the most patient person. ;-) Fortunately it is incredibly easy to install a later version (I chose 4.2, as I already was using it on my laptop, but 4.3 and 4.4 (in development) are available) as GNU offer precompiled binaries. I'm not interested in speed, so didn't bother compiling my own.
I did:
jss43@keiko:~/src/> wget -O - http://www.physik.fu-berlin.de/~tburnus/gcc-trunk/gcc-4.2-x86_64.tar.gz | tar xvzf -
This downloads the gcc-4.2 suite (for a 64-bit machines) and extracts it to ~/src/gcc-4.2, which contains the binary and library subdirectories. We are now good to go! The gfortran binary lives in ~/src/gcc-4.2/bin/gfortran.
There are a few gotchas however:
- Use either -static or set an appropriate LD_LIBRARY_PATH as otherwise the gfortran.so library is not found at runtime (or the wrong library is used, e.g. the system installation). Either way, barfing happens. We can use environment modules to help us.
- The acml library provided by the module doesn't work: the gcc people changed some interfaces and you get undefined references. The provided atlas modules don't work on all machines (e.g. my workstation). Lapack compilation is pretty straightforward though.
The 10.3 image includes a later version of gfortran. These notes are still useful for later versions of gfortran, however.
Cascade compilation
Cascade compilation is an absolute pain. If you change:
module m contains subroutine s(a) integer :: a write (6,*) a end subroutine s end module m
to:
module m contains subroutine s(a) integer :: a write (6,*) a**2 end subroutine s end module m
then a standard makefile will recompile all source files USEing m, even though the interface has not changed (which is all the dependencies care about). If you change the guts of a "root" module that is USEd throughout your code, the recompilation can take some time (for us, it could amount to essentially a clean build). Happily, someone else has done the hard work to solve this: [1]. The Alavi Group settled on option 2.