Using GMIN and OPTIM with GPUs

Compilation

CUDAGMIN and CUDAOPTIM can be compiled using CMake. Both include the AMBER 12 code by default, so it does not need to be selected as an additional compilation option. The AMBER developers recommend using only the GNU or Intel compilers for AMBER with CUDA. CUDAOPTIM is restricted to compilation with the Intel compilers, as these provide the best performance.

Before compilation, load the following modules (on most of the clusters, the icc and ifort modules are identical):

 
module load cuda/6.5
module load icc/64/2013_sp1/4/211

To compile using CMake, first create a build directory and change into this directory:

 
mkdir ~/svn/GMIN/build && cd ~/svn/GMIN/build

In this directory type:

 
FC=ifort CC=icc CXX=icpc cmake ../source 

or

 
FC=gfortran CC=gcc CXX=g++ cmake ../source

Then:

 
ccmake .

This should bring up an options menu. Set 'WITH_CUDA' to ON using the arrow keys and the Enter key. Press 'c' to configure and 'e' to exit the output screen, then press 'c' and 'e' again (extra CUDA options will have appeared after the first configure; the error about not finding the location of the SDK root directory can be ignored). Finally, press 'g' to generate the build files.

Then type:

 
make -j8

The executable should now be in the build directory.
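The interactive steps above can also be collected into a single script. The following is a minimal sketch for the Intel build, not a tested recipe: the function names `run` and `build_cudagmin` are hypothetical, the default checkout location `~/svn/GMIN` is taken from the example above, and passing `-DWITH_CUDA=ON` on the `cmake` command line is assumed to be equivalent to switching the option on in `ccmake` (the extra CUDA options then keep their default values).

```shell
#!/bin/bash
# Sketch of a non-interactive CUDAGMIN build (Intel compilers).
# Assumptions: the checkout lives in ~/svn/GMIN unless a path is given, and
# -DWITH_CUDA=ON has the same effect as toggling WITH_CUDA in ccmake.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_cudagmin [SOURCE_ROOT]: load modules, configure with CUDA, and build.
build_cudagmin() {
    local src_root="${1:-$HOME/svn/GMIN}"
    local build_dir="$src_root/build"

    run module load cuda/6.5 || return 1
    run module load icc/64/2013_sp1/4/211 || return 1
    run mkdir -p "$build_dir" || return 1
    run cd "$build_dir" || return 1
    # 'env' passes the compiler choice to cmake only.
    run env FC=ifort CC=icc CXX=icpc cmake -DWITH_CUDA=ON ../source || return 1
    run make -j8
}
```

After sourcing the script, `DRY_RUN=1; build_cudagmin` prints the commands without executing them, which is a cheap way to check the paths before starting a real build with `DRY_RUN=0`.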

More detailed instructions on using CMake can be found here.

Using CUDAGMIN

Here is an example data file for CUDAGMIN.

 
CUDA L
SAVE 20
EDIFF 0.0001
MAXIT 1000 2000
STEPS 1000 1.0
TEMPERATURE 1.0
MAXBFGS 0.6
STEP 0.6
MAXERISE 1.0D-10
SLOPPYCONV 1.0D-3
TIGHTCONV 1.0D-9
COMMENT DEBUG
COMMENT CUDATIME

The keyword 'CUDA' specifies that the CUDA version of LBFGS will be used. It takes a one-character argument that determines the potential to be used. 'L' refers to the Lennard-Jones potential. The implementation of this potential was intended mainly as a proof of principle, and systems of up to 1024 atoms are currently supported. To use AMBER 12, put the character 'A' after the keyword 'CUDA'. Some example input for the AMBER potential can be found on the group website, here.

'DEBUG' creates an additional file called 'GPU_debug_out'. More detailed information on the progress of the minimisations is written to this file.

'CUDATIME' creates the files 'GPU_potential.txt', 'GPU_LBFGS_total.txt', 'GPU_LBFGS_linesearch.txt' and 'GPU_LBFGS_updates.txt'. Respectively, these contain timings for the calculation of the energy and gradient, full minimisations, linesearch, and LBFGS updates.
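In the example data file, the 'DEBUG' and 'CUDATIME' lines are prefixed with 'COMMENT'. Assuming the usual GMIN convention that the rest of a 'COMMENT' line is ignored, both keywords are inactive there until the prefix is removed. The sketch below does this with `sed`; the function name `enable_gpu_diagnostics` is hypothetical, and the output is written to standard output so the original file is left untouched.

```shell
# enable_gpu_diagnostics DATA_FILE
# Prints DATA_FILE with the COMMENT prefix stripped from the DEBUG and
# CUDATIME lines, so that both keywords become active. Other lines,
# including any other COMMENT lines, are passed through unchanged.
enable_gpu_diagnostics() {
    sed -E 's/^COMMENT[[:space:]]+(DEBUG|CUDATIME)[[:space:]]*$/\1/' "$1"
}
```

For example, `enable_gpu_diagnostics data > data.debug` writes a copy with both keywords switched on; the file name `data.debug` is only illustrative.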

Using CUDAOPTIM

Some example input for the AMBER potential can be found on the group website, here. Note that analytical Hessian calculations cannot be performed on the GPU. The keyword 'CUDA' is used in the same way as in CUDAGMIN, with the same characters selecting the potential. As with CUDAGMIN, only LJ (for up to 1024 atoms) and the AMBER potential are available on the GPU.

The file 'GPU_debug_out' contains all output from BFGSTS on the GPU, with extra detail being printed when using the 'DEBUG' keyword.

'CUDATIME' produces timing files named 'GPU_BFGSTS_EF_steps.txt', 'GPU_BFGSTS_total.txt', 'GPU_LBFGS_total.txt', 'GPU_BFGSTS_Rayleigh_Ritz_total.txt', 'GPU_LBFGS_updates.txt', 'GPU_BFGSTS_subspace_min_total.txt', 'GPU_LBFGS_linesearch.txt' and 'GPU_potential.txt'.

A note on history size

CUDAGMIN without the local rigid body framework

Typical jobs using AMB(9/12)GMIN run fastest with a large history size for the LBFGS algorithm (UPDATES keyword in the input file). In contrast, most jobs with CUDAGMIN run fastest with a small history size, e.g. somewhere in the region of 4-10. The difference in speed between a run with a history size of 4 and a run with a history size of 1000 can be significant. Very large systems, however, can still be fastest with a large history size. It is advisable to do a few short test runs with a range of history sizes first to find the optimal value for your system.
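The short test runs suggested above can be set up with a small helper. This is a sketch under stated assumptions: the function name `make_history_runs` and the directory names `updates_<SIZE>` are hypothetical, the input file is assumed to be called `data`, and the UPDATES line is assumed to be valid when appended at the end of the file. Any other input files the job needs (coordinates, AMBER files) must be copied into each directory separately.

```shell
# make_history_runs DATA_FILE SIZE...
# Creates one directory per history size (updates_<SIZE>), each holding a
# copy of DATA_FILE in which any existing UPDATES line is replaced by
# "UPDATES <SIZE>".
make_history_runs() {
    local data_file="$1"; shift
    local size dir
    for size in "$@"; do
        dir="updates_${size}"
        mkdir -p "$dir" || return 1
        # Drop any previous UPDATES line, then append the new value.
        grep -v '^UPDATES[[:space:]]' "$data_file" > "$dir/data"
        echo "UPDATES ${size}" >> "$dir/data"
    done
}

# Example usage (the executable name CUDAGMIN is an assumption):
#   make_history_runs data 4 10 100 1000
#   for d in updates_*; do (cd "$d" && time CUDAGMIN > output); done
```

Comparing the wall-clock times of these runs, or the contents of 'GPU_LBFGS_total.txt' if 'CUDATIME' is enabled, indicates which history size suits the system.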

CUDAGMIN with the local rigid body framework

Using the local rigid body framework (RIGIDINIT keyword in the input file) requires a large history size, similar to that of a typical AMBGMIN job. CUDAGMIN quenches using RIGIDINIT usually take longer than they would without RIGIDINIT, though there is still a large speed-up over AMBGMIN.