A guide to using SLURM to run GPU jobs on pat


We now have a queueing system set up for use with some of our GPU machines. The head node is pat.ch.private.cam.ac.uk and all jobs should be submitted to the queue from there. More detailed information on setup and usage can be found at: http://www.ch.cam.ac.uk/computing/abc-cluster
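
For example, from a machine inside the department network (assuming your usual username is also valid on pat):

ssh pat.ch.private.cam.ac.uk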

Cards

Currently, pat has 15 GeForce GTX TITAN Black GPUs, 12 Tesla K20m GPUs and 16 GeForce GTX 980 GPUs (Maxwell architecture) on its nodes (only 8 of the GTX 980s are currently available, as their racks are being used for extra Titan Black GPUs from eBay - 19/12/16). The Titan Black and Tesla cards should only be used for applications that require double precision, such as CUDAGMIN and CUDAOPTIM. The Maxwell cards are designed for single precision applications, such as AMBER's pmemd.cuda in its default mode. Please do not use the double precision cards for running AMBER.
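
The card types are exposed as SLURM node features, selected with --constraint in the scripts below. To see which nodes carry which cards, the following should work on pat (%N lists node names, %G the GPU gres and %f the node features):

sinfo -o "%N %G %f"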

Queueing system

The queueing system on pat is SLURM. Detailed information on using SLURM can be found in the official SLURM documentation. The current maximum walltime is seven days. As on sinister, jobs should be run on the local /scratch disk of the nodes rather than on the NFS-mounted /home and /sharedscratch. The progress of your job can be viewed by sshing into the appropriate node.
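
For example, to locate and inspect a running job (the node name below is illustrative; use whatever squeue reports):

squeue -u $USER    # the NODELIST column shows the node running each job, e.g. node3
ssh node3          # then look in /scratch/$USER/<jobid> - see the example script below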

Example SLURM submission script for a job running on a single GPU

The following example script can be submitted to the queue by typing 'sbatch scriptname' at the terminal.

#!/bin/bash

# Request 1 TITAN Black GPU - use '--constraint=teslak20' for a Tesla or '--constraint=maxwell' to request a Maxwell GPU for single precision runs
#SBATCH --constraint=titanblack
#SBATCH --job-name=mytestjob
#SBATCH --gres=gpu:1
#SBATCH --mail-type=FAIL

hostname
echo "Time: `date`"
source /etc/profile.d/modules.sh

# Load the appropriate compiler modules on the node - should be the same as those used to compile the executable on pat
module add cuda/7.0
module add icc/64/2015/3/187
module add anaconda/python2/2.2.0 # Needed for python networkx module - must be python 2, not 3

# Set the GPU to exclusive process mode (compute mode 3 = EXCLUSIVE_PROCESS)
sudo nvidia-smi -i $CUDA_VISIBLE_DEVICES -c 3

# Make a temporary directory on the node, copy job files there and change to that directory
TMP=/scratch/$USER/$SLURM_JOB_ID
mkdir -p $TMP
cp ${SLURM_SUBMIT_DIR}/{atomgroups,coordsinirigid,coords.inpcrd,coords.prmtop,data,min.in,rbodyconfig} $TMP
cd $TMP

# Run the executable in the local node scratch directory
/home/$USER/svn/GMIN/build/CUDAGMIN

# Copy all files back to the original submission directory
cp * $SLURM_SUBMIT_DIR
STATUS=$?
echo "cp exit status: $STATUS"
if [ $STATUS -eq 0 ]; then
    echo "No error in cp"
    cd $SLURM_SUBMIT_DIR
    rm -rf $TMP
fi

echo Finished at `date`
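
Once submitted, the job can be monitored from pat. A quick sketch, using whatever job ID sbatch reports (12345 here is illustrative):

sbatch myscript.sh        # prints: Submitted batch job 12345
scontrol show job 12345   # full details while the job is queued or running
scancel 12345             # cancel the job if necessary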

Example SLURM submission script for a PATHSAMPLE job using CUDAOPTIM

With SLURM, you must request the same number of GPUs on each node you are using. CUDAOPTIM also requires one CPU per GPU, so --ntasks-per-node must be set equal to the number of GPUs requested per node. The example script below will run eight simultaneous CUDAOPTIM jobs, four on each node. The number of GPUs per node can be between one and four. There are currently six nodes available that can be used to run CUDAOPTIM jobs.

Also, note that each node you are using must be set up so that you can ssh from it into any other node without typing a password. To do this, first generate a public/private RSA key pair using:

ssh-keygen -t rsa

Do not enter a passphrase when prompted (just press Enter). Then, from your home directory, type:

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Test whether this has worked by sshing into one of the nodes and then another directly from there. All the nodes see the same home directory, so if it's working for one node then it should work for all the rest.
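
For example, where node1 and node2 stand for any two of the compute nodes:

ssh node1 hostname              # pat -> node1; should print node1 without asking for a password
ssh node1 ssh node2 hostname    # node1 -> node2 directly; should also not prompt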

IMPORTANT: your pathdata file must contain the keywords SLURM and CUDA. It should not contain SSH or PBS. Using ssh for job submission allows your jobs to use GPUs that were not allocated to them by the queueing system, so it is really important to use the SLURM keyword to avoid crashing other people's jobs!
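
A quick sanity check before submitting (a minimal sketch, assuming the keywords appear at the start of a line in pathdata):

grep -E '^(SLURM|CUDA)' pathdata                            # both keywords should be printed
grep -E '^(SSH|PBS)' pathdata || echo "OK - no SSH or PBS"  # nothing should match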

#!/bin/bash

#SBATCH --job-name=mypathsamplejob
#SBATCH --nodes=2 # Specify the number of nodes you want to run on
#SBATCH --gres=gpu:4 # Specify the number of GPUs you want per node
#SBATCH --ntasks-per-node=4 # Specify a number of CPUs equal to the number of GPUs requested per node
#SBATCH --constraint='teslak20|titanblack' # Use either Titan or Tesla nodes or some combination
#SBATCH --requeue # Requeue job in the case of node failure
#SBATCH --mail-type=FAIL # Receive an email if your job fails

echo "Time: `date`"
source /etc/profile.d/modules.sh

# Load the appropriate compiler modules on the nodes - should be the same as those used to compile the executables on pat
module add cuda/6.5
module add icc/64/2013_sp1/4/211
module add anaconda/python2/2.2.0 # Needed for python networkx module - must be python 2, not 3

echo "Setting GPUs to exclusive process mode on: "; srun hostname
srun -l sudo nvidia-smi -i $CUDA_VISIBLE_DEVICES -c 3

# Count the GPUs allocated on this node by iterating over the comma-separated list in $CUDA_VISIBLE_DEVICES
gpuspernode=0
visibledevices=$CUDA_VISIBLE_DEVICES

for i in $(echo $visibledevices | sed "s/,/ /g")
do
    gpuspernode=$(( gpuspernode + 1 ))
done

totalnumgpus=$(( gpuspernode * $SLURM_JOB_NUM_NODES ))
echo "Total number of GPUs requested: $totalnumgpus"

# Record job information in nodes.info: total GPU count, the node list, username and working directory
echo $totalnumgpus > nodes.info
srun hostname >> nodes.info
echo $USER >> nodes.info
pwd >> nodes.info

# If using a cluster other than pat and your slurm version is 14 or lower, prefix the executable with srun -N1 -n1
/home/$USER/svn/PATHSAMPLE/build/PATHSAMPLE > output

echo Finished at `date`