Getting Started with cerebro

Overview

cerebro is a compute cluster. Unlike on the workstations, jobs are not run directly from the command line (e.g. just doing qchem file.in file.out). Instead, jobs are sent to a queue managed by SLURM. To submit a job, write a submit file with information about your job, then submit it to the queue using the sbatch command. Each job creates a slurm-<jobid>.out file containing its terminal output. If you think something might have gone wrong with a job (e.g. it crashed or ran out of time), the slurm file is usually a good place to start looking for the issue.
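
As a minimal (illustrative) example, a submit file is just a shell script; the name submit.sh below is arbitrary:

#!/bin/bash

# Trivial test job: report which node it ran on
echo "Hello from $(hostname)"

Submitting it with sbatch submit.sh prints a line like "Submitted batch job 12345", and the job's terminal output then appears in slurm-12345.out.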

General information about cerebro is available on the department website here.

Helpful Commands

sbatch

sbatch is used to submit a job to the queue, e.g.

sbatch $old $long submit_file

You can specify whether a job runs on the old nodes (12 CPUs max) or the new nodes (16 CPUs max) using $old and $new, respectively.

You can also specify the partition on which you would like the job to run using one of four options:

Partition   sbatch Option   Time Limit   Other Notes
TEST        $test           4 hours      highest-priority partition - jobs should run right away
LONG        $long           48 hours     default partition
XLONG       $xlong          7 days       96-core limit
XXLONG      $xxlong         30 days      56-core limit
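
For example, assuming the $new and $xlong options are set up as above, a week-long job on the new nodes would be submitted with

sbatch $new $xlong submit_file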

More information about SLURM on cerebro is available on the department website here and here, and general information about SLURM can be found here.

squeue

To see all the jobs in the queue, do

squeue

To see all the jobs that you have queued or running, do

squeue -u <your-crsid>
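
The output looks something like this (values are illustrative only; the exact columns depend on the SLURM configuration):

 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 12345      LONG submit.q    ab123  R    1:23:45      1 node-01

The ST column gives the job state: R means running, PD pending.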

scancel

To cancel a job, do

scancel <JOBID>

Make sure you have the right job ID!
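
To cancel all of your queued and running jobs at once, do

scancel -u <your-crsid>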

More information on queuing is available here.

Using Q-Chem

If you have not used Q-Chem on cerebro before, first check whether you have access with which qchem. If that does not return anything, you'll need to add a few lines to your .bashrc file. For Q-Chem 5.3, add

# QChem
export QC=trunk
source /home/maf63/code/qcsetup-general.bash
source ~/.slurmrc

to your .bashrc file, then run source ~/.bashrc. After this, which qchem should return /sharedscratch/maf63/qchem-general/bin/qchem.
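
To check that everything works before running a real calculation, a minimal test input (hypothetical filename h2_test.in) could look like this - a Hartree-Fock calculation on H2 in the STO-3G basis:

$molecule
0 1
H 0.0 0.0 0.0
H 0.0 0.0 0.74
$end

$rem
METHOD hf
BASIS sto-3g
$end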

Once you have Q-Chem set up, you'll be able to submit jobs. Here is a generic submit file for Q-Chem on cerebro:

#!/bin/bash

# Set default outfile and scratch names if not defined
# ($infile has no default and must be exported before submission)
outfile=${outfile:-qchem.out}
scratch=${scratch:-qchem.scratch}
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

# Run the calculation
echo "Using Q-Chem: $QC"
rm -rf "$QCSCRATCH/$scratch"
# Copy in a local scratch directory if one exists (e.g. when restarting)
[ -d "$scratch" ] && cp -r "$scratch" "$QCSCRATCH/$scratch"
qchem -nt $SLURM_CPUS_ON_NODE -save $infile $outfile $scratch

# Recover the scratch directory (optional)
# cp -r $QCSCRATCH/$scratch/* $scratch/

To use this submit file, copy it to submit.qchem, then do

  1. export infile=<your_qchem_input_file>
  2. export outfile=<name_of_qchem_output_file>
  3. export scratch=<name_of_a_scratch_directory>
  4. sbatch <your options> submit.qchem
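
For example, with the (hypothetical) filenames from the test input above, a short test job on the new nodes would be:

export infile=h2_test.in
export outfile=h2_test.out
export scratch=h2_test.scratch
sbatch $new $test submit.qchem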

Using QCMagic

If you have not used QCMagic on cerebro before, you will need to install it first - instructions are available here.

Here is a submit file template for a QCMagic job:

#!/bin/bash

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

# Run the calculation
runscanSurface.py -p 12 -L --read-minima=1 etc... 

If you copy this information to a file called submit.qcmagic, you can then submit it with

sbatch $old $test submit.qcmagic

You can include multiple commands in a submit file, e.g.:

#!/bin/bash

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

# Run the calculation
runscanSurface.py -p 12 -L --read-minima=4 --read-only output_file.out state_4_read > term_read
runqcSDExtract.py -p 12 -L --reconverge --rem="SCF_CONVERGENCE 10" state_4_read.sd state_4 > term_reconv
runcombineSDXC.py -i state_1.sd state_2.sd state_3.sd state_4.sd -o states_1234 > term_sdxc
runrunSDXC.py -p 12 -L --template=template.in states_1234.sd states_1234 > term_new
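
Note that the commands run one after another, and by default a failed step does not stop the job - the later commands will still run on whatever it left behind. If you would rather the job abort at the first error, a standard bash option (nothing cerebro-specific) is to add set -e at the top of the submit file:

#!/bin/bash
set -e   # stop the job as soon as any command exits with an error

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
# ... run* commands as above ...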