Useful PBS scripts
If you have put some effort into writing a PBS job script for a particular type of job, please consider adding it here.
Job script with signal handler
# This is an example PBS job script that can carry out an action to clean up
# after itself when the queueing system terminates the job. You could use it to
# make your code checkpoint or similar.
#PBS -q s4
#PBS -l walltime=2:00:00
WD=/scratch/cen1001/work
OUT=$WD/output
# A shell function to clean up after an imaginary job. Replace with whatever's
# appropriate for your job.
cleanup() {
    cp "$OUT" /home/cen1001 && rm "$OUT"
}
# This function gets called when PBS tells your job to exit. PBS gives a job 60
# seconds to run its exit handler and then terminates it, so whatever this does
# must happen in less than 60 seconds.
exithandler() {
    echo "Job was killed" >> "$OUT"
    cleanup
    exit
}
trap exithandler SIGTERM
# The main script starts here
mkdir -p $WD
# do some busy work that generates output
i=0
while [ $i -lt 100 ]
do
    echo $i >> "$OUT"
    sleep 2
    i=$((i+1))
done
# call the cleanup function
cleanup
# get our PBS stats
qstat -f $PBS_JOBID
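One caveat with the script above: the shell runs a trap handler only once the current foreground command has returned. The loop is fine because each sleep 2 returns quickly, but a single long-running program could hold the handler back past the 60-second limit. A minimal sketch of a common workaround, assuming a bash-like shell: start the program in the background and wait for it. The names run_job, killed, job_pid and job_status are hypothetical, chosen for illustration only.

```shell
# run_job is a hypothetical helper, not part of PBS.
# Usage: run_job COMMAND [ARGS...]
# Sets killed=yes if SIGTERM arrived while COMMAND was running.
run_job() {
    killed=no
    # On SIGTERM, record the fact and pass the signal on to the child.
    trap 'killed=yes; kill "$job_pid" 2>/dev/null' TERM
    "$@" &
    job_pid=$!
    # Unlike a foreground command, wait returns as soon as a trapped
    # signal arrives, so the handler runs without delay.
    wait "$job_pid"
    job_status=$?
    trap - TERM
    return "$job_status"
}
```

In the script above, the busy-work loop would become something like run_job ./my_program (a placeholder for your real executable), followed by a check of $killed to decide whether to log "Job was killed" before calling cleanup.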
CPMD runscript if several nodes are needed
#PBS -q s32
#PBS -l walltime=18:00:00
#PBS -l nodes=8:ppn=4
HERE=/home/mm695/whatever
file=dho2498_singlePoint
inpfile=${file}.inp
outfile=${file}.out
SCRATCH=/scratch/mm695/$file
nodes=`cat $PBS_NODEFILE | uniq`
for node in $nodes
do
    rsh $node "rm -f $SCRATCH/*"
    rsh $node "rmdir $SCRATCH"
    rsh $node "mkdir $SCRATCH"
    rsh $node "cp ${HERE}/gromos* $SCRATCH"
    rsh $node "cp ${HERE}/geom_end_of_sim.crd $SCRATCH"
    rsh $node "cp ${HERE}/RESTART $SCRATCH"
    rsh $node "cp ${HERE}/${inpfile} $SCRATCH"
done
exe=/home/mm695/SOURCE/cpmd.x
pp=/home/mm695/pseudopot
cd $SCRATCH
# Write out some helpful info to the output file
echo "Starting job $PBS_JOBID"
echo
echo "PBS assigned me these nodes:"
cat $PBS_NODEFILE
echo
mvapichwrapper $exe $inpfile $pp > ${HERE}/${outfile}
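# Use double quotes so that SCRATCH and HERE are expanded here, before rsh
# sends the command. In single quotes they would be expanded on the remote
# node, where they are not set, turning the commands into "mv /* " and "rm -f /*".
for node in $nodes
do
    rsh $node "mv ${SCRATCH}/* ${HERE}"
    rsh $node "rm -f ${SCRATCH}/*"
    rsh $node "rmdir ${SCRATCH}"
done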
qstat -f $PBS_JOBID
I've had problems in the past with large CPMD RESTART files not being copied back correctly (worse than outright failure: they were corrupted or only partially copied, with no error message). This caused many "interesting" issues when I tried to use the RESTART files for new calculations. For this reason I prefer not to do post-job tidying until I've checked that everything was copied back correctly. Instead, I periodically tidy up the scratch space on the nodes (or rather, have a script that does it).
I find writing to the home disk on tardis can be incredibly expensive: you might as well write output to scratch and then copy it back.
wd=$PBS_O_WORKDIR is my friend: PBS sets it to the directory the job was submitted from, which saves much submit script editing.
--james 17:47, 13 March 2008 (GMT) [People who are kind to cats leave them to laze in the sun until they're needed...]
Ahem, yes, the bit with the cat was mine. I usually want to send PBS_NODEFILE through a rather more complex transformation than this one, and in those cases the idiom with cat is more legible. You are right that it adds nothing here! uniq $PBS_NODEFILE would be fine. --Catherine 18:47, 13 March 2008 (GMT)
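On the RESTART corruption mentioned above: one way to check that a file really was copied back intact before tidying scratch is to compare checksums of the source and the copy. A minimal sketch, assuming cksum (a POSIX utility) is available on the nodes; verified_copy is a hypothetical helper name, not part of PBS or CPMD. It reads the copy back through the same filesystem client, so it may not catch every failure mode.

```shell
# verified_copy SRC DEST_DIR  (hypothetical helper)
# Copies SRC into DEST_DIR and succeeds only if the checksums match.
verified_copy() {
    src=$1
    dest=$2/$(basename "$1")
    cp "$src" "$dest" || return 1
    # cksum prints the checksum and byte count; reading from stdin leaves
    # out the file name, so the two outputs can be compared directly.
    [ "$(cksum < "$src")" = "$(cksum < "$dest")" ]
}

# Example: remove the scratch copy only once the copy in $HERE is verified.
# verified_copy "$SCRATCH/RESTART" "$HERE" && rm -f "$SCRATCH/RESTART"
```

If the check fails, the scratch copy is left in place so it can be inspected or copied again by hand.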