Useful PBS scripts

From CUC3
Latest revision as of 13:40, 19 March 2008

If you have put some effort into writing a PBS job script for a particular type of job, please consider adding it here.

Job script with signal handler

# This is an example PBS job script that can carry out an action to clean up
# after itself when the queueing system terminates the job. You could use it to
# make your code checkpoint or similar.


#PBS -q s4
#PBS -l walltime=2:00:00

WD=/scratch/cen1001/work
OUT=$WD/output

# A shell function to clean up after an imaginary job. Replace with whatever's
# appropriate for your job.
cleanup() {
    cp $OUT /home/cen1001 && rm $OUT
}

# This function gets called when PBS tells your job to exit. PBS gives a job 60
# seconds to run its exit handler and then terminates it, so whatever this does
# must happen in less than 60 seconds.
exithandler() {
    echo "Job was killed" >> $OUT
    cleanup
    exit
}

trap exithandler SIGTERM

# The main script starts here

mkdir -p $WD

# do some busy work that generates output
i=0
while [ $i -lt 100 ]
do
    echo $i >> $OUT
    sleep 2
    i=$((i+1))
done

# call the cleanup function
cleanup

# get our PBS stats
qstat -f $PBS_JOBID
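
One thing to watch with the handler above: bash does not run a trap handler until the command currently running in the foreground has finished. The loop above only ever blocks in a two-second sleep, so that is harmless, but a real program that runs in the foreground for hours could delay the handler well past the time PBS allows. A minimal sketch of one way round this, assuming bash, is to start the program in the background and wait for it, because wait is interrupted as soon as a trapped signal arrives. LOG, KILLED and the sleep command are hypothetical stand-ins, and the script sends SIGTERM to itself purely so it can be tried outside PBS.

```shell
#!/bin/bash
# Sketch only: LOG and the sleep command are hypothetical stand-ins for the
# job's output file and its real long-running program.
LOG=$(mktemp)
KILLED=no

exithandler() {
    echo "Job was killed" >> "$LOG"
    kill "$JOBPID" 2>/dev/null    # stop the background program
    KILLED=yes                    # a real handler would run cleanup and exit here
}
trap exithandler TERM

sleep 20 &                        # start the program in the background
JOBPID=$!

# Demo only: deliver SIGTERM to this script after one second, standing in
# for the signal PBS sends when it terminates the job.
( sleep 1; kill -TERM $$ ) &

wait "$JOBPID"                    # returns early when the trapped signal arrives
STATUS=$?
echo "wait returned $STATUS, KILLED=$KILLED"
```

In a real job script the handler would call cleanup and exit, as in the script above, rather than just setting a flag.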

CPMD runscript if several nodes are needed

#PBS -q s32 
#PBS -l walltime=18:00:00
#PBS -l nodes=8:ppn=4

HERE=$PBS_O_WORKDIR # the directory from which the script was submitted
file=dho2498_singlePoint
inpfile=${file}.inp
outfile=${file}.out
#SCRATCH=/scratch/mm695/$file
SCRATCH=/scratch/mm695/job-$PBS_JOBID # generate a unique directory in scratch.
nodes=`cat $PBS_NODEFILE | uniq`
for node in $nodes
do
    # The rm and rmdir are not needed if using a unique scratch directory
    # for each job; the mkdir is always needed.
    rsh $node "rm -f $SCRATCH/*"
    rsh $node "rmdir $SCRATCH"
    rsh $node "mkdir $SCRATCH"
    rsh $node "cp ${HERE}/gromos* $SCRATCH"
    rsh $node "cp ${HERE}/geom_end_of_sim.crd $SCRATCH"
    rsh $node "cp ${HERE}/RESTART $SCRATCH"
    rsh $node "cp ${HERE}/${inpfile} $SCRATCH"
done
exe=/home/mm695/SOURCE/cpmd.x
pp=/home/mm695/pseudopot
cd $SCRATCH

# Write out some helpful info to the output file
echo "Starting job $PBS_JOBID"
echo
echo "PBS assigned me these nodes:"
cat $PBS_NODEFILE
echo

mvapichwrapper  $exe $inpfile $pp > ${SCRATCH}/${outfile}


for node in $nodes
do
    rsh $node "mv ${SCRATCH}/* ${HERE}"
    rsh $node "rm -f ${SCRATCH}/*"   # See note below
    rsh $node "rmdir ${SCRATCH}"     # about post-job tidying.
done

qstat -f $PBS_JOBID


I've had problems in the past with large CPMD RESTART files not being correctly copied back (worse than failure: they get corrupted or are only partially copied, with no error message). This caused many "interesting" issues when I attempted to use the RESTART files for new calculations. For this reason I prefer not to do post-job tidying until I've checked things are copied back correctly. Instead, I periodically (rather, have a script to) tidy up the scratch space on nodes. --james 17:47, 13 March 2008 (GMT) [People who are kind to cats leave them to laze in the sun until they're needed...]
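
One way to automate the check described in the note above is to copy rather than move, compare checksums, and delete the scratch copy only if they match. The following is a rough sketch under that assumption, not part of the CPMD script: copy_verified is a hypothetical helper, and the temporary directories stand in for the scratch and submission directories. It uses the standard cksum utility.

```shell
#!/bin/bash
# Sketch only: copy_verified is a hypothetical helper.
# Copy a file into a directory, compare checksums of source and copy, and
# remove the source only if the two match.
copy_verified() {
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename "$src")
    cp "$src" "$dest" || return 1
    src_sum=$(cksum < "$src")
    dest_sum=$(cksum < "$dest")
    if [ "$src_sum" = "$dest_sum" ]; then
        rm "$src"
    else
        echo "Checksum mismatch for $src; leaving it in place" >&2
        return 1
    fi
}

# Demo: temporary directories stand in for scratch and the submission directory.
scratch=$(mktemp -d)
submit_dir=$(mktemp -d)
echo "restart data" > "$scratch/RESTART"
copy_verified "$scratch/RESTART" "$submit_dir" && echo "RESTART copied and verified"
```

A check like this catches truncated or mismatched copies as seen from the node doing the copy; checking again from the machine that will read the file is still a sensible precaution.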

Ahem, yes, the bit with the cat was mine. I usually want to send PBS_NODEFILE through a rather more complex transformation than this one, and in those cases the idiom with cat is more legible. You are right that it adds nothing here! uniq $PBS_NODEFILE would be fine. --Catherine 18:47, 13 March 2008 (GMT)
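
For anyone who wants to see the two forms side by side, here is a small sketch using a made-up node file. It assumes the usual layout in which the node file has one line per allocated processor, with the lines for each node adjacent; the node names are invented.

```shell
#!/bin/bash
# Sketch only: the node names are made up; a real $PBS_NODEFILE is written by PBS.
nodefile=$(mktemp)
printf '%s\n' node01 node01 node01 node01 node02 node02 node02 node02 > "$nodefile"

with_cat=$(cat "$nodefile" | uniq)      # the form used in the script above
without_cat=$(uniq "$nodefile")         # same result without the extra process
sorted=$(sort -u "$nodefile")           # also works if duplicates are not adjacent

echo "$without_cat"
```

Note that uniq only collapses adjacent duplicate lines, so sort -u is the safer choice if the entries for a node might not be grouped together.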