Sample GPU Batch Job and Run Instructions
Sample GPU Batch Job gpu.pbs
#!/bin/sh
#PBS -l nodes=1:ppn=16:nvidia
#PBS -o output$PBS_JOBID.log
#PBS -N GPU_test_job
#PBS -M johndoe@bnl.gov
#PBS -j oe
# Print time and date and hostname
date
hostname
# $PBS_O_WORKDIR is the directory from which you submitted the job
cd $PBS_O_WORKDIR
module load cuda
echo ./${program} ${args}
time ./${program} ${args}
To Run This Sample GPU Batch Job
cp /software/samples/gpu-batch/* ~johndoe/gpu-batch-test
Modify the email address in gpu.pbs to be your own, then:
module load cuda
make map-cuda-small
qsub -v program=map-cuda-small,args=1024 gpu.pbs
The batch job above will be submitted to the default batch queue, named batch. To submit it to another queue, e.g. gpudev, instead:
qsub -q gpudev -v program=map-cuda-small,args=1024 gpu.pbs
Your batch output file should appear in the directory from which you
submitted the job, and have a filename like:
output3065.hpc1.csc.bnl.local.log
The first few lines of this output file should look like:
Fri Mar 14 13:38:59 EDT 2014
node08
./map-cuda-small 1024
size = 1024
0.000000 -> 0.000000
1.000000 -> 1.000000
The last few lines of this output file should look like:
1023.000000 -> 1046529.000000
copy data to device 25 micros
compute 278 micros
copy data to host 19 micros
real 0m0.401s
user 0m0.016s
sys 0m0.377s
Notes
As you can see in map-cuda-small.cu, this particular CUDA program takes an input argument (we used 1024 above). If it did not, the batch submission command would have been:
qsub -v program=map-cuda-small gpu.pbs
The data is copied from the host (node08 in the example above) to the
device (GPU), then the kernel (function that runs
on the GPU) runs many threads on the GPU, and finally the data is copied from
the device back to the host.
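This copy-in, compute, copy-out pattern can be sketched in a few lines of CUDA. The code below is an illustrative sketch, not the actual map-cuda-small.cu source; the sample output above (1023 -> 1046529) suggests the program maps each element x to x*x, so that is the kernel used here:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one thread per array element.
__global__ void square(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

int main(void) {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // Copy data from host to device.
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch many threads on the GPU: 4 blocks of 256 threads covers n = 1024.
    square<<<(n + 255) / 256, 256>>>(d, n);

    // Copy results from device back to host.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("%f -> %f\n", 1023.0, (double)h[1023]);
    return 0;
}
```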
The host (node08) has two Intel Xeon processors with 8 cores apiece, 16 cores in total, so the batch job requests one host node and 16 cores, as well as use of the GPU on that node. It could alternatively have requested this with:
#PBS -l nodes=1:ppn=16:gpus=1
node01 through node08 each have a GPU. The GPU boards have no bus over which to communicate directly, so MPI must be used for them to communicate. map-cuda-small.cu above does not use MPI and thus runs on only one GPU.
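A multi-GPU CUDA program on this cluster would pair MPI with the copy pattern above: each rank drives the GPU on its own node, and data moves between GPUs by staging through host memory and MPI. A minimal sketch (assumed buffer names and a broadcast chosen purely for illustration):

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);  // one GPU per node here, so every rank uses device 0

    const int n = 1024;
    float h_buf[n], *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));

    // ... each rank runs its own kernel on its own GPU ...

    // To share results: device -> host, MPI across nodes, host -> device.
    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    MPI_Bcast(h_buf, n, MPI_FLOAT, 0, MPI_COMM_WORLD);
    cudaMemcpy(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```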
If our CUDA program wrote output files that we wanted to keep in a separate subdirectory for each run, the batch script would use ${PBS_O_WORKDIR} a little differently: it would create a subdirectory of the directory from which we submitted the run, named after the PBS jobid of the run:
WORK=${PBS_O_WORKDIR}/${PBS_JOBID}
[ -d ${WORK} ] || mkdir -p ${WORK}
cp ${PBS_O_WORKDIR}/${program} ${WORK}
cd ${WORK}