
Sample GPU Batch Job and Run Instructions

Sample GPU Batch Job
To Run Sample GPU Batch Job
Notes

Sample GPU Batch Job: gpu.pbs

#!/bin/sh
#PBS -l nodes=1:ppn=16:nvidia
#PBS -o output$PBS_JOBID.log
#PBS -N GPU_test_job
#PBS -M johndoe@bnl.gov
#PBS -j oe

#  Print time and date and hostname
date
hostname

#  $PBS_O_WORKDIR is the directory from which you submitted the job
cd $PBS_O_WORKDIR

#  Load the CUDA environment module
module load cuda

#  $program and $args are passed in on the qsub command line (see below)
echo ./${program} ${args}

time ./${program} ${args}


To Run This Sample GPU Batch Job

mkdir ~johndoe/gpu-batch-test        (use your own username and directory)
cp /software/samples/gpu-batch/* ~johndoe/gpu-batch-test
cd ~johndoe/gpu-batch-test
Modify the email address in gpu.pbs to be your own
module load cuda
make map-cuda-small
qsub -v program=map-cuda-small,args=1024 gpu.pbs
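The make target is defined by the Makefile shipped with the sample; presumably it simply compiles the source with nvcc, roughly nvcc -o map-cuda-small map-cuda-small.cu, but consult the Makefile itself for the exact flags.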

The qsub command above submits the job to the default batch queue, named batch. To submit it instead to another queue, e.g. gpudev:
qsub -q gpudev -v program=map-cuda-small,args=1024 gpu.pbs
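Equivalently, the queue can be fixed inside gpu.pbs with the standard PBS queue directive:
#PBS -q gpudev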

Your batch output file should appear in the directory from which you submitted the job, and have a filename like:
output3065.hpc1.csc.bnl.local.log
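(Here 3065.hpc1.csc.bnl.local is the PBS job ID of the run, which qsub prints when the job is submitted; the #PBS -o line in gpu.pbs builds the filename from it.)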

The first few lines of this output file should be like:
Fri Mar 14 13:38:59 EDT 2014
node08
./map-cuda-small 1024
size = 1024
0.000000 -> 0.000000
1.000000 -> 1.000000

The last few lines of this output file should be like:
1023.000000 -> 1046529.000000
copy data to device    25 micros
compute               278 micros
copy data to host      19 micros

real    0m0.401s
user    0m0.016s
sys     0m0.377s
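The three micros lines are timings printed by map-cuda-small itself; the real/user/sys lines come from the time command that wraps the program in gpu.pbs.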


Notes

As you can see in map-cuda-small.cu, this particular CUDA program takes an input argument (we used 1024 above). If it did not, the batch submission command would have been:
qsub -v program=map-cuda-small gpu.pbs

The data is copied from the host (node08 in the example above) to the device (the GPU), then the kernel (the function that runs on the GPU) runs many threads on the GPU, and finally the data is copied from the device back to the host.
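For orientation, here is a minimal sketch of that pattern. This is not the actual map-cuda-small.cu source: the command-line handling and the squaring kernel are assumptions (the kernel is inferred from the sample output above, where 1023 -> 1046529 = 1023 squared), and the timing printouts are omitted.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Map kernel: one thread per element; here, square each value
   (an assumption consistent with the sample output). */
__global__ void square(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] = d[i] * d[i];
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 1024;   /* the input argument, e.g. 1024 */
    size_t bytes = n * sizeof(float);
    printf("size = %d\n", n);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; i++)
        h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, bytes);

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   /* copy data to device */
    square<<<(n + 255) / 256, 256>>>(d, n);            /* many threads on the GPU */
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   /* copy data to host */

    for (int i = 0; i < n; i++)
        printf("%f -> %f\n", (double)i, h[i]);

    cudaFree(d);
    free(h);
    return 0;
}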

The host (node08) has two Intel Xeon processors with 8 cores apiece, 16 cores in total, so the batch job requests one host node and all 16 cores, as well as use of the GPU on that node. It could alternatively have requested this with:
#PBS -l nodes=1:ppn=16:gpus=1

node01 through node08 each have a GPU. The GPU boards have no bus over which to communicate directly, so MPI must be used for them to communicate. map-cuda-small.cu above does not use MPI and thus runs on only one GPU.
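For reference, a multi-GPU version would pair MPI with CUDA, one rank per node. The following is only a hedged skeleton of that arrangement, not part of the sample:

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One GPU per node, one rank per node: each rank uses device 0.
       GPU-to-GPU traffic is staged through host memory and MPI. */
    cudaSetDevice(0);

    float buf[1024] = {0};   /* host staging buffer */
    if (rank == 0 && size > 1)
        MPI_Send(buf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ... cudaMemcpy buf to/from device memory around the MPI calls ... */

    MPI_Finalize();
    return 0;
}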

If our CUDA program wrote output files that we wanted placed in a separate subdirectory for each run, the batch script would use ${PBS_O_WORKDIR} a little differently: it would create a subdirectory of the directory from which we submitted the run, named after the PBS job ID of the run:

# Create the per-run work directory (named after the job ID) if needed,
# copy the program into it, and run from there
WORK=${PBS_O_WORKDIR}/${PBS_JOBID}
[ -d ${WORK} ] || mkdir -p ${WORK}
cp ${PBS_O_WORKDIR}/${program} ${WORK}
cd ${WORK}
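For the example run above, the output files would then land in a directory such as ${PBS_O_WORKDIR}/3065.hpc1.csc.bnl.local.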

