NY Blue/L: Sample LoadLeveler Batch Job Control File
Notes for Sample
Basic Info
Low Priority Classes
Multistep Jobs
Sample LoadLeveler batch job control file app1.run:
Note: The following sample specifies class normaldyn. There are also classes short and long. A description of the latter two classes, and how to modify this sample for a class short or long job, can be found in the Notes for Sample below.
# @ job_type = bluegene
# @ class = normaldyn
#
# The executable that will run your parallel application should always be specified as per the next line.
# @ executable = /bgl/BlueLight/ppcfloor/bglsys/bin/mpirun
#
# Run on 512 nodes using a dynamic partition.
# Specify partition size using the following statement. This statement is the only way a partition size
# should ever be specified in a LoadLeveler job control file, i.e. use of bg_partition
# has been eliminated.
# It is possible to run on fewer processors than those afforded by the partition size, see the
# description of -np in the "Notes for Sample" below. BUT NOTE THAT you must use
# @ bg_size
# and not -np to specify the partition size. In other words, you must use
# @ bg_size
# to allocate the partition.
# Then you can optionally use -np to run on fewer processors if this is necessary
# for your run -- but you will be charged for the entire partition that you allocated.
#
# @ bg_size = 512
#
# initialdir (see the next active line) will be used as the working directory for this batch job.
# @ initialdir = /home/johndoe/app1/runs
#
# If for example your jobid is 82, your output and error will be written in
# directory /home/johndoe/app1/runs, to files 82.out and 82.err respectively.
# @ input = /dev/null
# @ output = $(jobid).out
# @ error = $(jobid).err
#
# Maximum wall clock time for job will be 5 minutes.
# @ wall_clock_limit = 00:05:00
#
# Send email to johndoe@bnl.gov when job has completed.
# @ notification = complete
# @ notify_user = johndoe@bnl.gov
#
# Specify executable for your parallel application, and arguments to that executable.
# Note that the arguments to specify for the executable will vary depending upon the executable.
# Specify any special environment variables for your application that need to be in the environment
# presented to the job on the compute nodes, they will vary
# depending upon the application - some applications will not require any - so delete or modify the
# -env specification below.
# @ arguments = -exe /home/johndoe/app1/exe/app1.exe \
-cwd /home/johndoe/app1/runs \
-mode VN \
-env "LOGNAME=/path/to/my.logfile DATAPATH=/path/to/my.datafile" \
-args "-i input.inp -o output.out"
#
# The next statement marks the end of the job step. This example is a one-job-step batch job,
# so this is equivalent to saying that the next statement marks the end of the batch job.
# @ queue
Notes for Batch Job Control File Sample
Job Classes
- You must specify one of:
# @ class = normaldyn
# @ class = long
# @ class = short
- Class normaldyn jobs use a dynamic partition, have a 48 hour wall clock limit, and must specify 512 nodes.
Class long jobs use a dynamic partition, have a 72 hour wall clock limit, and must specify either 32 or 128 nodes.
Class short jobs use a dynamic partition, have a 24 hour wall clock limit, and must specify 1024, 2048, 3072, or 4096 nodes.
- See the bullets below for more about classes normaldyn, short, and long.
- Class short should be used for shorter (than class normaldyn or long) dynamic partition jobs requiring a 1K, 2K, 3K, or 4K partition (1024, 2048, 3072, or 4096 compute nodes respectively); the wall clock limit is 24 hours. For example specify:
# @ bg_size = 1024
- # @ bg_size for class short jobs must specify either 1024, 2048, 3072, or 4096. No other values are permitted.
- Class normaldyn should be used for 48 hour wall clock limit dynamic partition jobs requiring 512 nodes.
Specify:
# @ bg_size = 512
- # @ bg_size for class normaldyn jobs must specify 512. No other value is permitted.
- Class long should be used for 72 hour wall clock limit dynamic partition jobs requiring 32 or 128 nodes.
Specify:
# @ bg_size = 32
or
# @ bg_size = 128
- # @ bg_size for class long jobs must specify either 32 or 128. No other values are permitted.
- maxjobs and maxqueued both equal two for class short, four for class normaldyn, and six for class long. The definition of these terms and the significance of this statement are described in the next two bullets.
- maxjobs is the maximum number of jobs in the Running state; thus there can be no more than two jobs running simultaneously per user in class short. More specifically, maxjobs controls the number of jobs in any of these states: Running, Pending, or Starting, a subset of what maxqueued (see next bullet) controls. Since Pending and Starting are usually temporary states, maxjobs effectively limits the number of jobs in the Running state.
- maxqueued is the maximum number of jobs that can be queued at the same time, i.e. the maximum number of jobs that can be either running or being considered to be dispatched by the LoadLeveler negotiator. Jobs above this maximum are placed in the NQ (Not Queued) state. Thus there can be no more than two jobs queued simultaneously per user in class short. More specifically, maxqueued controls the number of jobs in any of these states: Idle, Pending, Starting, Running, Preempt Pending, Preempted, and Resume Pending.
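The class/size/wall-limit rules above can be collected into a small lookup table. The following Python sketch is illustrative only (the class names, limits, and permitted sizes come from the bullets above, but the helper function itself is hypothetical, not part of LoadLeveler):

```python
# Hypothetical helper encoding the class rules described above:
# class name -> (wall clock limit in hours, permitted bg_size values).
CLASS_RULES = {
    "short":     (24, {1024, 2048, 3072, 4096}),
    "normaldyn": (48, {512}),
    "long":      (72, {32, 128}),
}

def check_job(job_class, bg_size):
    """Return the wall clock limit (hours) if bg_size is valid for the class."""
    if job_class not in CLASS_RULES:
        raise ValueError("unknown class: %s" % job_class)
    limit, sizes = CLASS_RULES[job_class]
    if bg_size not in sizes:
        raise ValueError("bg_size %d not permitted for class %s"
                         % (bg_size, job_class))
    return limit

print(check_job("normaldyn", 512))  # -> 48
```

For example, check_job("short", 512) raises a ValueError, mirroring the rule that class short jobs must specify 1024, 2048, 3072, or 4096 nodes.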
Basic Info
- Notice that you should specify the full path to your executable, -exe /home/johndoe/app1/exe/app1.exe in this sample.
- Notes for # @ arguments above:
Specify -exe to provide full path to the executable (in this example app1.exe) for your parallel application that contains MPI calls, and specify -args to provide any arguments to that executable.
In this example, app1.exe has two arguments, the input file input.inp and the output file output.out.
Use -cwd to specify the current working directory of the job as seen by the compute nodes.
This example specifies execution process mode VN; CO is the default.
It is possible to run on fewer processors than the number available to your job. (The number available depends upon the size of your job's partition and whether you've specified CO mode or VN mode.) See example 3 on the execution process mode page, which describes how to do this using -np. This practice is wasteful, so only do it if there are legitimate reasons for doing so; bear in mind you will be charged for all the nodes in your partition whether or not you use them.
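Because charging is by partition, lowering -np does not lower the cost of a run. The sketch below is an assumption-laden illustration: it takes the charge to be partition size times wall clock hours, independent of -np, as the description above implies; the function itself is hypothetical:

```python
def node_hours_charged(bg_size, wall_clock_hours, np=None):
    # The charge is based on the allocated partition (bg_size), not on -np.
    # np is accepted here only to emphasize that it does not affect the charge.
    return bg_size * wall_clock_hours

# A 512-node partition held for 5 hours costs the same whether you use
# all 512 nodes or restrict the run with -np 256.
print(node_hours_charged(512, 5))          # -> 2560
print(node_hours_charged(512, 5, np=256))  # -> 2560
```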
To specify any special environment variables for your application that need to be in the environment presented to the job on the compute nodes, use
-env
to set the values of any required special environment variables in your batch job. For example:
-env "LOGNAME=/path/to/my.logfile DATAPATH=/path/to/my.datafile"
NOTE: Of course replace the names of the environment variables and their values in this example with whatever they are for your application!
- # @ notification = complete
# @ notify_user = johndoe@bnl.gov
in the batch script above causes email to be sent to you upon job completion. If notification is not specified, no email is sent. Other options include error, start, never, and always; these are described in the IBM LoadLeveler Documentation.
- On NY Blue, mpirun must be invoked in the batch job control file using the # @ executable statement, as shown in the sample above.
- To submit script: llsubmit app1.run
- To see a listing of the time left on all currently running LoadLeveler jobs: llqm
This is a user-contributed script located in /usr/local/bin; you shouldn't need to specify the full path if /usr/local/bin is already in your path.
- Help for llq: llq -H
- To list all LoadLeveler jobs: llq -b (Then note jobid)
- Regarding the output from llq -b:
The meaning of the code in the LL column can be found in the table on the SUR Blue Gene page.
See /opt/ibmll/LoadL/full/include/llapi.h on the fen for the meaning of the PT and BG columns.
We believe typedef enum bg_partition_state_t in that file specifies the partition states that are displayed in the PT column, for example CF in the PT column means that the partition is configuring.
We also believe that typedef enum bg_job_state_t in that file (llapi.h) specifies the Blue Gene job states that are listed in the BG column.
If you know otherwise please send mail to bgwebmaster@bnl.gov.
- To delete a LoadLeveler job: llcancel jobid where jobid was determined by invoking llq -b.
- To display info as to why a jobid remains in the Hold, NotQueued, Idle or Deferred state: llq -ls jobid
- Type topdog to see current top dog jobs. This command lives in directory /bgl/BlueLight/ppcfloor/bglsys/bin, which should be in your path.
Low Priority Classes
- If your remaining node-hours for the month are not sufficient to run a job that you submit, the job filter will NOT permit your job to run, and it will display a message suggesting that you use one of the low priority queues (i.e. classes).
- The low priority classes are given lower priority than classes short, long, and normaldyn, but they at least allow you to continue to run jobs, though your wait time in a low priority queue could well be longer than if you were able to use its normal priority counterpart.
- The low priority queues are identical to the normal priority queues except that they are lower priority and their names have "low" appended.
- The low priority queues are named:
longlow : 72 hours maximum wall clock limit, 32 and 128 node jobs
normaldynlow : 48 hours maximum wall clock limit, 512 node jobs
shortlow: 24 hours maximum wall clock limit, 1024, 2048, 3072, and 4096 node jobs
- All of the three low priority queues have the same (low) priority.
- To see all the available classes on L, on fen type:
llclass
Multijobstep Jobs
- Job steps are defined and described in the LoadLeveler IBM Redbook . The first job step in a job is job step zero, not job step one. This is important to know when interpreting LoadLeveler error messages that refer to a particular job step.
- For jobs consisting of more than one job step, the node-hours required for the job to run is the sum of same over all of the job's job steps. If you do not have enough node-hours remaining in your allocation to run, a message to that effect will be displayed at submission time and the job will not be allowed to run.
- For jobs consisting of more than one job step, the maximum wall clock time allowed for the entire multijobstep job is:
72 hours if the class specified for at least one of its job steps is class long.
48 hours if class long is not specified for any job step but class normaldyn is specified for at least one job step.
24 hours if neither class long nor class normaldyn is specified for any job step (because this implies class short was specified, since there are only three valid job classes that can be specified for batch jobs).
In other words, the maximum wall clock time allowed for the entire multijobstep job is the wall clock limit for the longest wall clock limit class that was specified for any of the job steps.
Multijobstep jobs requesting more than these time amounts are not allowed to run and a message is displayed to the user at submission time.
- Most LoadLeveler keywords are inherited by subsequent job steps. Some, however, are not. See the "Job Command File Reference" chapter's list of job command file keyword descriptions in the IBM LoadLeveler Manual. For keywords that are not inherited, the description states so explicitly; otherwise the keyword is inherited.
Among the keywords not inherited is # @ bg_size; therefore you should specify this keyword in each job step of a multiple-job-step job.
- Job steps for which a wall clock limit is not specified or inherited are assigned a wall clock limit equal to that for the job step's class.
- Job steps specifying or inheriting a wall clock limit greater than that allowed for the job step's class are assigned the wall clock limit for that class.
- See the IBM LoadLeveler Manual for other LoadLeveler commands.
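Putting the multistep rules above together, a minimal two-job-step control file might look like the following. This is a sketch only: the executable names step0.exe and step1.exe and the directory paths are hypothetical, and the key point is simply that # @ bg_size, which is not inherited, is repeated in each job step:

# @ job_type = bluegene
# @ class = normaldyn
# @ executable = /bgl/BlueLight/ppcfloor/bglsys/bin/mpirun
# @ wall_clock_limit = 00:05:00
# @ bg_size = 512
# @ arguments = -exe /home/johndoe/app1/exe/step0.exe
# @ queue
#
# bg_size is not inherited, so it must be specified again for job step 1.
# @ bg_size = 512
# @ arguments = -exe /home/johndoe/app1/exe/step1.exe
# @ queue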
This site maintained by: bgwebmaster@bnl.gov