Using LSF in Phobos on the RCAS cluster
LSF is a large, complicated batch management system, and I cannot describe
it completely here. The RCF maintains documentation on the
system as a whole, and you are encouraged to read it to learn LSF's subtleties.
For the beginner, however, a few basics are listed here.
You do not have to make any changes to your .cshrc/.login files. If you
are using the Phobos standard setup, everything will be
configured correctly for you to use LSF.
You must be logged in to one of the RCAS machines in order to use LSF.
- To create a batch job, make a normal executable UNIX script (in a file, say my_jobfile)
to control your job. For example:
#!/bin/tcsh
#Do phobos standard setup.
#Do NOT set LD_LIBRARY_PATH
#Any setenv path commands should include '$path' in them,
#so as not to overwrite the path set automatically by RCF at login
eval `/phobos/common/bin/phobos_setup tcsh`
eval `/phobos/common/bin/phobos_alias tcsh`
#Use my own version of phat
setphat $HOME/Phat
# Run phat
phat my_script(some parameters)
rcas4001> bsub -o outfile.log -e outfile.err my_jobfile
- Users should save job output to a log file and an error file. Note above the use of
'-o outfile.log -e outfile.err' as flags to the bsub command (bsub flags go before the job script).
- You can check on the status of your job by using the command:
rcas4001> bjobs
- There are many useful flags to LSF. A few are mentioned here; more are
discussed in the documentation:
- -Is : Add this flag to your bsub command to run a job interactively (the 's' stands for
shell mode support, which you will usually need for interactive jobs). This lets you use
LSF and still see the output interactively.
- -L /bin/tcsh : Makes the job run under tcsh; alternatively, use -L /bin/csh for the C shell
- -q phcas_med : Makes the job run in the phcas_med queue
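As a sketch, the flags above can be combined into a single submission command. The command is built and printed here rather than executed, so the example runs anywhere; 'my_jobfile' is a placeholder for your own job script:

```shell
#!/bin/sh
# Sketch: combine common bsub flags into one submission command line.
QUEUE=phcas_med          # medium-priority queue, 24 CPU-hour limit
JOB_SHELL=/bin/tcsh      # run the job under tcsh
CMD="bsub -Is -q $QUEUE -L $JOB_SHELL -o outfile.log -e outfile.err my_jobfile"
# Printed rather than executed so the sketch works without LSF installed:
echo "$CMD"
```

Drop -Is for a normal (non-interactive) batch job.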
Specific queue details
(User job limit, time limit (in CPU-hours, not actual hours), and job priority)
- phslow_hi: job limit 10, 1 CPU-hour time limit, high priority
- phslow_med: job limit 45, 24 CPU-hour time limit, medium priority
- phslow_lo: job limit none, no time limit, low priority
- phslow_mc: job limit none, no time limit, lowest priority, MC only
- phcas_hi: job limit 10, 1 CPU-hour time limit, high priority
- phcas_med: job limit 70, 24 CPU-hour time limit, medium priority
- phcas_lo: job limit none, no time limit, low priority
- phcas_mc: job limit none, no time limit, lowest priority, MC only
- phcasfast_hi: job limit 10, 1 CPU-hour time limit, high priority
- phcasfast_med: job limit 45, 24 CPU-hour time limit, medium priority
- phcasfast_lo: job limit none, no time limit, low priority
- phcasfast_mc: job limit none, no time limit, lowest priority, MC only
- phcrs_hi: job limit 10, 1 CPU-hour time limit, high priority, MC only
- phcrs_med: job limit 70, 24 CPU-hour time limit, medium priority
- phcrs_lo: job limit none, no time limit, low priority
- phcrs_mc: job limit none, no time limit, lowest priority, MC only
- phobos_int (rcas4001-4015): job limit 3, 1.5 CPU-hr time limit, low priority
- phobos_all (all machines): job limit 30, NOT for general use; the distributed disk system uses this
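The CPU-hour limits in the table above suggest a simple rule of thumb for choosing among the phcas queues. The helper function below is a hypothetical sketch, not part of LSF:

```shell
#!/bin/sh
# Hypothetical helper: pick a phcas queue from an estimated CPU-hour
# requirement, following the limits listed in the queue table.
pick_phcas_queue() {
    hours=$1
    if [ "$hours" -le 1 ]; then
        echo phcas_hi      # job limit 10, 1 CPU-hour limit, high priority
    elif [ "$hours" -le 24 ]; then
        echo phcas_med     # job limit 70, 24 CPU-hour limit, medium priority
    else
        echo phcas_lo      # no job or time limit, low priority
    fi
}
# Example: a 10 CPU-hour job belongs on phcas_med
pick_phcas_queue 10
```

Remember that the limits are in CPU-hours, not wall-clock hours.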
Queue Notes:
- For summer 2003, production repasses will be done on phcas and phcasfast, so jobs running
on those queues may be superseded by production. However, the new, fastest phcrs queues will
be available for analysis and Monte Carlo.
- When we start taking data (in winter 2003/2004), the phcrs queues will be used for new
data production and the phcas queues left alone for analysis (depending on the DAQ rate).
- To check CRS job status, go here
- For more on which machine / what speed / which queue, see the
"What machines can I use" page
- For more information on LSF queue availability and use (a somewhat old document),
go here
Fancy job scripts
- The script autosubmit.pl (~belt/autosubmit.pl - I got it from "some other phobos user")
exists to submit jobs to different queues; it expects the following directory setup:
- ~username/lsf/lsfcontrol/autosubmit.pl
- ~username/lsf/submit/ -- in here are job scripts to submit to LSF
- ~username/lsf/submitted/ -- in here are job scripts which have already been submitted
- Note that all job scripts must be executable; see man chmod for how to make files executable
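As a sketch, the layout autosubmit.pl expects can be created as follows. The example builds it under a scratch directory (rather than your real home directory) so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch of the directory layout autosubmit.pl expects, created under a
# temporary directory instead of $HOME so the example is harmless.
BASE=$(mktemp -d)
mkdir -p "$BASE/lsf/lsfcontrol" "$BASE/lsf/submit" "$BASE/lsf/submitted"
# A job script must be executable before LSF can run it:
printf '#!/bin/tcsh\necho hello\n' > "$BASE/lsf/submit/job1"
chmod +x "$BASE/lsf/submit/job1"
ls -l "$BASE/lsf/submit/job1"
```

In real use, replace $BASE with your home directory and put autosubmit.pl in lsf/lsfcontrol/.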
- There is also a script floating around which uses fcat to access information about which files are on distributed disk. This script
is not supported, and its use is strongly discouraged: it does not play nicely with database access, which can have serious
effects on data taking and analysis.
This page is randomly maintained by M.B. Tonjes