Instructions on how a user can use CRS

(Note: If you find mistakes or problems with these instructions, please let me know: Nigel)

Introduction

CRS is the PHOBOS central reconstruction farm. Its primary responsibility is to take the raw data and convert it into hits. Its secondary functions include running simulations. The primary function ALWAYS has priority over any other use of CRS.

Only one user, "phobreco", can submit jobs to CRS. For several people to share that account, a high degree of control must be exercised over the users who want to use it. Please therefore do as requested in the instructions below, otherwise you could end up destroying the work of your colleagues and losing your privilege to use CRS.

Description

  1. Get permission to run on the CRS farm (contact nigel@kiwi.chm.bnl.gov at least a full working day in advance). You will be allocated a number of nodes and a priority. You must get permission every time you want to run on CRS.
    (A page to manage this will be set up, but it is still being worked on.)
  2. Set up your own area (/phobos/u/<USERNAME>) for submitting jobs; see "How to set up jobs in your own area" below.
  3. To submit one or more jobs, log on to rsshgw.rhic.bnl.gov as "phobreco". (Obsolete: log on to rcf as "phobreco".)
  4. Log on to rcrsuser as "phobreco", i.e. at the prompt type:
    > ssh rcrsuser
  5. Go to /phobos/u/<USERNAME>/phobreco/input/jobfiles
  6. Type: submit_job_linux_queue.pl -l <USERNAME> -n <NumberNodesToUse> -p <Priority> -q <Queue Number>.<ENTER>
    (obsolete: submit_job_linux.pl -l <USERNAME> -n <NumberNodesToUse> -p <Priority> .<ENTER>)
    (obsolete: submit_job.pl -l <USERNAME> -n <NumberNodesToUse> -p <Priority> .<ENTER>)
    This submits <NumberNodesToUse> of the jobfiles in the subdirectory /submit to the CRS farm and sends email to <USERNAME>@rcf.rhic.bnl.gov to let you know they have been submitted. It also moves each submitted jobfile to /submitted.
    (When a job finishes, the next jobfile in /submit is submitted, and so on.)
    Note: Remove backup files (jobfilename~) from the submit directory, otherwise they will be submitted as well.
    Note: Your <Priority> will be assigned to you. In most cases it will be 100.
    Note: <Queue Number>: There are 3 queues, and the queue number determines which set of machines your jobs run on. Queue=3 has the fastest machines, Queue=1 the slowest.
  7. To monitor the submission of your scripts, type: /usr/crs/bin/tk_CRS_status_awc.pl & <ENTER>
    This pops up a window that shows the status of submitted jobs etc. (Note: you must keep hitting refresh to update it.)
  8. When your jobs are finished, you will receive email telling you whether each job completed successfully or not.
    The next jobfile in /submit is then submitted, until the /submit directory is empty.
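The note in step 6 about backup files can be handled with a small helper before submitting. This is only a sketch: the function name prepare_submit_dir is hypothetical and not one of the CRS tools, and it is written for a POSIX shell (sh/bash) rather than tcsh.

```shell
#!/bin/sh
# Sketch only: prepare_submit_dir is a hypothetical helper, not a CRS tool.

# Remove editor backup files (name~) from a submit directory, then list
# the jobfiles that are left, i.e. the ones the submit script would pick up.
prepare_submit_dir() {
    submit_dir="$1"
    # rm -f stays silent if no backup files match the pattern
    rm -f -- "$submit_dir"/*~
    ls -1 "$submit_dir"
}

# Example call (path follows steps 5 and 6 above):
#   prepare_submit_dir /phobos/u/<USERNAME>/phobreco/input/jobfiles/submit
```

After checking the listing, submit with the command from step 6 as usual.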

 

How to set up jobs in your own area

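  1. In your local user account (/phobos/u/<USERNAME>), set up the following directory structure
    (figure wpe18.jpg; the directories referred to elsewhere on this page are:)
    phobreco/input/jobfiles/submit
    phobreco/input/jobfiles/submitted
    phobreco/input/macros
    phobreco/input/scripts
    phobreco/output/err
    phobreco/output/log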
  2. The jobfile is the file that you submit to the farm to run your job (by typing: submit_job.pl <USERNAME>, which in turn submits the jobfiles located in the submit directory). It is the control file that gets everything set up to run on a CRS node, i.e. it stages files on HPSS, tells the node which controller script to run, etc. (To make jobfiles you need to use the jobmanager; see its description.)
    A typical jobfile is shown below:

    executable=/phobos/u/nigel/phobreco/input/scripts/controller_script
    executableargs=2105,0
    inputdir[0]=/phobos/data01/temp
    inputfile[0]=PhoRaw002105s000.root
    inputstreamtype[0]=UNIX
    inputdir[1]=/phobos/u/nigel/phobreco/input/macros
    inputfile[1]=Phat_env.C
    inputstreamtype[1]=UNIX 
    inputdir[2]=/phobos/u/nigel/phobreco/input/macros
    inputfile[2]=SiRawToHitModDefaults.dat
    inputstreamtype[2]=UNIX
    inputnumstreams=3
    mergefactor=1
    notify=phobreco@rcrsuser.rcf.bnl.gov
    outputdir[0]=/phobos/data01/temp
    outputfile[0]=PhoHit002105s000.root
    outputstreamtype[0]=UNIX
    outputnumstreams=1
    stdoutdir=/phobos/u/nigel/phobreco/output/log
    stdout=Logfile002105.out
    stderrdir=/phobos/u/nigel/phobreco/output/err
    stderr=Error002105.err
    (obsolete: notify=phobreco@rcf.rhic.bnl.gov)
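The jobmanager remains the way to make real jobfiles. Purely to illustrate how the fields of the example fit together, here is a sketch that prints a jobfile of the same shape for a given user, run and sequence number. The function name make_jobfile is hypothetical. The file naming (run padded to 6 digits, sequence to 3) and the meaning of the two executableargs are inferred from the single example above, so treat them as assumptions. It is written for a POSIX shell (sh/bash), not tcsh.

```shell
#!/bin/sh
# Sketch only: make_jobfile is hypothetical; real jobfiles come from the jobmanager.
# Usage: make_jobfile <USERNAME> <run> <sequence>   (numbers without leading zeros)
make_jobfile() {
    user="$1"; run="$2"; seq="$3"
    base="/phobos/u/$user/phobreco"
    run6=$(printf '%06d' "$run")   # assumed padding, e.g. 2105 becomes 002105
    seq3=$(printf '%03d' "$seq")   # assumed padding, e.g. 0 becomes 000
    cat <<EOF
executable=$base/input/scripts/controller_script
executableargs=$run,$seq
inputdir[0]=/phobos/data01/temp
inputfile[0]=PhoRaw${run6}s${seq3}.root
inputstreamtype[0]=UNIX
inputdir[1]=$base/input/macros
inputfile[1]=Phat_env.C
inputstreamtype[1]=UNIX
inputdir[2]=$base/input/macros
inputfile[2]=SiRawToHitModDefaults.dat
inputstreamtype[2]=UNIX
inputnumstreams=3
mergefactor=1
notify=phobreco@rcrsuser.rcf.bnl.gov
outputdir[0]=/phobos/data01/temp
outputfile[0]=PhoHit${run6}s${seq3}.root
outputstreamtype[0]=UNIX
outputnumstreams=1
stdoutdir=$base/output/log
stdout=Logfile${run6}.out
stderrdir=$base/output/err
stderr=Error${run6}.err
EOF
}

# Example: make_jobfile nigel 2105 0 > jobfile_002105
```

The output file name in the example call (jobfile_002105) is also just an illustration.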
  1. Interpretation of jobfile: (Details about jobfiles)
        executable=The controller script that is run on the node. It sets up environment variables on that node,
                        and then executes the macro you want to run.
                        Example:
                                        #! /bin/tcsh
                                        # set up the PHOBOS environment and aliases on the node
                                        eval `/phobos/common/bin/phobos_setup tcsh`
                                        eval `/phobos/common/bin/phobos_alias tcsh`
                                        setphat /phobos/u/nigel/Phat
                                        # run the macro, passing it the executableargs ($2, $3)
                                        phat -b -q "/phobos/u/nigel/phobreco/input/macros/sirawprocess_mod_batch.C($2,$3)"
              
        executableargs=The arguments handed to the controller script and from there to the macro; the first one is $2, the second is $3, etc.
        inputdir[]/outputdir[], inputfile[]/outputfile[], inputstreamtype[]/outputstreamtype[]=The directory, filename and file type (UNIX or HPSS) of each input/output stream
        inputnumstreams/outputnumstreams=Number of input/output streams
        mergefactor=(purpose unknown; set it to 1)
        notify=The email address to which the node reports whether the job succeeded or failed when it completes
          (This goes to phobreco, from where it is forwarded to the address you specify when you type submit_job.pl <USERNAME>)
        stdoutdir,stdout=Directory and filename of the file containing standard output (i.e. what you print to the screen)
        stderrdir,stderr=Directory and filename of the file containing standard error (i.e. what you write to stderr, e.g. with cerr<<)
  2. You must ensure that the directories and their files have the following group privileges, otherwise "phobreco" cannot use them:
    /phobreco/output/err & /phobreco/output/log                                   are group readable, writable and executable
    /phobreco/input/macros & /phobreco/input/scripts                              are group readable and executable
    /phobreco/input/jobfiles/submit & /phobreco/input/jobfiles/submitted          are group readable, writable and executable
    Check this with the command ls -l and look at the permissions field, e.g. -rwxrwx--- shows group readable, writable and executable.
  3. You are set up once you have one or more jobfiles in /phobreco/input/jobfiles/submit, a controller script in /phobreco/input/scripts, and one or more macros in /phobreco/input/macros.
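The directory layout and group privileges described above can be set up in one go. The following is a minimal sketch, assuming the six directories named on this page are the whole structure and that your files belong to a group that "phobreco" is also in. The function name setup_crs_area is hypothetical, and the script is written for a POSIX shell (sh/bash), not tcsh.

```shell
#!/bin/sh
# Sketch only: setup_crs_area is a hypothetical helper, not a CRS tool.
# Usage: setup_crs_area /phobos/u/<USERNAME>
setup_crs_area() {
    base="$1/phobreco"

    # Create the directory tree (existing directories are left alone).
    mkdir -p "$base/input/jobfiles/submit" "$base/input/jobfiles/submitted" \
             "$base/input/macros" "$base/input/scripts" \
             "$base/output/err" "$base/output/log"

    # Group read/write/execute where "phobreco" has to create or move files.
    chmod -R g+rwx "$base/output/err" "$base/output/log" \
                   "$base/input/jobfiles/submit" "$base/input/jobfiles/submitted"

    # Group read/execute where "phobreco" only reads macros and scripts.
    chmod -R g+rx "$base/input/macros" "$base/input/scripts"
}
```

Because chmod is run with -R, it can be repeated after adding new macros, scripts or jobfiles so that the files themselves also get the group privileges. Confirm the result with ls -l as described in the previous step.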

Troubleshooting

Troubleshooting is difficult, but start with something simple and work your way up to the complete macro you want to run.
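One simple first check, offered as a suggestion rather than as part of the CRS tooling, is to confirm that a jobfile is internally consistent before it goes into the submit directory. The sketch below compares inputnumstreams and outputnumstreams with the number of inputfile[] and outputfile[] entries. The function name check_jobfile is hypothetical, and the script is written for a POSIX shell (sh/bash), not tcsh.

```shell
#!/bin/sh
# Sketch only: check_jobfile is a hypothetical helper, not a CRS tool.
# Returns 0 if the declared stream counts match the listed files, 1 otherwise.
check_jobfile() {
    jobfile="$1"
    status=0
    for kind in input output; do
        # Number declared in e.g. "inputnumstreams=3"
        declared=$(sed -n "s/^${kind}numstreams=\([0-9]*\).*/\1/p" "$jobfile")
        # Number of e.g. "inputfile[N]=..." lines actually present
        actual=$(grep -c "^${kind}file\[" "$jobfile" || true)
        if [ "$declared" != "$actual" ]; then
            echo "$kind: ${kind}numstreams=$declared but $actual ${kind}file[] entries"
            status=1
        fi
    done
    return $status
}

# Example: check_jobfile <jobfile> && echo "stream counts look consistent"
```

This only catches counting mistakes; it does not check that the listed files exist or that the macro itself runs.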