RCF Details - Administrator Guide


Phobos RCF Basic Structure

[Figure: Phobos RCF basic structure (wpe2.jpg)]

 

Sending Data From DAQ to HPSS

Log in to phobosx as phobsink (same password).

Files in  ~phobsink/bin

HPSS

The RCF contact for HPSS problems is John Riodan or Razvan Popescu.

Accounts

All tapes are listed in a directory structure. There are two special accounts for working on HPSS: phobsink and phobreco.
phobsink is the account set up specifically for sinking data. In this account you can delete raw data, so use it only for sinking! phobreco is the account set up specifically for the reconstruction of the raw data (i.e. input raw, output DST Class of Service (COS)). This account cannot delete raw data but can delete reconstructed data. (General rule: a user can delete anything that user owns. phobsink owns the raw data; phobreco owns the reconstructed data.)
When you log in as phobreco or phobsink (see below for how) you will be in /home/phobsink (or /home/phobreco).

Access to shell in HPSS

To access the HPSS directory system (tape database in file format):

Getting a listed output file, so that you can look at all the files you have:

Directory Structure

/home/phobreco -> /home/phobsink   (soft link, so a directory stored on tape has the name /home/phobsink even if it was saved as /home/phobreco)

/home/phobsink/cr00                           Commissioning Run  2000 data
/home/phobsink/pr00                           Physics Run 2000 data

How to monitor HPSS

RCF Web page->System Statistics->HPSS  (http://www.rhic.bnl.gov/RCF/Software/HPSS/HPSSStatistics1.html)

There are two important things to monitor:

  1. The status of the disk cache, to see how full it is getting.
       Disk Storage Class Info, then click on Phobos Raw Disk or Phobos Dst Disk
  2. The amount of tape available, used, etc.
       Tape Cartridges Info
    Select "List the cartridges which PVR or/and StorageClass match the one(s) you select."
    SD-3 == Redwood
    9840 == Eagle (old)
    9940 == Eagle (new)

CRS (Central Reconstruction Farm)

Contact person for problems: Tony Chan (or Thom Throw)

This is where the raw data is processed. It has direct access to HPSS.

How to monitor.

User Scripts

Located in /phobos/u/phobreco/bin
submit_job.pl

NFS Mounted Disk

The disk is NFS-mounted on phobmds. (There are two quad-processor SunOS machines: rmine401 and rmine402.)

The disk is currently divided in the following way:

Name              Size (GB)   Description
/phobos/u                63   User files (not data files)
/phobos/common            5   Standard version for Phobos (released version: Phat + root + geant etc.)
/phobos/data             10   Old data
/phobos/data01          781   $SCR, Disk2000 - temporary large storage area
/phobos/data02          315   $DV_OUT (data validation output info), SIMFILES (simulation runs)
/phobos/data03          621   Reserved for users with large storage requirements

Perfmeter

perfmeter is a monitoring tool used on the Sun machines to monitor the load on the machines serving the disks.

How to make it work:
What is important, and what the quantities mean:

For NFS served over a gigabit connection, the maximum transfer rate is 50 MBytes/sec (the ftp maximum is ~100 MBytes/sec).
So if the packet size is 1.5K, the maximum packet rate is 50 MBytes/sec / 1.5K = ~33K packets/sec.
If the transfer size (the size of the block that NFS serves) is 64K, the maximum is 50 MBytes/sec / 64K = ~780 transfers/sec. (However, most blocks are much smaller than 64K, so the maximum rate seen as of 8th March is 2500 transfers/sec.)

CPU = Percentage of CPU being used (the maximum, 100%, means all 4 processors are working at 100% each)

Packets = Ethernet packets transferred in and out per second (packet size = 1.5K). Maximum = 33K/sec

Disk = Disk traffic in transfers/sec (transfer = ~64K or less, i.e. the average block size transferred). Maximum = 2500

Load = Average number of runnable processes over the last minute, i.e. the number of jobs requesting CPU usage. (Can be up to roughly the number of CPUs you have or the number of jobs running.)
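
The ceilings quoted above are simple divisions of the link rate by the unit size. The sketch below reproduces them, assuming decimal units (1 MByte = 10^6 bytes, 1K = 10^3 bytes), which the guide does not state. The constant and function names are illustrative only. The last figure is the average block size implied by 2500 transfers/sec if the link were saturated at that moment, which is also an assumption.

```python
# Back-of-envelope ceilings for the perfmeter readings.
# Assumption: "MBytes" and "K" are decimal (10^6 and 10^3 bytes).
NFS_MAX_BYTES_PER_SEC = 50e6   # 50 MBytes/sec over gigabit NFS (from the text)
PACKET_BYTES = 1.5e3           # 1.5K Ethernet packet
BLOCK_BYTES = 64e3             # 64K NFS transfer block


def max_rate(bytes_per_sec, unit_bytes):
    """Return how many units of unit_bytes fit through the link per second."""
    return bytes_per_sec / unit_bytes


max_packets = max_rate(NFS_MAX_BYTES_PER_SEC, PACKET_BYTES)
max_transfers = max_rate(NFS_MAX_BYTES_PER_SEC, BLOCK_BYTES)

print(f"max packets/sec   ~ {max_packets:,.0f}")
print(f"max transfers/sec ~ {max_transfers:,.0f}")

# Average block size implied by the observed 2500 transfers/sec,
# if the link were saturated at that moment (an assumption).
implied_block = NFS_MAX_BYTES_PER_SEC / 2500
print(f"implied average block ~ {implied_block / 1e3:.0f}K")
```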

Cleaning the $SCR area

George has a script for cleaning the scratch area.

To see how much disk space users have on /phobos/u:

The list is output to temp.txt. Users should have between 1 and 1.5 GB.
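
One way to produce a per-user listing like this is sketched below. It assumes each top-level directory under /phobos/u belongs to one user. `usage_by_user` and `over_quota` are illustrative names, not existing Phobos scripts, and the sketch sums apparent file sizes rather than allocated disk blocks, so its numbers may differ slightly from a du-style report.

```python
import os


def usage_by_user(home_root):
    """Return {top-level directory name: total bytes of files below it}."""
    totals = {}
    for entry in os.scandir(home_root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        totals[entry.name] = total
    return totals


def over_quota(totals, limit_gb=1.5):
    """Return (user, bytes) pairs above limit_gb, largest first."""
    limit = limit_gb * 1e9
    heavy = [(user, size) for user, size in totals.items() if size > limit]
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

For example, `over_quota(usage_by_user("/phobos/u"))` would list the users above the 1.5 GB guideline.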

For users to change permissions on the data disks, chmod does not propagate, so you need to do the following:

  1. Log in to phobmds
  2. Type: 
        setfacl -m g::rwx DIRNAME    (g = group; rwx = whatever permissions you want)
  3. Type:
        setfacl -m mask:rwx DIRNAME

To see the result of the change, type: getfacl DIRNAME

Gateways

The RCF contact person for gateways is Shigeki Misawa.

Descriptions of gateway machines

Three types of gateway machines exist:

  1. Standard Linux entry point, for most users:
         rsshgw.rhic.bnl.gov    (via ssh1)
         rssh2gw.rhic.bnl.gov   (via ssh2)
    These names can each represent more than one machine. They allow access to the rmds01, phobmds, rcrsuser, and rcas machines. No NFS-mounted disks are accessible. Use LSF login (to choose the least loaded machine).
  2. Large data file transfer entry point and web server for the Phobos Phat web pages.
    Linux based, with all the disks NFS-mounted. Allows ssh and ssh-protected ftp.
  3. Mail server, rcf.rhic.bnl.gov. (Not advised, as it is overloaded.) Note: to use CRS, users need to submit jobs from this machine.
       

How to LSF Login

How to ssh protected ftp

How to log in to CAS nodes using an X-term connected to rcf2

  1. Log in to rcf2.rhic.bnl.gov
  2. On that gateway machine, type: ssh rsshgw.rcf.bnl.gov
  3. Once on the Linux gateway, ssh into the CAS node as normal

Note: Use authorization keys to get passwordless entry into nodes from rcf. For instructions, go to http://www.rhic.bnl.gov/RCF/Software/Commercial/SSH/using_ssh/using_ssh.html
Note: .shosts will not work for this.

CVS Repository

The CVS repository is owned by the user phobos. It is located in /phobos/u/phobos/Repository

Troubleshooting the repository

  1. LOCK file problem:
    Description: If a user's connection is lost while making changes to CVS, a lock file can be left in a directory so that other users cannot use it.
    Solution: Log in as phobos and go to the directory where the lock file is. There will be two files:
                   #cvs.lock
                   #cvs.wfl.rcas4XXX.rcf.bnl.gov:{Port#}
    Remove both of these files, and CVS will work again.
  2. Directory creation problem:
    Description: When a user makes a new directory in the Repository, that user owns it, but it needs to be owned by phobos.
    Solution: Ask RCF (Maurice Askinazi) to change the ownership (chown) of that directory and its files to phobos.
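
For the lock file problem, it can help to list every leftover lock before removing anything. The sketch below only reports paths and deletes nothing. `find_cvs_locks` is an illustrative name, not an existing tool, and the sketch assumes stale locks always match the two name patterns given above. It checks directory names as well as file names, since CVS can create `#cvs.lock` as a directory.

```python
import os


def find_cvs_locks(repo_root):
    """Return paths under repo_root named '#cvs.lock' or starting '#cvs.wfl.'."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # The lock may be a file or a directory, so check both lists.
        for name in dirnames + filenames:
            if name == "#cvs.lock" or name.startswith("#cvs.wfl."):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Repository location as given in this guide.
    for path in find_cvs_locks("/phobos/u/phobos/Repository"):
        print(path)
```

Before removing a reported lock, check that no CVS operation is still running for that directory, since an active commit holds the same kind of lock.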
                   

The directory /phobos/u/phobos/Repository/Models is where the phobos copies of RHIC generators are stored.

Swapping CAS and CRS nodes

Needs to be discussed with RCF

CAS farm

The farm consists of 25x2 800 MHz machines (rcas4010-4035) + 10x2 400 MHz machines (rcas4001-4010).

 

Common Code

The common code is located in /phobos/common.

Root:

Each ROOT version is installed as /phobos/common/root_vXXXX, and within each of these directories there are two directories,
Linux and SunOS. To install a new version you need to:

  1. Download the appropriate version of ROOT into the appropriate directory.
  2. Log in to the Sun machines (phobmds), ensure ROOTSYS=/phobos/common/root_vXXXX/SunOS (by using setroot `pwd`),
    and then compile the ROOT version (Note: do not use the ROOT symlink in the compile, as it needs the full pathname!)
    by typing
             configure Solaris cc5
             make
  3. Log in to a Linux machine, ensure ROOTSYS=/phobos/common/root_vXXXX/Linux (by using setroot `pwd`),
    and then compile the ROOT version (Note: do not use the ROOT symlink in the compile, as it needs the full pathname!)
    by typing
            configure Linux egcs
            make
  4. Update the ROOT symlink, ROOT->root_vXXXX; this way all paths will be handled appropriately.

Phat:

Phat-latest == latest phat that successfully compiled

In the home directory on rcas4004, a crontab script submits an LSF batch job that tests the current Phat version.
To access or modify the script: ssh -l phobos rcas4004
The crontab script executes testground (a shell script), which does the following:

  1. Checks out Phat
  2. Builds Phat
  3. Checks whether it compiled correctly
  4. If it did not compile, logs the result and mails the person
  5. If it compiled OK, sets up the HTML page
  6. Copies itself into Phat-latest

Released versions are also available.

OpenGL

/phobos/common/packages

Oracle

A full copy of Oracle is on a disk maintained by RCF, pointed to by $ORACLE_HOME.

If the address of the database changes, you need to modify the following file:

$ORACLE_HOME/network/admin/tnsnames.ora
Under phdb:...Host="phobosdb.chm.bnl.gov", change the host to the new name.

Insure

Type insure to make it work.

Setup Code

Location: /phobos/common/bin

Setup Environmentals

All .login files have the line
                                  eval `phobos_setup tcsh`

How this works:

phobos_setup == Wrapper around perlenv; takes the shell as argument, with default environment file = /phobos/common/etc/phobosrc
perlenv (Shell Type) (Environmental File)
The environment file lists all environment variables in a shell-independent way, and perlenv converts this file into the shell-specific way of defining them.
Inside the /phobos/common/etc/phobosrc file the format is
ENV     {Name of environment variable} {What it is set to}
PATH    +{Path added at front of $PATH}
PATH    {Path added at end of $PATH}
UMASK   {Sets default mask of files to be rwxr--r--}
LIMIT   {Command to limit users' abilities}
TIMEOUT {Time after which logout is forced}

Note: the idea is that perlenv returns statements that need to be evaluated in the current shell, so that the settings become globally usable!
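
To illustrate the conversion step, here is a minimal sketch of how a shell-independent file in the format above could be turned into shell-specific statements. It is not the actual perlenv code: `render` is a hypothetical function, only the ENV and PATH keywords are handled, the `#` comment syntax is an assumption, and the sample values are made up for the example.

```python
def render(rc_lines, shell):
    """Translate shell-independent rc lines into statements for one shell.

    Only the ENV and PATH keywords are handled; UMASK, LIMIT and TIMEOUT
    entries are skipped by this sketch.
    """
    csh_like = shell in ("csh", "tcsh")
    out = []
    for line in rc_lines:
        fields = line.strip().split(None, 2)
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment (comment syntax is an assumption)
        keyword = fields[0]
        if keyword == "ENV" and len(fields) == 3:
            name, value = fields[1], fields[2]
        elif keyword == "PATH" and len(fields) >= 2:
            name = "PATH"
            entry = fields[1]
            if entry.startswith("+"):
                value = entry[1:] + ":${PATH}"   # leading '+': prepend
            else:
                value = "${PATH}:" + entry       # no '+': append
        else:
            continue
        if csh_like:
            out.append(f'setenv {name} "{value}";')
        else:
            out.append(f'export {name}="{value}";')
    return "\n".join(out)


# Hypothetical sample entries, not copied from the real phobosrc.
sample = [
    "ENV   PHATHOME /phobos/common/Phat-latest",
    "PATH  +/phobos/common/bin",
    "PATH  /usr/local/bin",
]
print(render(sample, "tcsh"))
```

Because the output is a string of statements, the caller wraps it in eval with backticks, as in the .login line above, so the settings take effect in the login shell itself rather than in a child process.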

Setup Alias

All .cshrc files have
                            eval `phobos_alias tcsh`

How this works

phobos_alias == Wrapper around perlenv; takes the shell as argument, with default environment file = /phobos/common/etc/phobosalias
perlenv (Shell Type) (Environmental File)

This contains aliases like setroot, setphat, and setinsure. For example:
alias setphat 'eval `setpkg phat ($pwd) CSH`'   (note: setpkg calls perlenv)
setpkg does the following:

  1. Looks at the package and determines which variables have to be changed
  2. Determines what to change them to
  3. Outputs the changes in a shell-independent form that perlenv can convert

E.g. for phat, it looks at ROOTSYS and PHATHOME and outputs the changes that are necessary.
If a new environment variable is needed for root/phat, it is added in setpkg.

Print Queue

printtrans - Prints transparencies on peacock; located in /phobos/common/bin
Files it uses are in /phobos/common/etc/lowertray and midtray

LSF

Monitoring lsf batch system

xlsmon:
Menu-View-Move all but phobos nodes to unselected
    -Detailed load
        -CPU queue length (how many jobs want to use the CPU in a given time period) == unnormalised load
        -Utilization: how much of the CPU is used (up to 100%)
        -Paging rate + I/O rate (linked)
        -Swap space (max ~ 1 GB)
        -Available memory (max ~ 256x2 MB (450 after kernel))

    -History of load
        -Graph of the above quantities over time (number in brackets = axis scale)
        -Memory leak: available memory will slowly decrease!! (suspend the job and email the person)

Command line:

bjobs -u phobos_gr     : Tells the number of jobs running etc.  (-s: number suspended)
bhosts pho_cas            

An administrator can ask Maurice for the ability to kill/suspend other users' jobs.

To set LSF queue parameters, ask Maurice.

An alternative program for looking at the LSF working status is /usr/local/lsf/bin/xlsadmin.
If there is a problem with getting nodes restarted etc., Maurice asks that you inform Tony Chan (or Thom Throw).

Sun Compilers

The Sun compilers are located on ITD machines, believed to be Sun3:

/opt/Sun* is code that is used on the Sun
/opt/WS5.0 is where the compiler version 5.0 is located