Last Updated on Wednesday, May 24, 2000 at 17:49:43 by Nigel George

Data Tracking Proposal Pass 2

The LOG_ID effectively becomes the JOB_ID, as it uniquely identifies the job run.

Table PHOFFLINE.LOG_ENTRIES

This table gives each job a unique key = LOG_ID.

ACTION LOG_ID LOG_DATE
DATA_VALIDATION 1 12:35-1/1/00
DATA_VALIDATION 2 12:55-1/1/00
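A minimal sketch of how a job could obtain its unique LOG_ID, using Python with sqlite3 as a stand-in for the real database (the table and column names follow this proposal; everything else, including the `start_job` helper, is an assumption for illustration):

```python
import sqlite3

# In-memory database standing in for the real PHOFFLINE schema (illustrative only).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE LOG_ENTRIES (
    LOG_ID   INTEGER PRIMARY KEY,   -- unique key per job; doubles as the JOB_ID
    ACTION   TEXT,
    LOG_DATE TEXT)""")

def start_job(action, log_date):
    """Register a new job in LOG_ENTRIES and return its unique LOG_ID."""
    cur = db.execute(
        "INSERT INTO LOG_ENTRIES (ACTION, LOG_DATE) VALUES (?, ?)",
        (action, log_date))
    db.commit()
    return cur.lastrowid  # sqlite assigns the next free LOG_ID

job1 = start_job("DATA_VALIDATION", "12:35-1/1/00")
job2 = start_job("DATA_VALIDATION", "12:55-1/1/00")
```

Each job registers itself once at startup and stamps every other table it writes with the LOG_ID it received.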

Table PHOFFLINE.LOG_ACTIONS

This table would add the DATA_VALIDATION action, and any others as time goes on (e.g. Production Pass 1, etc.):

ACTION DESCRIPTION
DATA_VALIDATION Data validation started

Table PHOFFLINE.RUNSEQ_FILES

This table associates run and sequence numbers with those files for which this makes sense. The FILE_ID is a unique reference to a file.

FILE_ID RUN_NUMBER SEQ_NUMBER
10 2900 10
15 2900 11

Table PHOFFLINE.INPUTS

This table defines which files are inputs; all other files in the table of files are assumed to be outputs.

LOG_ID FILE_ID
1 10
2 15
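To make the linkage concrete, here is a small sketch (Python with sqlite3, purely illustrative; the `input_files` helper is an assumption) of finding a job's input files by joining INPUTS against FILES:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Minimal stand-ins for the proposed tables, loaded with the example rows above.
db.executescript("""
CREATE TABLE INPUTS (LOG_ID INTEGER, FILE_ID INTEGER);
CREATE TABLE FILES  (FILE_ID INTEGER, FILE_NAME TEXT);
INSERT INTO INPUTS VALUES (1, 10), (2, 15);
INSERT INTO FILES  VALUES (10, 'PhoRaw002900s010.root'),
                          (15, 'PhoRaw002900s011.root');
""")

def input_files(log_id):
    """Return the names of all files a given job read as input."""
    rows = db.execute(
        "SELECT f.FILE_NAME FROM INPUTS i "
        "JOIN FILES f ON f.FILE_ID = i.FILE_ID WHERE i.LOG_ID = ?",
        (log_id,))
    return [name for (name,) in rows]
```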

Table PHOFFLINE.FILES

We assume that file names do not change when a file is moved. The FILE_ID uniquely identifies the file.

(CREATION_DATE = the Unix date the file was created; MD5_SUM = a checksum, e.g. for verifying copies.)

FILE_ID  LOG_ID(creation)  FILE_TYPE  FILE_NAME                         CREATION_DATE   MD5_SUM
10       0*                RAW_DATA   PhoRaw002900s010.root             17:30-12/31/99  100
11       1                 HIT_DATA   PhoHit002900s010.root             12:36-1/1/00    123
12       1                 OM         PhoOM002900s010.root              12:36-1/1/00    122
13       1                 STD_LOG    Date20000101_Time154543_Set1.log  12:36-1/1/00    89
14       1                 STD_ERR    Date20000101_Time154543_Set1.err  12:36-1/1/00    90
15       199*              RAW_DATA   PhoRaw002900s011.root             18:00-12/31/99  101

(*see table PHOFFLINE.RUNSEQ)

Note: one problem with this scheme is that a job run over very many files (e.g. making a phi DST from many runs/sequences) would produce thousands of entries in this table per job, so it won't work this way. One option is a policy decision that such jobs must be handled differently. (Another is to link a FILE_ID with FILE_TYPE=MULTIPLE to the list of files, so that the inputs table gets only one entry, removing the problem.)

Table PHOFFLINE.FILE_LOCATION

There is an additional table that specifies the file location (and whether the file has been deleted, etc.).

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
10 0 HPSS phobsink/cr00/data 0
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0
12 1 RCF_NFS_DISK /phobos/data02/dv_out 0
13 1 RCF_NFS_DISK /phobos/data02/dv_out 0
14 1 RCF_NFS_DISK /phobos/data02/dv_out 0
15 199 HPSS phobsink/cr00/data 0

The STATUS variable is 0 for a current file, and is incremented by one each time the file is deleted. This way one can tell whether the file was deleted, and one also has a history of the files used.
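The status-increment rule can be sketched as follows (Python with sqlite3 as an illustrative stand-in; the `delete_file` helper is an assumption, not an existing tool):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE FILE_LOCATION (
    FILE_ID INTEGER, LOG_ID INTEGER, SYSTEM TEXT, DIRECTORY TEXT,
    STATUS INTEGER DEFAULT 0)""")
db.execute("INSERT INTO FILE_LOCATION VALUES "
           "(11, 1, 'RCF_NFS_DISK', '/phobos/data02/dv_out', 0)")

def delete_file(file_id, log_id):
    """Record a deletion: the row is kept for history, only STATUS is bumped."""
    db.execute(
        "UPDATE FILE_LOCATION SET STATUS = STATUS + 1 "
        "WHERE FILE_ID = ? AND LOG_ID = ?", (file_id, log_id))
    db.commit()

delete_file(11, 1)
status = db.execute(
    "SELECT STATUS FROM FILE_LOCATION WHERE FILE_ID = 11 AND LOG_ID = 1"
).fetchone()[0]
```

The row is never removed, so a query can always reconstruct which location a downstream job actually read from.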

What can happen to files, and how it fits into the scheme:

a) Move a file: simply change SYSTEM and DIRECTORY.

b) Delete a file: note that you want a record of the file you ran on, even though it was deleted, as it could have been used further down the chain.

Before removing this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0

After removing this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 1

c) Copy a File:

Copying a file must be done via a special script that assigns the copy a LOG_ID and checks that the MD5_SUM matches. (This is acceptable because these are special, tracked files; special rules apply to them, otherwise chaos ensues.)

Before copying this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0

After copying this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0
11 123 (copy job) HPSS phobsink/cr00/data 0

Now, if one wants to delete the original file, a special script is again used, which changes the status flag:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 1
11 123 (copy job) HPSS phobsink/cr00/data 0
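As an illustration only (Python with sqlite3 and a throwaway temp file; `tracked_copy` and all helper names are hypothetical, not the real script), the copy rule above might look like:

```python
import hashlib, os, shutil, sqlite3, tempfile

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE FILES (FILE_ID INTEGER, MD5_SUM TEXT);
CREATE TABLE FILE_LOCATION (FILE_ID INTEGER, LOG_ID INTEGER,
                            SYSTEM TEXT, DIRECTORY TEXT, STATUS INTEGER);
""")

# Write a small stand-in file and record its MD5 sum, as the tracking DB would.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "PhoHit002900s010.root")
with open(src, "wb") as f:
    f.write(b"dummy payload")
db.execute("INSERT INTO FILES VALUES (11, ?)",
           (hashlib.md5(b"dummy payload").hexdigest(),))
db.execute("INSERT INTO FILE_LOCATION VALUES (11, 1, 'RCF_NFS_DISK', ?, 0)", (tmp,))

def tracked_copy(file_id, src, dst, copy_log_id, system, directory):
    """Copy the file, verify MD5 against the tracked sum, register the copy."""
    shutil.copyfile(src, dst)
    with open(dst, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()
    (expected,) = db.execute("SELECT MD5_SUM FROM FILES WHERE FILE_ID = ?",
                             (file_id,)).fetchone()
    if md5 != expected:
        os.remove(dst)  # a corrupt copy is never registered
        raise RuntimeError("MD5 mismatch: copy discarded")
    db.execute("INSERT INTO FILE_LOCATION VALUES (?, ?, ?, ?, 0)",
               (file_id, copy_log_id, system, directory))
    db.commit()

dst = os.path.join(tmp, "copy.root")
tracked_copy(11, src, dst, 123, "HPSS", "phobsink/cr00/data")
n_locations = db.execute(
    "SELECT COUNT(*) FROM FILE_LOCATION WHERE FILE_ID = 11").fetchone()[0]
```

Both locations now appear under the same FILE_ID, with the copy carrying the copy job's LOG_ID, exactly as in the tables above.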


Table PHOFFLINE.MODULES

This table would effectively be filled by Patrick via the $ Parameter scheme.

(do we need locations of modules relative to PHATHOME?)

LOG_ID CVS_VERSION_NUM MODULE_CLASS MODULE_NAME PARAM_NAME PARAM_TYPE PARAM_VALUE
1 1.3 TPhInitGeoMod geoinitmod fGeometryName CHAR 42.600
1 1.1 TPhInitFECMod fecinitmod fGetFromDB INT 1
1 1.1 TPhInitFECMod fecinitmod fSensorMapName CHAR
1 1.7 TPhHitProcInitMod hitproc fGetFromDB INT 1
1 1.7 TPhHitProcInitMod hitproc fCalibrationRunNumber INT 2889
1 1.7 TPhHitProcInitMod hitproc fCalibrationFileName CHAR PhoGain_2889s000.root
1 1.7 TPhHitProcInitMod hitproc fPedRunNumber INT 2898
1 1.7 TPhHitProcInitMod hitproc fPedName CHAR ped
1 1.7 TPhHitProcInitMod hitproc fGainName CHAR gain
2 1.1 TPhInitGeoMod geoinitmod fGeometryName CHAR 42.600
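The point of this table is that a job's full module configuration can be rebuilt from its LOG_ID. A sketch (Python with sqlite3, illustrative only; `job_parameters` is an assumed helper, loaded here with two of the rows above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE MODULES (LOG_ID INTEGER, CVS_VERSION_NUM TEXT, MODULE_CLASS TEXT,
                      MODULE_NAME TEXT, PARAM_NAME TEXT, PARAM_TYPE TEXT,
                      PARAM_VALUE TEXT);
INSERT INTO MODULES VALUES
  (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fGetFromDB',     'INT', '1'),
  (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fPedRunNumber',  'INT', '2898');
""")

def job_parameters(log_id):
    """Rebuild the parameter settings a job ran with, keyed by (module, param)."""
    rows = db.execute(
        "SELECT MODULE_NAME, PARAM_NAME, PARAM_VALUE FROM MODULES "
        "WHERE LOG_ID = ?", (log_id,))
    return {(mod, par): val for mod, par, val in rows}

params = job_parameters(1)
```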

Table PHOFFLINE.PHATJOBS

This table keeps a record of exactly which Phat version, and which macro within that Phat version, were used. Note: I have decided as a matter of policy that only Phat jobs whose Phat and macro come from CVS record their information in the database; otherwise it is impossible to track.

LOG_ID PHAT_VERSION MACRO_NAME (rel PHATHOME)
1 ONLV_1_0_0 sigproc/macros/data_validation.C
2 ONLV_1_0_0 sigproc/macros/data_validation.C

Table PHOFFLINE.DATABASE_KEYS

This table exists so that you know exactly which files/information were used from the database. For the peds/gains etc., i.e. any job that falls within this structure (assuming pedestal jobs will be logged in a similar way), you can specifically reference the LOG_ID of that job. However, George has stated that even this implementation is quite difficult and requires major thought/work. The issue of inputs that were not created with a LOG_ID is more difficult still...

LOG_ID TYPE INPUT_LOG_ID
1 PED 23
1 GAIN 24
2 PED 28

Table PHOFFLINE.JOBS

This table has machine/job details

LOG_ID SYSTEM_RUN_ON STOP_TIME CPU_TIME (sec) END_JOB_STATUS
1 CRS 13:35-1/1/00 1200 NORMAL
2 CRS 13:55-1/1/00 1100 CRASH

Table PHOFFLINE.DV_JOBS

Table specifically for the information from Data Validation jobs.

LOG_ID EVENTS_PROCESSED
1 6000
2 6000

Additional columns could be added as more DV-specific information is needed.

Table PHOFFLINE.JOB_QUALITY

This table contains the quality information.

LOG_ID CHECKNAME QUALITY
1 DV_AUTO GOOD
1 DV_MANUAL UNSURE
2 DV_AUTO BAD
3 DV_AUTO GOOD
3 DV_MANUAL GOOD
4 DV_AUTO UNSURE

How should refinements in quality be handled?

Odd tables that are useful

Table PHOFFLINE.RUNSEQ

LOG_ID (creation) RUN_NUMBER SEQ_NUMBER FIRST_EVENT LAST_EVENT
0 2900 10 30001 36000
199 2900 11 36001 42000

This table is purely for the raw data, it is in essence filled by the DAQ for each run/seq file it makes.


Table PHOFFLINE.RUN_NUMBERS

This table's purpose: if you know the run number, you can find all the LOG_IDs relevant to that run.

LOG_ID RUN_NUMBER
1 2900
2 2900
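Combined with PHOFFLINE.JOB_QUALITY, this gives a simple run-level quality lookup. A sketch (Python with sqlite3, illustrative only; `quality_for_run` is an assumed helper, loaded with the example rows from the tables above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE RUN_NUMBERS (LOG_ID INTEGER, RUN_NUMBER INTEGER);
CREATE TABLE JOB_QUALITY (LOG_ID INTEGER, CHECKNAME TEXT, QUALITY TEXT);
INSERT INTO RUN_NUMBERS VALUES (1, 2900), (2, 2900);
INSERT INTO JOB_QUALITY VALUES (1, 'DV_AUTO', 'GOOD'), (2, 'DV_AUTO', 'BAD');
""")

def quality_for_run(run_number, checkname):
    """All quality verdicts, by LOG_ID, for jobs that touched the given run."""
    rows = db.execute(
        "SELECT r.LOG_ID, q.QUALITY FROM RUN_NUMBERS r "
        "JOIN JOB_QUALITY q ON q.LOG_ID = r.LOG_ID "
        "WHERE r.RUN_NUMBER = ? AND q.CHECKNAME = ? ORDER BY r.LOG_ID",
        (run_number, checkname))
    return dict(rows)

verdicts = quality_for_run(2900, "DV_AUTO")
```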