Last updated on Wednesday, May 24, 2000 at 17:49:43 by Nigel George.
This table gives each job a unique key, the LOG_ID. The LOG_ID effectively becomes the JOB_ID, since it uniquely identifies a job run.
ACTION | LOG_ID | LOG_DATE |
DATA_VALIDATION | 1 | 12:35-1/1/00 |
DATA_VALIDATION | 2 | 12:55-1/1/00 |
This table would hold the DATA_VALIDATION action, with others added as time goes on (e.g. Production Pass 1).
ACTION | DESCRIPTION |
DATA_VALIDATION | Data validation started |
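The two tables above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table names LOG and ACTION, and the use of SQLite itself, are assumptions for the sketch, not part of the proposed implementation.

```python
import sqlite3

# In-memory sketch of the job-log and action tables described above.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE ACTION (
    ACTION      TEXT PRIMARY KEY,
    DESCRIPTION TEXT)""")
db.execute("""CREATE TABLE LOG (
    LOG_ID   INTEGER PRIMARY KEY,   -- doubles as the JOB_ID
    ACTION   TEXT REFERENCES ACTION(ACTION),
    LOG_DATE TEXT)""")

db.execute("INSERT INTO ACTION VALUES ('DATA_VALIDATION', 'Data validation started')")
db.execute("INSERT INTO LOG VALUES (1, 'DATA_VALIDATION', '12:35-1/1/00')")
db.execute("INSERT INTO LOG VALUES (2, 'DATA_VALIDATION', '12:55-1/1/00')")

# Each job run gets its own LOG_ID, even for the same ACTION:
rows = db.execute("SELECT LOG_ID, ACTION FROM LOG ORDER BY LOG_ID").fetchall()
```

Making ACTION a separate table keeps the description in one place while the LOG table grows by one row per job.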
This table associates run and sequence numbers with those files for which this makes sense. The FILE_ID is a unique reference to a file.
FILE_ID | RUN_NUMBER | SEQ_NUMBER |
10 | 2900 | 10 |
15 | 2900 | 11 |
This table defines which of the files are inputs; the rest are assumed to be outputs in the table of files.
LOG_ID | FILE_ID |
1 | 10 |
2 | 15 |
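With the inputs table and the file table joined on FILE_ID, one can list the input files of any job. A sketch, again using illustrative table names (FILES, FILE_INPUTS) that are not fixed by the document:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE FILES (FILE_ID INTEGER PRIMARY KEY, FILE_NAME TEXT)")
db.execute("CREATE TABLE FILE_INPUTS (LOG_ID INTEGER, FILE_ID INTEGER)")
db.executemany("INSERT INTO FILES VALUES (?, ?)",
               [(10, "PhoRaw002900s010.root"), (15, "PhoRaw002900s011.root")])
db.executemany("INSERT INTO FILE_INPUTS VALUES (?, ?)", [(1, 10), (2, 15)])

# All input file names for job (LOG_ID) 1:
inputs_for_job_1 = [name for (name,) in db.execute(
    """SELECT F.FILE_NAME
       FROM FILE_INPUTS I JOIN FILES F ON F.FILE_ID = I.FILE_ID
       WHERE I.LOG_ID = 1""")]
```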
We make the assumption that file names do not change when a file is moved; the FILE_ID uniquely defines the file.
(CREATION_DATE = Unix date the file was created; MD5_SUM = checksum, e.g. for verifying copies.)
FILE_ID | LOG_ID(creation) | FILE_TYPE | FILE_NAME | CREATION_DATE | MD5_SUM |
10 | 0 * | RAW_DATA | PhoRaw002900s010.root | 17:30-12/31/99 | 100 |
11 | 1 | HIT_DATA | PhoHit002900s010.root | 12:36-1/1/00 | 123 |
12 | 1 | OM | PhoOM002900s010.root | 12:36-1/1/00 | 122 |
13 | 1 | STD_LOG | Date20000101_Time154543_Set1.log | 12:36-1/1/00 | 89 |
14 | 1 | STD_ERR | Date20000101_Time154543_Set1.err | 12:36-1/1/00 | 90 |
15 | 199* | RAW_DATA | PhoRaw002900s011.root | 18:00-12/31/99 | 101 |
(*see table PHOFFLINE.RUNSEQ)
Note: one problem with this scheme is that if a job runs over very many files (e.g. making a phi DST from many runs/sequences), this table becomes huge, with thousands of entries for each job, so it won't work this way. One option is simply a policy decision that these types of runs have to be handled differently. (Alternatively, a FILE_ID with FILE_TYPE = multiple could somehow be linked to the list of files, so the list of inputs produces only one entry, removing the problem.)
There is an additional table that specifies file location (and also whether the file has been deleted, etc.).
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
10 | 0 | HPSS | phobsink/cr00/data | 0 |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
12 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
13 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
14 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
15 | 199 | HPSS | phobsink/cr00/data | 0 |
The STATUS variable is 0 for a current file. Each time the file is deleted, STATUS is incremented by one. This way one can tell whether the file was deleted, and one also has a history of the files used.
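The increment-on-delete rule can be sketched as a single UPDATE. The table name FILE_LOCATIONS and the helper function are illustrative assumptions; the point is only that the row is never removed, so the history survives.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE FILE_LOCATIONS (
    FILE_ID INTEGER, LOG_ID INTEGER, SYSTEM TEXT, DIRECTORY TEXT,
    STATUS INTEGER DEFAULT 0)""")
db.execute("INSERT INTO FILE_LOCATIONS VALUES "
           "(11, 1, 'RCF_NFS_DISK', '/phobos/data02/dv_out', 0)")

def mark_deleted(file_id, system, directory):
    """Record a deletion: the row stays (for history) and STATUS goes up by one."""
    db.execute("""UPDATE FILE_LOCATIONS SET STATUS = STATUS + 1
                  WHERE FILE_ID = ? AND SYSTEM = ? AND DIRECTORY = ?""",
               (file_id, system, directory))

mark_deleted(11, 'RCF_NFS_DISK', '/phobos/data02/dv_out')
status = db.execute(
    "SELECT STATUS FROM FILE_LOCATIONS WHERE FILE_ID = 11").fetchone()[0]
```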
What can happen to files, and how it fits into the scheme:
a) Move a file: simply change SYSTEM and DIRECTORY.
b) Delete a file: note that you want a record of the file you ran on, even though it was deleted, since it could have been used further down the chain.
Before removing this file:
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
After removing this file:
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 1 |
c) Copy a File:
To copy a file, you will be required to do it via a special script that assigns a LOG_ID to the copy and checks that the MD5_SUM matches. (This is OK because these are special, tracked files, so special rules apply to them; otherwise, chaos.)
Before copying this file:
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
After copying this file:
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 0 |
11 | 123 (copy job) | HPSS | phobsink/cr00/data | 0 |
Now, to delete the original file, again use a special script that changes the status flag:
FILE_ID | LOG_ID (creation) | SYSTEM | DIRECTORY | STATUS |
11 | 1 | RCF_NFS_DISK | /phobos/data02/dv_out | 1 |
11 | 123 (copy job) | HPSS | phobsink/cr00/data | 0 |
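The "special copy script" above can be sketched as: verify the checksum of the destination against the source, and only then register the new location row under the copy job's LOG_ID. Everything here (the function, the FILE_LOCATIONS table name, operating on in-memory bytes rather than real files) is a hypothetical illustration of the rule, not the actual script.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE FILE_LOCATIONS (FILE_ID, LOG_ID, SYSTEM, DIRECTORY, STATUS)")
db.execute("INSERT INTO FILE_LOCATIONS VALUES "
           "(11, 1, 'RCF_NFS_DISK', '/phobos/data02/dv_out', 0)")

def register_copy(file_id, src_bytes, dst_bytes, copy_log_id, dst_system, dst_dir):
    """Register a copy only if source and destination checksums agree."""
    if hashlib.md5(src_bytes).hexdigest() != hashlib.md5(dst_bytes).hexdigest():
        raise RuntimeError("MD5 mismatch: copy not registered")
    db.execute("INSERT INTO FILE_LOCATIONS VALUES (?, ?, ?, ?, 0)",
               (file_id, copy_log_id, dst_system, dst_dir))

data = b"...contents of the tracked file..."
register_copy(11, data, data, 123, 'HPSS', 'phobsink/cr00/data')

# The same FILE_ID now has two location rows, one per copy:
n_locations = db.execute(
    "SELECT COUNT(*) FROM FILE_LOCATIONS WHERE FILE_ID = 11").fetchone()[0]
```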
This table would effectively be filled by Patrick via the $ Parameter scheme.
(Do we need the locations of modules relative to PHATHOME?)
LOG_ID | CVS_VERSION_NUM | MODULE_CLASS | MODULE_NAME | PARAM_NAME | PARAM_TYPE | PARAM_VALUE |
1 | 1.3 | TPhInitGeoMod | geoinitmod | fGeometryName | CHAR | 42.600 |
1 | 1.1 | TPhInitFECMod | fecinitmod | fGetFromDB | INT | 1 |
1 | 1.1 | TPhInitFECMod | fecinitmod | fSensorMapName | CHAR | |
1 | 1.7 | TPhHitProcInitMod | hitproc | fGetFromDB | INT | 1 |
1 | 1.7 | TPhHitProcInitMod | hitproc | fCalibrationRunNumber | INT | 2889 |
1 | 1.7 | TPhHitProcInitMod | hitproc | fCalibrationFileName | CHAR | PhoGain_2889s000.root |
1 | 1.7 | TPhHitProcInitMod | hitproc | fPedRunNumber | INT | 2898 |
1 | 1.7 | TPhHitProcInitMod | hitproc | fPedName | CHAR | ped |
1 | 1.7 | TPhHitProcInitMod | hitproc | fGainName | CHAR | gain |
2 | 1.1 | TPhInitGeoMod | geoinitmod | fGeometryName | CHAR | 42.600 |
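Given this table, the full parameter set of a job can be reconstructed from its LOG_ID alone. A sketch, with an assumed table name (MODULE_PARAMS) and a small subset of the rows above:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE MODULE_PARAMS (
    LOG_ID INTEGER, CVS_VERSION_NUM TEXT, MODULE_CLASS TEXT,
    MODULE_NAME TEXT, PARAM_NAME TEXT, PARAM_TYPE TEXT, PARAM_VALUE TEXT)""")
db.executemany("INSERT INTO MODULE_PARAMS VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fGetFromDB', 'INT', '1'),
    (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fPedRunNumber', 'INT', '2898'),
    (2, '1.1', 'TPhInitGeoMod', 'geoinitmod', 'fGeometryName', 'CHAR', '42.600')])

# Reconstruct the parameter settings of job 1:
params = dict(db.execute(
    "SELECT PARAM_NAME, PARAM_VALUE FROM MODULE_PARAMS WHERE LOG_ID = 1"))
```

Values are stored as text here and interpreted via PARAM_TYPE; how typing is actually handled is left open by the scheme.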
This table keeps a record of exactly which Phat version, and which macro within that Phat version, were used. Note: as a matter of policy, only Phat jobs whose Phat and macros have come from CVS are to record their information in the database; otherwise it is impossible to track.
LOG_ID | PHAT_VERSION | MACRO_NAME (rel PHATHOME) |
1 | ONLV_1_0_0 | sigproc/macros/data_validation.C |
2 | ONLV_1_0_0 | sigproc/macros/data_validation.C |
This table exists so that you know exactly what files/info were used from the database. For the pedestals/gains etc., i.e. any job that falls within this structure (assuming pedestal jobs will be logged in a similar way), you can specifically reference the LOG_ID of that job. However, George has stated that even this implementation is quite difficult and requires major thought/work. The issue of things read in that were not created with a LOG_ID becomes more difficult still.
LOG_ID | TYPE | INPUT_LOG_ID |
1 | PED | 23 |
1 | GAIN | 24 |
2 | PED | 28 |
This table holds machine/job details.
LOG_ID | SYSTEM_RUN_ON | STOP_TIME | CPU_TIME (sec) | END_JOB_STATUS |
1 | CRS | 13:35-1/1/00 | 1200 | NORMAL |
2 | CRS | 13:55-1/1/00 | 1100 | CRASH |
This table is specifically for the information from data validation jobs.
LOG_ID | EVENTS_PROCESSED |
1 | 6000 |
2 | 6000 |
Additional columns could be added as more DV-specific information is needed.
This table contains the quality information.
LOG_ID | CHECKNAME | QUALITY |
1 | DV_AUTO | GOOD |
1 | DV_MANUAL | UNSURE |
2 | DV_AUTO | BAD |
3 | DV_AUTO | GOOD |
3 | DV_MANUAL | GOOD |
4 | DV_AUTO | UNSURE |
How should refinements in quality be handled? (Open question.)
This table is purely for the raw data; it is in essence filled by the DAQ for each run/sequence file it makes.
LOG_ID (creation) | RUN_NUMBER | SEQ_NUMBER | FIRST_EVENT | LAST_EVENT |
0 | 2900 | 10 | 30001 | 36000 |
199 | 2900 | 11 | 36001 | 42000 |
This table's purpose: if you know the run number, you can find all the LOG_IDs relevant to that run.
LOG_ID | RUN_NUMBER |
1 | 2900 |
2 | 2900 |
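The run-to-LOG_ID table ties the whole scheme together: starting from a run number, one can reach any of the per-job tables. A sketch joining it to the quality table (table names RUN_LOG and QUALITY are illustrative assumptions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE RUN_LOG (LOG_ID INTEGER, RUN_NUMBER INTEGER)")
db.execute("CREATE TABLE QUALITY (LOG_ID INTEGER, CHECKNAME TEXT, QUALITY TEXT)")
db.executemany("INSERT INTO RUN_LOG VALUES (?, ?)", [(1, 2900), (2, 2900)])
db.executemany("INSERT INTO QUALITY VALUES (?, ?, ?)",
               [(1, 'DV_AUTO', 'GOOD'), (2, 'DV_AUTO', 'BAD')])

# All jobs that touched run 2900, with their automatic quality verdicts:
results = db.execute(
    """SELECT R.LOG_ID, Q.QUALITY
       FROM RUN_LOG R JOIN QUALITY Q ON Q.LOG_ID = R.LOG_ID
       WHERE R.RUN_NUMBER = 2900 AND Q.CHECKNAME = 'DV_AUTO'
       ORDER BY R.LOG_ID""").fetchall()
```

The same join pattern works against the parameters, Phat-version, or file tables, since all of them key on LOG_ID.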