Last Updated on Wednesday, May 24, 2000 at 17:49:43 by Nigel George

Data Tracking Proposal Pass 2

The LOG_ID effectively becomes the JOB_ID, as it uniquely identifies the job run.

Table PHOFFLINE.LOG_ENTRIES

This table gives each job a unique key = LOG_ID.

ACTION LOG_ID LOG_DATE
DATA_VALIDATION 1 12:35-1/1/00
DATA_VALIDATION 2 12:55-1/1/00
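A minimal sketch of how a job could obtain its unique LOG_ID, using Python with sqlite3 as a stand-in for the real database (the table and column names follow this proposal; everything else, including the `start_job` helper, is an assumption for illustration):

```python
import sqlite3

# In-memory database standing in for the real PHOFFLINE schema (illustrative only).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE LOG_ENTRIES (
    LOG_ID   INTEGER PRIMARY KEY,   -- unique key per job; doubles as the JOB_ID
    ACTION   TEXT,
    LOG_DATE TEXT)""")

def start_job(action, log_date):
    """Register a new job in LOG_ENTRIES and return its unique LOG_ID."""
    cur = db.execute(
        "INSERT INTO LOG_ENTRIES (ACTION, LOG_DATE) VALUES (?, ?)",
        (action, log_date))
    db.commit()
    return cur.lastrowid  # sqlite assigns the next free LOG_ID

job1 = start_job("DATA_VALIDATION", "12:35-1/1/00")
job2 = start_job("DATA_VALIDATION", "12:55-1/1/00")
```

Each job registers itself once at startup and stamps every other table it writes with the LOG_ID it received.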

Table PHOFFLINE.LOG_ACTIONS

This table would add the DATA_VALIDATION action, and any others as time goes on (e.g. Production Pass 1, etc.):

ACTION DESCRIPTION
DATA_VALIDATION Data validation started

Table PHOFFLINE.RUNSEQ_FILES

This table associates run and sequence numbers with those files for which this makes sense. The FILE_ID is a unique reference to a file.

FILE_ID RUN_NUMBER SEQ_NUMBER
10 2900 10
15 2900 11

Table PHOFFLINE.INPUTS

This table defines which files are inputs; all other files in the table of files are assumed to be outputs.

LOG_ID FILE_ID
1 10
2 15
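To make the linkage concrete, here is a small sketch (Python with sqlite3, purely illustrative; the `input_files` helper is an assumption) of finding a job's input files by joining INPUTS against FILES:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Minimal stand-ins for the proposed tables, loaded with the example rows above.
db.executescript("""
CREATE TABLE INPUTS (LOG_ID INTEGER, FILE_ID INTEGER);
CREATE TABLE FILES  (FILE_ID INTEGER, FILE_NAME TEXT);
INSERT INTO INPUTS VALUES (1, 10), (2, 15);
INSERT INTO FILES  VALUES (10, 'PhoRaw002900s010.root'),
                          (15, 'PhoRaw002900s011.root');
""")

def input_files(log_id):
    """Return the names of all files a given job read as input."""
    rows = db.execute(
        "SELECT f.FILE_NAME FROM INPUTS i "
        "JOIN FILES f ON f.FILE_ID = i.FILE_ID WHERE i.LOG_ID = ?",
        (log_id,))
    return [name for (name,) in rows]
```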

Table PHOFFLINE.FILES

We assume that file names do not change when a file is moved. The FILE_ID uniquely identifies the file.

(CREATION_DATE = the Unix date the file was created; MD5_SUM = a checksum, e.g. for verifying copies.)

FILE_ID  LOG_ID(creation)  FILE_TYPE  FILE_NAME                         CREATION_DATE   MD5_SUM
10       0*                RAW_DATA   PhoRaw002900s010.root             17:30-12/31/99  100
11       1                 HIT_DATA   PhoHit002900s010.root             12:36-1/1/00    123
12       1                 OM         PhoOM002900s010.root              12:36-1/1/00    122
13       1                 STD_LOG    Date20000101_Time154543_Set1.log  12:36-1/1/00    89
14       1                 STD_ERR    Date20000101_Time154543_Set1.err  12:36-1/1/00    90
15       199*              RAW_DATA   PhoRaw002900s011.root             18:00-12/31/99  101

(*see table PHOFFLINE.RUNSEQ)

Note: one problem with this scheme is that a job run over very many files (e.g. making a phi DST from many runs/sequences) would produce thousands of entries in this table per job, so it won't work this way. One option is a policy decision that such jobs must be handled differently. (Another is to link a FILE_ID with FILE_TYPE=MULTIPLE to the list of files, so that the inputs table gets only one entry, removing the problem.)

Table PHOFFLINE.FILE_LOCATION

There is an additional table that specifies the file location (and whether the file has been deleted, etc.).

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
10 0 HPSS phobsink/cr00/data 0
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0
12 1 RCF_NFS_DISK /phobos/data02/dv_out 0
13 1 RCF_NFS_DISK /phobos/data02/dv_out 0
14 1 RCF_NFS_DISK /phobos/data02/dv_out 0
15 199 HPSS phobsink/cr00/data 0

The STATUS variable is 0 for a current file, and is incremented by one each time the file is deleted. This way one can tell whether the file was deleted, and one also has a history of the files used.
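The status-increment rule can be sketched as follows (Python with sqlite3 as an illustrative stand-in; the `delete_file` helper is an assumption, not an existing tool):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE FILE_LOCATION (
    FILE_ID INTEGER, LOG_ID INTEGER, SYSTEM TEXT, DIRECTORY TEXT,
    STATUS INTEGER DEFAULT 0)""")
db.execute("INSERT INTO FILE_LOCATION VALUES "
           "(11, 1, 'RCF_NFS_DISK', '/phobos/data02/dv_out', 0)")

def delete_file(file_id, log_id):
    """Record a deletion: the row is kept for history, only STATUS is bumped."""
    db.execute(
        "UPDATE FILE_LOCATION SET STATUS = STATUS + 1 "
        "WHERE FILE_ID = ? AND LOG_ID = ?", (file_id, log_id))
    db.commit()

delete_file(11, 1)
status = db.execute(
    "SELECT STATUS FROM FILE_LOCATION WHERE FILE_ID = 11 AND LOG_ID = 1"
).fetchone()[0]
```

The row is never removed, so a query can always reconstruct which location a downstream job actually read from.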

What can happen to files, and how it fits into the scheme:

a) Move a file: simply change SYSTEM and DIRECTORY.

b) Delete a file: note that you want a record of the file you ran on, even though it was deleted, as it could have been used further down the chain.

Before removing this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0

After removing this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 1

c) Copy a File:

Copying a file must be done via a special script that assigns the copy a LOG_ID and checks that the MD5_SUM matches. (This is acceptable because these are special, tracked files; special rules apply to them, otherwise chaos ensues.)

Before copying this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0

After copying this file:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 0
11 123 (copy job) HPSS phobsink/cr00/data 0

Now, if one wants to delete the original file, a special script is again used, which changes the status flag:

FILE_ID LOG_ID (creation) SYSTEM DIRECTORY STATUS
11 1 RCF_NFS_DISK /phobos/data02/dv_out 1
11 123 (copy job) HPSS phobsink/cr00/data 0
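As an illustration only (Python with sqlite3 and a throwaway temp file; `tracked_copy` and all helper names are hypothetical, not the real script), the copy rule above might look like:

```python
import hashlib, os, shutil, sqlite3, tempfile

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE FILES (FILE_ID INTEGER, MD5_SUM TEXT);
CREATE TABLE FILE_LOCATION (FILE_ID INTEGER, LOG_ID INTEGER,
                            SYSTEM TEXT, DIRECTORY TEXT, STATUS INTEGER);
""")

# Write a small stand-in file and record its MD5 sum, as the tracking DB would.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "PhoHit002900s010.root")
with open(src, "wb") as f:
    f.write(b"dummy payload")
db.execute("INSERT INTO FILES VALUES (11, ?)",
           (hashlib.md5(b"dummy payload").hexdigest(),))
db.execute("INSERT INTO FILE_LOCATION VALUES (11, 1, 'RCF_NFS_DISK', ?, 0)", (tmp,))

def tracked_copy(file_id, src, dst, copy_log_id, system, directory):
    """Copy the file, verify MD5 against the tracked sum, register the copy."""
    shutil.copyfile(src, dst)
    with open(dst, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()
    (expected,) = db.execute("SELECT MD5_SUM FROM FILES WHERE FILE_ID = ?",
                             (file_id,)).fetchone()
    if md5 != expected:
        os.remove(dst)  # a corrupt copy is never registered
        raise RuntimeError("MD5 mismatch: copy discarded")
    db.execute("INSERT INTO FILE_LOCATION VALUES (?, ?, ?, ?, 0)",
               (file_id, copy_log_id, system, directory))
    db.commit()

dst = os.path.join(tmp, "copy.root")
tracked_copy(11, src, dst, 123, "HPSS", "phobsink/cr00/data")
n_locations = db.execute(
    "SELECT COUNT(*) FROM FILE_LOCATION WHERE FILE_ID = 11").fetchone()[0]
```

Both locations now appear under the same FILE_ID, with the copy carrying the copy job's LOG_ID, exactly as in the tables above.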


Table PHOFFLINE.MODULES

This table would effectively be filled by Patrick via the $ Parameter scheme.

(do we need locations of modules relative to PHATHOME?)

LOG_ID CVS_VERSION_NUM MODULE_CLASS MODULE_NAME PARAM_NAME PARAM_TYPE PARAM_VALUE
1 1.3 TPhInitGeoMod geoinitmod fGeometryName CHAR 42.600
1 1.1 TPhInitFECMod fecinitmod fGetFromDB INT 1
1 1.1 TPhInitFECMod fecinitmod fSensorMapName CHAR
1 1.7 TPhHitProcInitMod hitproc fGetFromDB INT 1
1 1.7 TPhHitProcInitMod hitproc fCalibrationRunNumber INT 2889
1 1.7 TPhHitProcInitMod hitproc fCalibrationFileName CHAR PhoGain_2889s000.root
1 1.7 TPhHitProcInitMod hitproc fPedRunNumber INT 2898
1 1.7 TPhHitProcInitMod hitproc fPedName CHAR ped
1 1.7 TPhHitProcInitMod hitproc fGainName CHAR gain
2 1.1 TPhInitGeoMod geoinitmod fGeometryName CHAR 42.600
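The point of this table is that a job's full module configuration can be rebuilt from its LOG_ID. A sketch (Python with sqlite3, illustrative only; `job_parameters` is an assumed helper, loaded here with two of the rows above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE MODULES (LOG_ID INTEGER, CVS_VERSION_NUM TEXT, MODULE_CLASS TEXT,
                      MODULE_NAME TEXT, PARAM_NAME TEXT, PARAM_TYPE TEXT,
                      PARAM_VALUE TEXT);
INSERT INTO MODULES VALUES
  (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fGetFromDB',     'INT', '1'),
  (1, '1.7', 'TPhHitProcInitMod', 'hitproc', 'fPedRunNumber',  'INT', '2898');
""")

def job_parameters(log_id):
    """Rebuild the parameter settings a job ran with, keyed by (module, param)."""
    rows = db.execute(
        "SELECT MODULE_NAME, PARAM_NAME, PARAM_VALUE FROM MODULES "
        "WHERE LOG_ID = ?", (log_id,))
    return {(mod, par): val for mod, par, val in rows}

params = job_parameters(1)
```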

Table PHOFFLINE.PHATJOBS

This table keeps a record of exactly which Phat version, and which macro within that Phat version, were used. Note: I have decided as a matter of policy that only Phat jobs whose Phat and macro come from CVS record their information in the database; otherwise it is impossible to track.

LOG_ID PHAT_VERSION MACRO_NAME (rel PHATHOME)
1 ONLV_1_0_0 sigproc/macros/data_validation.C
2 ONLV_1_0_0 sigproc/macros/data_validation.C

Table PHOFFLINE.DATABASE_KEYS

This table exists so that you know exactly which files/information were used from the database. For the peds/gains etc., i.e. any job that falls within this structure (assuming pedestal jobs will be logged in a similar way), you can specifically reference the LOG_ID of that job. However, George has stated that even this implementation is quite difficult and requires major thought/work. The issue of inputs that were not created with a LOG_ID is more difficult still...

LOG_ID TYPE INPUT_LOG_ID
1 PED 23
1 GAIN 24
2 PED 28

Table PHOFFLINE.JOBS

This table has machine/job details

LOG_ID SYSTEM_RUN_ON STOP_TIME CPU_TIME (sec) END_JOB_STATUS
1 CRS 13:35-1/1/00 1200 NORMAL
2 CRS 13:55-1/1/00 1100 CRASH

Table PHOFFLINE.DV_JOBS

Table specifically for the information from Data Validation jobs.

LOG_ID EVENTS_PROCESSED
1 6000
2 6000

Additional columns could be added as more DV-specific information is needed.

Table PHOFFLINE.JOB_QUALITY

This table contains the quality information.

LOG_ID CHECKNAME QUALITY
1 DV_AUTO GOOD
1 DV_MANUAL UNSURE
2 DV_AUTO BAD
3 DV_AUTO GOOD
3 DV_MANUAL GOOD
4 DV_AUTO UNSURE

How should refinements in quality be handled?

Odd tables that are useful

Table PHOFFLINE.RUNSEQ

LOG_ID (creation) RUN_NUMBER SEQ_NUMBER FIRST_EVENT LAST_EVENT
0 2900 10 30001 36000
199 2900 11 36001 42000

This table is purely for the raw data, it is in essence filled by the DAQ for each run/seq file it makes.


Table PHOFFLINE.RUN_NUMBERS

This table's purpose: if you know the run number, you can find all the LOG_IDs relevant to that run.

LOG_ID RUN_NUMBER
1 2900
2 2900
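Combined with PHOFFLINE.JOB_QUALITY, this gives a simple run-level quality lookup. A sketch (Python with sqlite3, illustrative only; `quality_for_run` is an assumed helper, loaded with the example rows from the tables above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE RUN_NUMBERS (LOG_ID INTEGER, RUN_NUMBER INTEGER);
CREATE TABLE JOB_QUALITY (LOG_ID INTEGER, CHECKNAME TEXT, QUALITY TEXT);
INSERT INTO RUN_NUMBERS VALUES (1, 2900), (2, 2900);
INSERT INTO JOB_QUALITY VALUES (1, 'DV_AUTO', 'GOOD'), (2, 'DV_AUTO', 'BAD');
""")

def quality_for_run(run_number, checkname):
    """All quality verdicts, by LOG_ID, for jobs that touched the given run."""
    rows = db.execute(
        "SELECT r.LOG_ID, q.QUALITY FROM RUN_NUMBERS r "
        "JOIN JOB_QUALITY q ON q.LOG_ID = r.LOG_ID "
        "WHERE r.RUN_NUMBER = ? AND q.CHECKNAME = ? ORDER BY r.LOG_ID",
        (run_number, checkname))
    return dict(rows)

verdicts = quality_for_run(2900, "DV_AUTO")
```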