Data Validation Modules: A How-To Guide

Introduction

The data validation (DV) modules are written and maintained by the experts on each software "subsystem". Each module calculates the quantities that summarise the data, and the quality of the data, for a particular sequence; these quantities are output to the DVTree.

How Data Validation Works:

During Pass0/Data Validation, supermodules are run. Attached to each supermodule is at least one DV module, i.e.

Pass0 supermodule/module list

Supermodule 1   - module 1a
                          - module 1b
                          - module 1c
                          - DV module 1
Supermodule 2   - module 2a
                          - module 2b
                          - DV module 2
and so on....

    The job of each DV module is to accumulate the quantities it needs during the sequence so that, at the end of the sequence, it can calculate all the parameters to be output into the DVTree. (The DVTree is created in TPhDVTreeMakerMod, which handles it, e.g. writing it out.)
At the end of the sequence, the DVTree with the parameters/histograms from each of the supermodules is written to a file in $DV_OUT, called DVTree_[Time]_[Date]_[Run#]s[Seq#]_ONLV_X_Y_Z.root
This output tree file will have the following structure:
[Figure: structure of the DVTree output file]

To summarize: there is a branch with the start time information; then, for each DV module, there is one branch for all the Int_t, Float_t and Double_t variables, and a single branch for each histogram (TH1, TH2 or TH3) output.
Once all such files have been created for a given run, the data validation gui is run on them. This involves chaining together the DVTrees of the run and then running, for each DV module, the function GetHistfromTrees(Tree,ObjArray,ReferenceObjArray). This function is written by the author of the module: it takes the tree, makes the histograms and graphs that should be looked at during the data validation process, and outputs them in the ObjArray. The data validation gui then displays them, and if they look good the run is validated. Sometimes a direct comparison with reference histograms is required; for that purpose the third argument is an object array of reference histograms etc.

How to write a DV Module:

There is an example: the module in $PHATHOME/sigproc/src/TPhSiHitDVMod.cxx. This module calculates the quantities to be output from SiHit processing, both variable values and histograms.
To see how this works there is a macro, $PHATHOME/macros/dataval/make_dv_tree_example.C, which takes a raw data file and uses TPhSiHitDVMod and TPhDVTreeMakerMod to output a DVTree file. The macro is run like hitproc.C: it assumes that the environment variable DATA points to the directory above the one in which the raw data file sits.
Example of how to run it on rcas, with the raw data file in $SCR/DATAFILES/PhoRaw005734s001.root:

  1. rcas4001> setenv DATA $SCR {ENTER}
  2. rcas4001> cd $PHATHOME/macros/dataval {ENTER}
  3. rcas4001> phat {ENTER}
  4. root [0] .x make_dv_tree_example.C(5734,1,"DATAFILES") {ENTER}
  5. This will run and, at the end, quit phat. You should then have the data file "DVTree_081802_20000823_005734s001_ONLV_0_0_0.root".
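
The file name in step 5 follows the DVTree_[Time]_[Date]_[Run#]s[Seq#]_ONLV_X_Y_Z.root pattern. As a rough illustration only, the helper below (MakeDVTreeFileName is a hypothetical name, not part of PHAT) assembles such a name. The zero-padding widths (6 digits for the run, 3 for the sequence) are an assumption inferred from the example name, and the time and date are passed through as opaque strings.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not a PHAT function.
// Builds DVTree_[Time]_[Date]_[Run#]s[Seq#]_ONLV_X_Y_Z.root.
// Assumption: run is zero-padded to 6 digits and sequence to 3 digits,
// as in the example name DVTree_081802_20000823_005734s001_ONLV_0_0_0.root.
std::string MakeDVTreeFileName(const std::string& time, const std::string& date,
                               int run, int seq, int x, int y, int z)
{
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "DVTree_%s_%s_%06ds%03d_ONLV_%d_%d_%d.root",
                  time.c_str(), date.c_str(), run, seq, x, y, z);
    return std::string(buf);
}
```

Calling MakeDVTreeFileName("081802", "20000823", 5734, 1, 0, 0, 0) reproduces the name shown in step 5.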

Special feature of the DVModule:

    In terms of writing the information to a tree you have two choices:

  1. You can do it yourself
  2. Let the DV module create the branches etc. for you. (Preferred method)

If you want to use choice (2), all you need to do is put "$dv" in the comment line of each variable in the .h file that you want put into the tree. This works only for Int_t, Float_t, Double_t, TH1[X] *, TH2[X] * and TH3[X] *. Then, in your StartRun, just call SetUpDefaultBranches() and it will work automatically. Naming conventions are described in the comments in the classes.
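
A header using choice (2) might look roughly like the sketch below. The class names DVModuleSketch and ExampleDVMod and all member names are hypothetical. The typedefs and the empty base class are stand-ins so that the sketch compiles without ROOT or PHAT; in a real module the types come from ROOT, and SetUpDefaultBranches() is provided by the DV module base class rather than written by you.

```cpp
// Stand-ins so this sketch compiles on its own (assumption: in PHAT these
// come from ROOT and from the DV module base class).
typedef int    Int_t;
typedef float  Float_t;
typedef double Double_t;
class TH1F;  // forward declaration only; histogram members are pointers

class DVModuleSketch {
public:
    DVModuleSketch() : fBranchesSetUp(false) {}
    virtual ~DVModuleSketch() {}
    bool BranchesSetUp() const { return fBranchesSetUp; }
protected:
    // Stand-in: the real function creates a branch for each "$dv" member.
    void SetUpDefaultBranches() { fBranchesSetUp = true; }
private:
    bool fBranchesSetUp;
};

class ExampleDVMod : public DVModuleSketch {
public:
    ExampleDVMod()
        : fNHits(0), fMeanEnergy(0), fSumWeights(0), fHitEnergy(0), fScratch(0) {}
    // The only call needed in StartRun for choice (2).
    void StartRun() { SetUpDefaultBranches(); }
private:
    Int_t    fNHits;       // $dv number of hits in the sequence
    Float_t  fMeanEnergy;  // $dv mean hit energy
    Double_t fSumWeights;  // $dv sum of weights
    TH1F    *fHitEnergy;   // $dv hit energy distribution
    Int_t    fScratch;     // working variable: no tag, so not put into the tree
};
```

Only the members whose comment contains "$dv" are meant to end up in the tree; fScratch illustrates a member that is left out.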

How to look at the trees


You can look at the trees as you would any tree, or there is an example macro, $PHATHOME/macros/dataval/look_at_dv_tree_output.C, that shows the use of the GetHistFromTrees() member function. This macro runs the GetHistFromTrees() member function on some test files in $DV_OUT.
Example of how to run look_at_dv_tree_output.C on rcas:

  1. rcas4001> cd $PHATHOME/macros/dataval {ENTER}
  2. rcas4001> phat {ENTER}
  3. root [0] .x look_at_dv_tree_output.C {ENTER}
  4. You will see a canvas pop up with 4 plots.

Special feature of TPhDVModule, in GetHistFromTree(t,a,a_ref):

    In terms of reading the information from a tree, you have two choices:

  1. You can do it yourself
  2. For histograms only, you can use SetHistogramBranches(t). This will automatically set the branches of the histograms to their corresponding data member pointers. When you do t->GetEvent(i), these data member pointers will automatically be set to the histogram in event i.

Additional details can be found in the comments of the class and the macros.

Reference Histograms/Functions

    For each DVModule, a corresponding set of reference histograms/graphs/functions should be made and put into a file with the name <DVModuleName>_ref.root. This file should then be committed to cvs in the directory RefHists. For example, suppose you have made a root file, Si_Raw_Error_Ref.root:

  1. Make sure you have the directory RefHists checked out of cvs (i.e. in some area type cvs co RefHists, and it will download the histograms).
  2. Put your root file into the RefHists directory.
  3. Add and commit your file. If it is new, type
    cvs add -m "Your comment here" Si_Raw_Error_Ref.root
    cvs commit -m "Your comment here" Si_Raw_Error_Ref.root

    If it is already there, a commit on its own will update the version.

What do I put in the reference file:

Inside each of these files there should be a TObjArray (a_ref) of histograms, graphs, or functions that you want compared with the histograms that GetHistsFromTrees() returns, or with the histograms that GetHistFromTrees(t,a,a_ref) sees sequence by sequence.
    The names of the histograms/graphs/functions in the object array should be the same as those of the displayed objects, with an additional tag marking them as references. For example, in the TOF DV module
        there is a histogram named "hOKEeffTB".
Any reference histogram/graph/function associated with this histogram would be named "hOKEeffTB_Ref_1", "hOKEeffTB_Ref_2", etc. The numbers are there because you may want different types of reference histograms/functions (some to be overlaid, some to be compared directly, etc.).
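
The naming convention can be captured in a few lines of plain C++. The two helpers below (MakeRefName and FindRefNames, both hypothetical names, not PHAT functions) build a reference name and pick out, from a list of object names, those that are references for a given histogram. This is only a sketch of the convention; how the DV gui actually matches references is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: "<histName>_Ref_<index>", e.g. "hOKEeffTB_Ref_1".
std::string MakeRefName(const std::string& histName, int index)
{
    return histName + "_Ref_" + std::to_string(index);
}

// Hypothetical helper: return every name that starts with "<histName>_Ref_"
// and has something (the reference number) after the tag.
std::vector<std::string> FindRefNames(const std::string& histName,
                                      const std::vector<std::string>& names)
{
    const std::string prefix = histName + "_Ref_";
    std::vector<std::string> refs;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].size() > prefix.size() &&
            names[i].compare(0, prefix.size(), prefix) == 0) {
            refs.push_back(names[i]);
        }
    }
    return refs;
}
```

For the TOF example, FindRefNames("hOKEeffTB", names) would return "hOKEeffTB_Ref_1" and "hOKEeffTB_Ref_2" if both are present in names, and nothing for names belonging to other histograms.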

Note: you may want to compare some plots with lines/functions, in which case you should store the appropriate function in the reference object array. For example, in the straight line tracking we have a graph
    tree->Draw("fRatio5to6Hits_P:SeqNumber","","goff");
    TGraph *gRatio5to6Hit_P= new TGraph(tree->GetSelectedRows(),tree->GetV2(),tree->GetV1());

    and you want the efficiency to lie between 0.7 and 0.9. You would then create the functions (range 0 to 999, the minimum and maximum possible sequence numbers)
    TF1 *lowerbound= new TF1("fRatio5to6Hits_P:SeqNumber_Ref_1","0.7",0,999);
    TF1 *upperbound= new TF1("fRatio5to6Hits_P:SeqNumber_Ref_2","0.9",0,999);

These would then be added to your reference object array.

Automatic checking of graphs/histograms

The base class, $PHATHOME/phatutil/src/TPhDVAutoChecker, defines the interface required by the DV Gui. It has the ComparisonResult variable, which holds the result of the check/comparison, and a virtual function Compare(), which must be overridden in the derived classes to fill the ComparisonResult variable. Via this interface, the DV Gui can display the result of the automatic check. The specific type of check you want to do should derive from this class (please make such classes as generally applicable as possible). For example, the class $PHATHOME/phatutil/src/TPhDVRangeCheckGraph derives from TPhDVAutoChecker and takes a graph and two functions (upper and lower); its Compare() function determines whether the points in the graph lie between the two functions. If you need other types of comparison, please write your class in the same way. The class should contain the histogram/graph you want to check, plus the references you checked it against, so that the DV Gui can use them to display the original information.
There is a complete example of how this works, using the SiRawErrors:
$PHATHOME/sigproc/macros/make_raw_error_ref.C : This macro makes a reference file with upper and lower bounds (TF1) for each error type.
$PHATHOME/macros/dataval/make_dv_tree_example.C : This macro makes a tree with the SiRawErrors in it.
$PHATHOME/macros/dataval/look_at_dv_tree_sirawerror.C : This macro reads a tree containing the SiRawErrors, plus the reference object array (a_ref) from file (an object array of TF1 upper and lower functions, one pair for each error type). It then runs TPhSiRawErrorDVMod::GetHistsFromTrees(t,a,a_ref), which returns in a the TPhDVRangeCheckGraph objects; these hold the original graph of "Number of Error X:seqnumber", the upper and lower bound functions, and the result of the comparison. The result is then displayed.
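
To make the checker pattern concrete without ROOT, here is a minimal standalone sketch of a base class with a virtual Compare() and a range-check class derived from it. AutoCheckerSketch and RangeCheckGraphSketch are hypothetical stand-ins, not the PHAT classes. Representing the result as a bool, and the graph and bounds as std::vector and std::function, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for the checker interface: a result plus a virtual Compare().
// Assumption: the result is a simple pass/fail flag.
class AutoCheckerSketch {
public:
    AutoCheckerSketch() : fComparisonResult(false) {}
    virtual ~AutoCheckerSketch() {}
    virtual void Compare() = 0;  // derived classes fill fComparisonResult
    bool GetComparisonResult() const { return fComparisonResult; }
protected:
    bool fComparisonResult;
};

// Stand-in for a range check on a graph: every point (x, y) must satisfy
// lower(x) <= y <= upper(x). The points and bounds are kept as members so
// a display could show what was checked against what.
class RangeCheckGraphSketch : public AutoCheckerSketch {
public:
    typedef std::function<double(double)> Bound;
    RangeCheckGraphSketch(const std::vector<double>& x,
                          const std::vector<double>& y,
                          Bound lower, Bound upper)
        : fX(x), fY(y), fLower(lower), fUpper(upper) {}

    void Compare() override {
        // Mismatched x/y sizes fail; a graph with no points passes trivially.
        fComparisonResult = (fX.size() == fY.size());
        for (std::size_t i = 0; fComparisonResult && i < fX.size(); ++i) {
            if (fY[i] < fLower(fX[i]) || fY[i] > fUpper(fX[i])) {
                fComparisonResult = false;
            }
        }
    }
private:
    std::vector<double> fX, fY;
    Bound fLower, fUpper;
};
```

With constant bounds 0.7 and 0.9, as in the straight line tracking example, a graph whose points all lie in that band passes and a single point at 0.95 makes the check fail.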

How to select good runs, using dv information in data base

The phat macro $PHATHOME/macros/dataval/good_runs.C retrieves the list of runs and sequences that match the criteria you specify.

How to run it:

  1. cd $PHATHOME/macros/dataval
  2. phat 
  3. .x good_runs.C
  4. Enter the DVVersion that you are looking for
  5. Enter if you want to select using the status flag or not
  6. For each DV quality item (e.g. Trigger, SiHitProc) enter whether you want to use it to select the runs and sequences. If you do want to use a specific DV quality item, it will ask which criterion you want to select on (0=not validated, 1=failed dv, 2=no decision in dv, 3=passed dv)
  7. A listing of the runs and sequences that match your criteria is displayed. A text file, "select.txt", is also generated, containing the listing, the conditions under which the runs were selected, and the sql statement that was used.
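
The selection logic of step 6 can be sketched in plain C++ as below. The macro itself queries the data base with an sql statement; this sketch filters an in-memory list instead, and the names DVStatus, SeqQuality, Criterion and SelectSequences are hypothetical. Only the four status codes are taken from the text above.

```cpp
#include <cstddef>
#include <vector>

// Status codes as listed in step 6.
enum DVStatus { kNotValidated = 0, kFailedDV = 1, kNoDecision = 2, kPassedDV = 3 };

// Hypothetical record: one run/sequence with the status of two DV quality items.
struct SeqQuality {
    int run;
    int seq;
    DVStatus trigger;
    DVStatus siHitProc;
};

// One selection criterion: whether to use the item and which status is required.
struct Criterion {
    bool use;
    DVStatus wanted;
};

// Keep the sequences that satisfy every criterion that is switched on.
std::vector<SeqQuality> SelectSequences(const std::vector<SeqQuality>& all,
                                        Criterion trigger, Criterion siHitProc)
{
    std::vector<SeqQuality> selected;
    for (std::size_t i = 0; i < all.size(); ++i) {
        const SeqQuality& s = all[i];
        if (trigger.use && s.trigger != trigger.wanted) continue;
        if (siHitProc.use && s.siHitProc != siHitProc.wanted) continue;
        selected.push_back(s);
    }
    return selected;
}
```

An item that is not used places no constraint on the selection, so switching every item off returns all sequences.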

How to make backup of $DV_OUT files

In $DV_OUT there is a file list.pl, which can be used to make tar files for each DV Version. To do this:

  1. Make sure you are logged in to rmine401 as phobreco (otherwise it will take forever)
  2. Edit list.pl
  3. At the top, set my $onlvchoose1="{ONLV_XX_YY_ZZ}"
  4. cd $DV_OUT
  5. ./list.pl (run as phobreco)

This writes the following files into the $DV_OUT area:

log_ONLV_XX_YY_ZZ_Set0.tar, err_ONLV_XX_YY_ZZ_Set0.tar, objmgr_ONLV_XX_YY_ZZ_Set0.tar to objmgr_ONLV_XX_YY_ZZ_SetN.tar (each file <2 GB), trgtree_ONLV_XX_YY_ZZ_Set0.tar, dvtree_ONLV_XX_YY_ZZ_Set0.tar, infotree_ONLV_XX_YY_ZZ_Set0.tar

Then, as phobreco:

  1. pftp hpss.rcf.bnl.gov 2121
  2. site setcos 16
  3. psocket
  4. bin
  5. prompt off
  6. cd /home/phobreco/dv_out_backup
  7. mkdir ONLV_XX_YY_ZZ
  8. mput *ONLV_XX_YY_ZZ*.tar
  9. When the transfer has finished, check that the sizes are ok, and delete the tar files