Generalization of Hough PR to the real PHOBOS geometry

Having seen that the Hough PR actually works with the simplified geometry, the natural next step is to generalize it to the real geometry. Generating new Hough tables is quite easy. Designing the new PR algorithm was not, mainly because of missing layers. To deal with them, we have to consider match-sticks which jump over a missing layer.

Fortunately, we never have two consecutive missing layers. So for a given hit on layer "i", we can always make a Hough match-stick from that layer to the next layer, "i+1", or to the one after it, "i+2".

With the simplified geometry, we had 4 combinations of layers: IJ, JK, KM, MO. Now we have 13 combinations: IJ, IK, JK, JL, KL, KM, LM, LN, MN, MO, NO, NP, OP.
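
As a cross-check of the count, here is a minimal Python sketch (not the actual PR code) that enumerates the layer pairs "i, i+1" and "i, i+2" for layers I to P. The function name and the max_skip parameter are made up for this illustration.

```python
# Layers used for match-sticks, in order (I to P, as in the text).
LAYERS = "IJKLMNOP"

def matchstick_pairs(layers=LAYERS, max_skip=1):
    """Return layer pairs (i, i+1) and (i, i+2): at most one layer is skipped."""
    pairs = []
    for i, first in enumerate(layers):
        for step in range(1, max_skip + 2):
            if i + step < len(layers):
                pairs.append(first + layers[i + step])
    return pairs

pairs = matchstick_pairs()
print(len(pairs), pairs)
```

With 8 layers there are 7 neighbouring pairs and 6 pairs that skip one layer, which gives the 13 combinations listed above.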

Here are the tables.....
 


 


 


 

Measurement of Errors of Hough Table

The Hough transformation means getting momentum and theta information from the geometrical configuration of two associated hits, i.e. a match-stick. In other words, for a given match-stick, we immediately obtain "P" and "theta". A question naturally arises: with what precision? How much can we rely on this information? Since the Hough Tables are made from ideal track trajectories (MULS=0, no multiple scattering; LOSS=4, average energy loss), for the realistic case we must take the measurement error into account.

The simplest way is to make tables with the same binning as the Hough tables, fill them with the measured errors, and use them together with the Hough tables. Let's call such a table a Hough Error Table. A problem is that we need a huge amount of CPU time to have enough statistics in all bins. Since bin-to-bin fluctuations of the errors should not be big (I hope), one may use more coarsely binned tables. The strategy to get the Hough Error Table is:

i) Generate N identical particles for a given theta and momentum with MULS=1, LOSS=3, and fill the variation of 1/P or theta into the fine-binned Hough Table (200x400).

ii) Repeat i) over the full range of theta (25-70 degrees) with a bigger step (2 degrees), and over the full range of momentum (100 MeV to 1 GeV) with a bigger step (10% error).

iii) Make a coarse-binned Hough Table (20x40). Calculate the average variation over 100 measurements in the fine-binned table and fill it into the coarse-binned table.

The tables hold the measured errors of the Hough components, i.e. sigma(1/P) and sigma(theta).
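
Step iii) can be sketched in plain Python as a block average: each coarse bin of the 20x40 table is the mean of a 10x10 block (100 bins) of the 200x400 table. This is only an illustration, not the ROOT code. The function name, the list-of-lists layout and the rule of skipping never-filled bins (stored as None) are assumptions.

```python
def rebin_mean(fine, factor_rows, factor_cols):
    """Average blocks of a fine 2D table into a coarse table.

    Fine bins holding None (never filled) are skipped. This handling of
    empty bins is an assumption, not something stated in the text.
    """
    n_rows = len(fine) // factor_rows
    n_cols = len(fine[0]) // factor_cols
    coarse = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            values = [
                fine[r * factor_rows + dr][c * factor_cols + dc]
                for dr in range(factor_rows)
                for dc in range(factor_cols)
            ]
            filled = [v for v in values if v is not None]
            if filled:
                coarse[r][c] = sum(filled) / len(filled)
    return coarse

# Toy fine table: 200 x 400 bins, every bin filled with the same error value.
fine_table = [[0.01] * 400 for _ in range(200)]
coarse_table = rebin_mean(fine_table, 10, 10)
print(len(coarse_table), len(coarse_table[0]))
```

A coarse bin whose 100 fine bins are all empty stays None, so one can see directly where the statistics are still missing.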


 


 

The size of Hough Tables

From now on, by "Hough Tables" I simply mean the set of tables of 1/p, sigma(1/p), theta, sigma(theta). A Hough table for 1/p and theta uses a little less than 1 Mbyte (after compression, which is a default feature of ROOT). An error table uses half a Mbyte if I use the fine-binned table (200x400), but this can be reduced hugely by using the really coarse-binned table (20x40). In total, I think I will need 1 Mbyte for a set of Hough tables. Considering the z-vertex dependence, from +10 cm to -15 cm, with a fixed step or variable steps of about half a centimeter (if we demonstrate successfully that the interpolation works fine), we will need 50 sets of Hough tables, i.e. 50 Mbytes, which is not a big value....
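
The estimate is simple arithmetic, sketched here in Python with the numbers quoted above. The constant names are made up, and a fixed half-centimeter step is assumed (the text also allows variable steps).

```python
# Numbers quoted in the text; the names are illustrative.
Z_MIN_CM, Z_MAX_CM = -15.0, 10.0   # z-vertex range
STEP_CM = 0.5                      # assumed fixed step
MB_PER_SET = 1.0                   # one set: 1/p, sigma(1/p), theta, sigma(theta)

n_sets = round((Z_MAX_CM - Z_MIN_CM) / STEP_CM)
total_mb = n_sets * MB_PER_SET
print(n_sets, "sets,", total_mb, "Mbytes")
```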

Track Reconstruction

In the simplified version, a track consists of a seed plus 4 match-sticks, i.e., track seed + IJ + JK + KM + MO.
With our more complicated geometry, one has to be flexible about missing layers. The track-finding algorithm now starts from layer "I" and tries to find match-sticks with hits on layer "J" which are consistent with the track seed in theta. If no match-stick exists with "J", it tries "K".

Then it goes to layer "J" and does the same job, i.e., it looks for match-sticks with "K" and, if that fails, with "L".

Finally, a track candidate is a seed plus a set of match-sticks reaching layer "O" or layer "P".

With this algorithm, a track candidate has no two consecutive missing hits in layers, by construction. Given our definition of "findable", this is a good feature!
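
The chaining logic can be sketched in Python as below. This is only a toy version under stated assumptions, not the real code: hits are plain labels stored per layer in a dict, and the "compatible" function stands in for all the consistency checks between a hit pair and the seed.

```python
LAYERS = "IJKLMNOP"

def build_candidates(hits, compatible):
    """Chain hits layer by layer, skipping a layer only when the next one
    offers no compatible hit.

    hits: dict mapping layer letter -> list of hit labels (hypothetical format).
    compatible: function(hit_a, hit_b) -> bool, standing in for the cuts.
    Returns a list of candidates, each a list of (layer, hit) tuples.
    """
    last = len(LAYERS) - 1
    finished = []

    def extend(chain, index):
        hit = chain[-1][1]
        for step in (1, 2):  # next layer first, then jump over one layer
            nxt = index + step
            if nxt > last:
                break
            layer = LAYERS[nxt]
            matches = [h for h in hits.get(layer, []) if compatible(hit, h)]
            if matches:
                for h in matches:
                    extend(chain + [(layer, h)], nxt)
                return
        # No further match-stick: keep the chain only if it reached O or P.
        if index >= last - 1:
            finished.append(chain)

    for h in hits.get("I", []):
        extend([("I", h)], 0)
    return finished

# Toy event: layer L has no hit, so the chain jumps from K to M.
toy_hits = {"I": ["i0"], "J": ["j0"], "K": ["k0"], "L": [],
            "M": ["m0"], "N": ["n0"], "O": ["o0"], "P": []}
toy_candidates = build_candidates(toy_hits, lambda a, b: True)
print(["".join(layer for layer, _ in c) for c in toy_candidates])
```

Because a chain can jump over at most one layer, a chain that meets two empty layers in a row stops there and is dropped unless it already reached "O" or "P".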

As a reminder, here is the definition.

* Definition of "findable" track

No more than one missing hit in layers A-D
No more than one missing hit in layers E-J
No more than one missing hit in layers K-P

No two consecutive missing hits along the track.
------------------------------

I.e. it's not OK to miss plane D and then also plane E.

NOTE: If the track goes through a region where a layer  just doesn't exist, that is not a missing hit, it's a design "feature". If the track goes "outside" the detector in a given layer, that IS a missing hit.

I will apply the findable condition later (not now, because it is not urgent).

Cuts for the minimization of CPU time and the optimized cut values.

With only the track-finding algorithm and the Hough tables, the actual CPU time to generate all the track candidates is a little less than INFINITY! This is because we allow every stupid combination of match-sticks, and we consider all hits, even hits which are inconsistent with the given track seed in phi, theta, charge, etc. Here I summarize all the cuts I invented to minimize the CPU time....

To do this job, I generated 8 different single track event samples. Each sample has 100 events.

200MeV pion+ @theta=28
300MeV pion+ @theta=38
400MeV pion+ @theta=44
500MeV pion+ @theta=45
500MeV pion- @theta=45
400MeV pion- @theta=46
300MeV pion- @theta=52
200MeV pion- @theta=62

These samples were chosen very carefully in order to explore all the different spectrometer regions.

I also generated 100 events, each having the above 8 tracks, and a typical Au-Au interaction for the final test.

Again, the cone size for the track seed finding is 0.02 in eta-phi space.
 

For a given track seed,

Cut 1 (vertical hit position)

Extrapolate the track seed to a layer and calculate the "y" position (y_seed). Compare the difference between y_hit and y_seed with its error, where the error is assumed to be the quadrature sum of PadSizeY/sqrt(12) and the input Phi error. A hit is rejected if

| y_hit - y_seed | > sqrt( PhiError^2 + PadSizeY^2/12)

Here is the efficiency versus PhiError for 200MeV pion+.

0.03 looks stringent; 0.05 looks very conservative.
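
A minimal Python sketch of Cut 1, reading the inequality above as the rejection condition. The function and argument names are made up, and the two error terms are simply added in quadrature as written.

```python
import math

def fails_vertical_cut(y_hit, y_seed, phi_error, pad_size_y):
    """True if the hit is farther from the extrapolated seed than the combined error."""
    tolerance = math.sqrt(phi_error ** 2 + pad_size_y ** 2 / 12.0)
    return abs(y_hit - y_seed) > tolerance

# Toy numbers, only to show the call; they are not PHOBOS pad sizes.
print(fails_vertical_cut(y_hit=0.00, y_seed=0.01, phi_error=0.05, pad_size_y=0.1))
print(fails_vertical_cut(y_hit=1.00, y_seed=0.01, phi_error=0.05, pad_size_y=0.1))
```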

Cut 2 (charge correlation of hit horizontal position and the track seed direction)

Given the magnetic field setup, a positive track bends toward the beam pipe. So the outer-layer hits have smaller polar angles (theta) than the hits in the straight-line part. For a negative track, this is reversed. Reconstruction is done in two steps, one per charge. So when we reconstruct positively charged tracks, we need to consider only hits having a smaller theta value than that of the given track seed. This is clear from the following picture.

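The plot shows the distribution of "theta_layer_I" - "theta_seed" for the 8-track event sample. As one can see, the charge correlation is clear. Following the sign convention above (positive tracks have hits at smaller theta than the seed), a hit is rejected if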

(theta_layer - theta_seed)*charge > 0
 

Cut 3 (maximum allowed theta difference for hits on layer I)

As shown in the above plot, most hits are within +-0.04 because the magnetic field is small up to layer I. When generating track candidates, we only consider hits within this range:

|theta_layer_I - theta_seed|< 0.04

This cut is only there to reduce the CPU time. One can do the same for layer J, etc.
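
Cuts 2 and 3 can be sketched together in Python. Here Cut 2 is read as a rejection condition (product > 0 means the hit is on the wrong side for that charge), which follows from the statement that positive tracks have hits at smaller theta than the seed. The function names are made up.

```python
def passes_charge_cut(theta_layer, theta_seed, charge):
    """Cut 2: keep hits on the side of the seed expected for this charge.

    A hit is rejected if (theta_layer - theta_seed) * charge > 0.
    """
    return (theta_layer - theta_seed) * charge <= 0

def passes_layer_i_window(theta_layer_i, theta_seed, max_diff=0.04):
    """Cut 3: keep layer-I hits within +-max_diff of the seed theta."""
    return abs(theta_layer_i - theta_seed) < max_diff

# Toy values: a hit at smaller theta than the seed fits a positive track.
print(passes_charge_cut(0.75, 0.78, +1), passes_layer_i_window(0.75, 0.78))
```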

Cut 4 (Match-stick theta value should be consistent with the track seed)

After the cleaning of hits by Cut 1 and Cut 2, Hough match-sticks are formed from hits on layer "i" with hits on "i+1" or on "i+2". Match-sticks are rejected if their theta value is not consistent with the track seed:

|theta_ms - theta_seed|>sqrt(sigma_theta_seed^2+sigma_theta_ms^2)
 
 

Here sigma_theta_ms is the measured value and sigma_theta_seed is a value optimized by looking at the above plot. sigma_theta_seed is set to 0.02.
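
As an illustration of Cut 4, a short Python sketch with sigma_theta_seed = 0.02 as quoted above. The function name is made up, and sigma_theta_ms would come from the Hough Error Table lookup, which is not shown here.

```python
import math

SIGMA_THETA_SEED = 0.02  # optimized value quoted in the text

def matchstick_rejected(theta_ms, sigma_theta_ms, theta_seed,
                        sigma_theta_seed=SIGMA_THETA_SEED):
    """True if the match-stick theta is inconsistent with the seed theta."""
    allowed = math.sqrt(sigma_theta_seed ** 2 + sigma_theta_ms ** 2)
    return abs(theta_ms - theta_seed) > allowed

# Toy values: 0.01 away from the seed is kept, 0.06 away is rejected.
print(matchstick_rejected(0.80, 0.01, 0.79), matchstick_rejected(0.85, 0.01, 0.79))
```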

Cut 5 (When chaining Match sticks, variation of theta value should be small)

When creating the track candidates, i.e., chaining match-sticks, we should allow chaining only when the variation of theta is small. Two match-sticks are not chained if

|theta_ms_i - theta_ms_j| > N x sqrt(sigma_theta_i^2 + sigma_theta_j^2)

where N is set to 2.

1.5 sigma looks O.K.

Cut 6 (Momentum variation should be small)

The same cut as Cut 5, but for the momentum measurement.

| (1/P)_i - (1/P)_j | > N x sqrt( sigma_1/p_i ^2 + sigma_1/p_j^2)
 

2.5 sigma looks o.k.
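
Cuts 5 and 6 have the same form, so one helper can serve both. The Python sketch below uses N = 2 for theta (as set above) and, as an assumption, N = 2.5 for 1/P, taken from the remark that 2.5 sigma looks o.k. The dict keys and function names are made up.

```python
import math

def within_n_sigma(value_i, sigma_i, value_j, sigma_j, n_sigma):
    """True if the two measurements agree within n_sigma combined errors."""
    return abs(value_i - value_j) <= n_sigma * math.sqrt(sigma_i ** 2 + sigma_j ** 2)

def sticks_chainable(stick_i, stick_j, n_theta=2.0, n_inv_p=2.5):
    """Apply Cut 5 (theta) and Cut 6 (1/P) to a pair of match-sticks.

    Each stick is a dict with keys theta, sigma_theta, inv_p, sigma_inv_p
    (a hypothetical layout chosen for this sketch).
    """
    theta_ok = within_n_sigma(stick_i["theta"], stick_i["sigma_theta"],
                              stick_j["theta"], stick_j["sigma_theta"], n_theta)
    inv_p_ok = within_n_sigma(stick_i["inv_p"], stick_i["sigma_inv_p"],
                              stick_j["inv_p"], stick_j["sigma_inv_p"], n_inv_p)
    return theta_ok and inv_p_ok

# Toy match-sticks, values chosen only to exercise both outcomes.
stick_a = {"theta": 0.80, "sigma_theta": 0.01, "inv_p": 2.0, "sigma_inv_p": 0.1}
stick_b = {"theta": 0.81, "sigma_theta": 0.01, "inv_p": 2.1, "sigma_inv_p": 0.1}
stick_c = {"theta": 0.90, "sigma_theta": 0.01, "inv_p": 2.0, "sigma_inv_p": 0.1}
print(sticks_chainable(stick_a, stick_b), sticks_chainable(stick_a, stick_c))
```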

Cut 7 (not strictly needed, but helps to reduce the number of candidates by about 10%)

 There must exist at least 3 hits in layers E,F,G,H between theta_s and theta_layer_I. This is almost always true, except in the case of bad multiple scattering.
 

 After all these cuts, I ran the algorithm on a realistic Au-Au event. I got the following picture.
 

There are 16 findables with the following findable definition, which is a little different from the one decided in the meeting in February.

For layers AB, CD: at least 3 hits (i.e. one missing hit allowed)
For layers EF, GH, IJ: at least 4 hits (1 hit can be lost due to a missing layer on G or H, plus one real missing hit, so two missing hits are allowed)
For layers KL, MN, OP: at least 3 hits (in the central region there are only 4 layers, and one missing hit is allowed)
and there are no two successive missing hits in consecutive layers.
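
This modified definition can be checked directly on a 16-character "hits per layer" string of the kind printed in the list of findables. The Python sketch below assumes that the 16 characters correspond to layers A to P in order, and that "hits" in the definition counts layers with at least one hit. It does not know which layers are absent by design in a given region, so it treats every zero as a missing hit.

```python
def is_findable(pattern):
    """Apply the modified findable definition to a hits-per-layer string.

    pattern: 16 digits, assumed to be layers A..P in order,
    e.g. "1111111011101021".
    """
    if len(pattern) != 16:
        raise ValueError("expected one digit per layer A..P")
    has_hit = [ch != "0" for ch in pattern]
    if sum(has_hit[0:4]) < 3:     # layers A-D
        return False
    if sum(has_hit[4:10]) < 4:    # layers E-J
        return False
    if sum(has_hit[10:16]) < 3:   # layers K-P
        return False
    # No two successive layers without a hit.
    return all(a or b for a, b in zip(has_hit, has_hit[1:]))

print(is_findable("1111111011101021"))
```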

Here is the MC findable list:

Event 0 has 16171 MC tracks and 17440 MC hits

                       Q      P   theta     phi  #hit  #merged hits  #hits per layer
        findable  0:  -1  0.285  51.967  -0.023    14            13  1111111011101021
        findable  1:  -1  0.160  48.826  -0.013    14            13  1112111011011011
        findable  2:  -1  0.848  44.553   0.015    17            13  3211111110101120
        findable  3:  -1  0.179  30.837  -0.024    20            14  1211110131102212
        findable  4:   1  0.596  41.368   0.029    15            14  1112111111101101
        findable  5:  -1  0.775  28.608  -0.035    13            13  1111110111101110
        findable  6:  -1  0.262  54.139   0.050    16            14  1111211011111012
        findable  7:  -1  0.662  34.956  -0.019    18            13  3211110112101120
        findable  8:   1  0.527  43.224  -0.007    21            15  3111111121202112
        findable  9:   1  0.820  48.220   0.033    16            13  2111111022101110
        findable 10:   1  0.896  53.732   0.019    16            13  1121121021101110
        findable 11:  -1  0.422  53.090   0.022    14            12  1111111011201020
        findable 12:   1  0.363  33.260  -0.008    18            14  2111210111111022
        findable 13:  -1  0.246  48.959   0.045    15            12  1111211012201010
        findable 14:  -1  0.266  50.392  -0.043    14            13  1111111011101021
        findable 15:  -1  0.377  62.825   0.049    16            14  1211111012111011
        Number of MC seeds & findables: 32 16

And I have 41 reconstructed seeds, and 461 track candidates.

Now the big question is whether we have correctly reconstructed the MC findables or not; in other words, whether we keep them among our 461 candidates, or have just created 461 ghosts.....

So I made a simple-minded routine which selects the reconstructed candidates best matching the MC findables. For a given track seed, I first find the MC findable which is closest to the seed in theta-phi space. Then, for this MC findable, I find the reconstructed track candidate which shares the most hits with it.
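
The two-step selection can be sketched in Python as follows. The dict layout, the names and the plain Euclidean distance in (theta, phi) are assumptions made for this illustration, and in practice one would only compare candidates built from the same seed.

```python
import math

def closest_findable(seed, findables):
    """Return the MC findable nearest to the seed in (theta, phi)."""
    return min(
        findables,
        key=lambda f: math.hypot(f["theta"] - seed["theta"], f["phi"] - seed["phi"]),
    )

def best_candidate(findable, candidates):
    """Return the candidate sharing the most hits with the findable."""
    truth_hits = set(findable["hits"])
    return max(candidates, key=lambda c: len(truth_hits & set(c["hits"])))

# Toy data: hit labels are plain integers.
findables = [
    {"name": "mc0", "theta": 0.80, "phi": 0.01, "hits": [1, 2, 3, 4]},
    {"name": "mc1", "theta": 0.60, "phi": -0.02, "hits": [5, 6, 7, 8]},
]
candidates = [
    {"name": "cand0", "hits": [1, 2, 9, 10]},
    {"name": "cand1", "hits": [1, 2, 3, 11]},
]
seed = {"theta": 0.79, "phi": 0.0}
match = closest_findable(seed, findables)
best = best_candidate(match, candidates)
print(match["name"], best["name"])
```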

After the selection (let's call it the selection by MC truth information, or theoretically the best selection algorithm), I have this.

So I reconstructed 14 good tracks, lost 2, and found no ghosts (by eye again). This means the efficiency of the algorithm is 14/16 = 0.875 with no ghosts... These numbers are important because they are the best values we can obtain (because we are using a perfect selection algorithm).

Next step

My next step is of course to find a good algorithm which selects as many of the MC findables as possible. We may have to rely on a chi2 cut from track fitting and/or particle ID information. I am planning to use both. We may be able to use just our pregenerated Templates for this (Templates will give us "p" and chi2 too). But then I am worried about CPU time.... Let's discuss on Monday.