Latchup Protection in Phobos Front-End Electronics

-- Summary of discussions and conclusions

from meeting held at BNL, 11:00 to 13:30, Thursday 2 September 1999

Wadsworth, MIT.LNS.EF 8Sep99

Attendees:

Birger Back, Alan Wuosmaa, Rachid Nouicer, Piotr Kulinich, Mark Baker,

Gerrit van Nieuwenhuizen, Pradeep Sarin, Bolek Wyslouch, Heinz Pernegger,

Alan Carroll, John Fitch, Miro Plesko, and Bernie Wadsworth.

Agenda:

1. What do we know?

. latchup phenomenon in CMOS and associated causes

. observations from engineering run incl. post-run/dismantling evaluations.

. input from other experiments

. relevant measurements and vendor data

2. What do we do?

. hardware protection schemes

. operational proposals

. external needs

Discussions:

What do we know?

Latchup phenomenon and causes

Bernie presented a description of the N-well CMOS process (as used in the VA chip) and indicated the parasitic junction transistors which are inherent in these structures. The parasitic transistors are a vertical PNP and a lateral NPN which together constitute a circuit with positive feedback. If the loop gain in this parasitic circuit is greater than 1, the circuit can be triggered in much the same way that the PNPN structure of a silicon controlled rectifier (SCR) is turned on. This 'latchup' results in a current flow (of a few hundred milliamps) from Vdd to Vss, and the flow can be stopped only by turning off power to the circuit.
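As a rough illustration of the loop-gain condition: in the usual two-transistor model of a PNPN structure, the loop gain is taken as the product of the current gains of the two parasitic transistors. The sketch below assumes that simple model (it ignores the well and substrate shunt resistances), and the function name and gain values are hypothetical, not measured values for the VA chip.

```python
def latchup_possible(beta_pnp: float, beta_npn: float) -> bool:
    """Return True if the parasitic PNP/NPN pair can regenerate.

    Assumption: loop gain is modelled as the plain product of the two
    parasitic current gains (simplified two-transistor SCR model).
    """
    loop_gain = beta_pnp * beta_npn
    return loop_gain > 1.0


# Hypothetical gains chosen only for illustration, not VA chip data.
print(latchup_possible(beta_pnp=20.0, beta_npn=0.5))  # loop gain 10
print(latchup_possible(beta_pnp=2.0, beta_npn=0.1))   # loop gain 0.2
```

Careful layout lowers one or both gains, which is why a well-designed chip resists latchup without being immune to it.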

Latchup can be triggered by ...

Signals outside the Vdd - Vss range (by about 0.6V or more) due to ...

. poorly terminated transmission lines -- reflections on driver lines.

. control signals outside range.

. electrostatic discharge (ESD) at an I/O pad

Excessive dVdd/dt causing displacement currents in wells/substrate

Radiation-induced currents in the wells/substrate

With precautions taken in IC design, both at the process level and at the layout level, chips are generally resistant to latchup but not to the extent that they are immune to the effects of localized, heavily ionizing radiation.

Latchup occurs between Vdd and Vss. As the engineering run and other experience indicate, it is the dVdd bond (the one supplying Idd to the digital section of the chip) that blows, suggesting that latchup is confined to the digital section of the chip. It is believed that the analog structures, which are more demanding in performance, have been treated more carefully in a way which also makes them more resistant to latchup. In the analog section of the chip, the input P-FET, which because of its large area might be expected to present a large cross-section for latchup, is actually rather more resistant to it: the voltage swing at the base of the vertical PNP transistor required to turn on this parasitic structure is more than four times that required at the base of the parasitic transistors in the more common structures on the chip, which have the P-FET's source and the N-well both tied to Vdd.

The dVdd bond is the more susceptible to fusing because it takes the whole current associated with latchup, whereas the chip's large-area back contact (tied through a via to the hybrid's Vss plane) shunts a portion of the current flowing to Vss, making the current through the dVss bond somewhat less than the current through the dVdd bond. Also, in the layout design of the Phobos hybrids, the dVdd bond wire happens to be longer than the dVss bond wire, which means that the former fuses at a lower current than the latter.

Observations from the Phobos engineering run:

Several chips latched up during the run. Generally it was found that by switching off the power and then turning it back on, the affected chips could be brought back to life. It was also found that the time one needs to wait before turning power back on varies quite widely, from a few minutes to 'overnight' (no-one at the meeting had an explanation why this should be so). Moreover, the turn-off/turn-on procedure was not universally successful, and in post-run analysis it was found that on two chips the dVdd bond wire had fused; after rebonding it was found that both chips had survived.

Heinz reported that on hybrids connected to the Phobos hybrid test fixture, which has no overcurrent protection, the dVdd bond wire was fused accidentally on several chips; some of the affected chips survived, some died.

Regarding the experience from the engineering run, discussion turned to the variation observed in the ±2V supply voltages. On occasions, voltages somewhat lower than nominal -- as low as 1.75V in one instance -- were observed.*

___________________________________________________________________________

* Bernie's comment: we need to know more about this and understand the cause. We would welcome additional information concerning voltages measured, measurement points, and associated circumstances to aid us in the analysis of this problem.

The ±2V supply voltages, measured at the port connector, should not deviate by more than 1% from their 'factory' preset values. The figures for the production batch of 30 power boards we recently tested are ...

                                        -2V output             +2V output
Typical:                                -2.007 V               +1.936 V
Extreme values in batch (120 ports):    -2.010 to -1.997 V     +1.923 to +1.947 V

Note: we can redimension one or two resistor values to make the actual +2V output closer to +2.00V, but we have not done this yet.
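The batch figures can be checked against the 1% requirement with a few lines of arithmetic. This is a minimal sketch which assumes that the 'factory' preset for each rail equals the typical value quoted above (the minutes do not state the presets explicitly); the function name within_tolerance is ours. The 1.75V reading from the engineering run is compared against the +2V typical value, which is also an assumption since the rail on which it was seen was not recorded here.

```python
def within_tolerance(measured: float, preset: float, tol: float = 0.01) -> bool:
    """True if measured deviates from preset by no more than tol (fractional)."""
    return abs(measured - preset) <= tol * abs(preset)


# Assumption: presets are taken to be the typical batch values.
presets = {"-2V": -2.007, "+2V": 1.936}
extremes = {"-2V": (-2.010, -1.997), "+2V": (1.923, 1.947)}

for rail, preset in presets.items():
    for v in extremes[rail]:
        deviation_pct = 100.0 * abs(v - preset) / abs(preset)
        verdict = "ok" if within_tolerance(v, preset) else "out of tolerance"
        print(f"{rail}: {v:+.3f} V deviates {deviation_pct:.2f}% -> {verdict}")

# Engineering-run reading, compared against the +2V typical value (assumption).
print("1.75 V within 1%?", within_tolerance(1.75, presets["+2V"]))
```

Under these assumptions all four batch extremes fall inside the 1% band, while the 1.75V reading lies well outside it, which is why the cause needs to be understood.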

Input from other experiments:

Heinz presented info gathered from his experience in testing VA hybrids and from discussions with other experimenters who are using the VA chip.

Known reasons for latchup in Viking chips include ...
. radiation-induced latchup which depends on the magnitude and rate at which energy is transferred to the chip

. control signal induced latchup: when control signals arrive at the chip with a larger amplitude than the rail voltages.

. estimated latchup current to kill the affected chip: 0.5 to 1A

ALICE physicists have made a study of latchup in front-end chips and conclude that 1 in 10^6 neutrons can produce a latchup, which in their experiment results in one latchup every 100 secs (they have 10x the number of Phobos chips).

AMS physicists are concerned about latchup and intend to measure the latchup cross-section for their VA-HDR4 chips at an ion beam facility in Brussels during Sept/Oct '99. They have offered to measure cross-sections for our chips, VA64-HDR1, VA-HDR1, during the same period if we can provide them with test hybrids for this purpose.

Relevant measurements:

Fusing current for 17μm dia bonding wire: wirebonds were installed on a prototype M-substrate between the gold bonding pads of the thickfilm circuit and the gold back contact pad to which the VA chip would normally be attached. Thus the bonds are very close in length to those in the final hybrid assembly. There are three different bond lengths in the layout which, in plan, measure ...

1mm -- 'short' (e.g., GND bond wire, and dVss bond wire),

1.6mm (e.g., dVdd bond wire)

and 2mm -- 'long' (e.g., 'test_on' bond wire)

The corresponding static fusing currents were measured ...

for short bonds as 630 < I < 670mA,

for long bonds as 460 < I < 510mA.

For comparison, data obtained from the bond wire manufacturer, American Fine Wire, show a fusing current of 250mA for Al/Si 1% wire, 17μm dia and 10mm long, in air -- which is consistent with our measurements.
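No fusing current is quoted above for the 1.6mm bond, which is the length of the dVdd bond wire. As a rough guide only, one can interpolate linearly between the measured ranges for the 1mm and 2mm bonds. This is an estimate under the assumption of linear behaviour over this short range, not a measurement; the function and variable names are ours.

```python
def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


SHORT_MM, LONG_MM = 1.0, 2.0   # bond lengths in plan, from the measurements
SHORT_MA = (630.0, 670.0)      # measured fusing range for short bonds
LONG_MA = (460.0, 510.0)       # measured fusing range for long bonds
DVDD_MM = 1.6                  # dVdd bond length in plan

# Interpolate the lower and upper ends of the measured ranges separately.
low_ma = interpolate(DVDD_MM, SHORT_MM, SHORT_MA[0], LONG_MM, LONG_MA[0])
high_ma = interpolate(DVDD_MM, SHORT_MM, SHORT_MA[1], LONG_MM, LONG_MA[1])

print(f"Estimated fusing current for {DVDD_MM} mm bond: "
      f"{low_ma:.0f} to {high_ma:.0f} mA (interpolated, not measured)")
```

A direct measurement on 1.6mm bonds would of course supersede this estimate.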

Our tests also indicated that in some cases the bond, after drawing heavy current, was still in place and appeared to be intact, but actually had infinite resistance. We concluded that such a bond had been turned into aluminum oxide. The bond has a mottled dark appearance under these conditions.

What do we do?

Hardware protection schemes.

Any protection scheme at the very minimum should prevent the wirebonds from blowing.

In addition, the latchup current should be quenched as quickly as possible since, besides the wire bonds, the Vdd and Vss power buses internal to the chip are being stressed. Even though the wire bond might be protected from blowing, serious damage could be caused in these internal buses. Also such (repeated) current surges are not likely to improve the chip's performance over time.

There are basically two proposals on the table at the present time: one proposed by Piotr which uses resistors to provide current limiting during latchup; the other proposed by John/Bernie which senses the onset of increased current associated with latchup, turns off the ±2V regulators on the power board, and crowbars their outputs. Piotr was reluctant to offer more details about his proposal for resistive current limiting, and he referred us instead to all the e-mail he had sent previously to the collaboration. For the remainder of the meeting, the discussion focussed on the second approach.

Proposal for 'sense-and-turnoff' approach.

Local protection:

John Fitch showed a simplified circuit diagram of this approach. In each port of the power board, we sense the dV/dt across the inductor in the output filter of the +2V supply, and when this exceeds a predetermined threshold, the regulators for both +2V and -2V in the port are turned off and two FETs shunting their respective output lines are turned on to crowbar the outputs. Initial bench tests indicate we can sense a sudden increase of as little as 100mA and can have the regulators off and outputs crowbarred within 5μsec. It was noted that turning off the ±2V supplies also removes the control signals from the hybrid, thus avoiding another possible cause of latchup. It was also noted that false triggers are very unlikely since both comparator inputs are heavily bypassed by closely spaced capacitors.
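The behaviour of one protected port can be summarized in a small behavioural model. This is only a sketch of the logic described above: the real circuit is an analog comparator on the power board, whereas the model below compares two successive current samples. The class PortProtection and its method names are hypothetical; the 100mA trip step is the bench-test figure quoted above, and the reset behaviour follows the proposal that ports operating normally ignore a reset.

```python
from dataclasses import dataclass

TRIP_STEP_MA = 100.0  # smallest sudden increase reported as detectable


@dataclass
class PortProtection:
    """Behavioural model of one power-board port (hypothetical, not firmware)."""
    regulators_on: bool = True   # state of both the +2V and -2V regulators
    crowbar_on: bool = False     # state of the FETs shunting both outputs
    tripped: bool = False

    def sample(self, previous_ma: float, current_ma: float) -> None:
        """Trip if the supply current rises suddenly between two samples."""
        if self.tripped:
            return
        if current_ma - previous_ma >= TRIP_STEP_MA:
            # Both regulators go off together and both outputs are shunted.
            self.regulators_on = False
            self.crowbar_on = True
            self.tripped = True

    def reset(self) -> None:
        """Reset signal: only a tripped port reacts; a healthy port ignores it."""
        if self.tripped:
            self.tripped = False
            self.crowbar_on = False
            self.regulators_on = True


port = PortProtection()
port.sample(150.0, 160.0)   # small drift: no trip
port.sample(160.0, 400.0)   # sudden 240 mA step: trip
print(port)
```

Note that a slow rise made of many small steps never trips this model, which is the limitation of sensing only sudden increases.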

Our initial proposal involved sensing only Idd but shutting off both +2V and -2V supplies. To provide additional coverage we can also monitor Iss. Piotr pointed out that the scheme as described does not detect slow increases in supply current. At his suggestion we will examine the possibility of also covering this situation; but monitoring for excessive DC currents is made quite difficult by the 3:1 range in supply currents required in the different ports throughout the system. For example, in a port feeding ...

Type 1 module (12 x 128-ch VAs) -- Iss = -900mA; Ignd = 750mA; Idd = 150mA

Outer vertex detector module (8 x 64-ch VAs) -- Iss = -300mA; Ignd = 250mA; Idd = 50mA.
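The difficulty with a DC overcurrent threshold can be seen with the two loads listed above. The sketch assumes identical power boards, hence one fixed Idd threshold for every port, set 20% above the largest normal Idd, and a latchup that adds 100mA to Idd. The 20% headroom and the 100mA increment are illustrative assumptions, not design values.

```python
# Normal operating currents per port in mA, from the figures above.
PORTS_MA = {
    "Type 1 module": {"Iss": -900.0, "Ignd": 750.0, "Idd": 150.0},
    "Outer vertex detector module": {"Iss": -300.0, "Ignd": 250.0, "Idd": 50.0},
}

# Consistency check: the three quoted rail currents sum to zero in each port.
for currents in PORTS_MA.values():
    assert abs(sum(currents.values())) < 1e-9

HEADROOM = 1.2            # assumed margin above the largest normal Idd
LATCHUP_EXTRA_MA = 100.0  # assumed additional Idd drawn in latchup

# One threshold for all ports, since all power boards are to be identical.
fixed_threshold_ma = HEADROOM * max(p["Idd"] for p in PORTS_MA.values())

detected = {}
for name, currents in PORTS_MA.items():
    latched_idd = currents["Idd"] + LATCHUP_EXTRA_MA
    detected[name] = latched_idd > fixed_threshold_ma
    print(f"{name}: Idd in latchup {latched_idd:.0f} mA, "
          f"threshold {fixed_threshold_ma:.0f} mA, detected={detected[name]}")
```

With these assumptions the lightly loaded port in latchup draws no more than a heavily loaded port in normal operation, so a single DC threshold misses it.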

A global reset signal distributed over twisted pair and photo-optically coupled to all power boards enables the protection circuit(s) to be reset after an appropriate delay. Those ports still operating normally would ignore this reset signal.

 

Global considerations:

In addition to the localized protection afforded by the additions to the power board described above, this proposal also provides for global emergency shutdown of the system. In the event of a failure in the RHIC control system which results in the beam or beam-related ions travelling down the outside of the beampipe, we propose that an appropriate detection system generate a global signal to shut off, within 5μsec, the ±2V supplies on all power boards throughout the system, and eventually all DC power. Under errant beam conditions, it would be prudent to remove all DC power to the system -- not only to protect the VA chips close to the beam but also the many ICs (incl. the Xilinx chips) in the front-end controllers not very far removed from the beam. It was noted that, since the ±5V supplies power the Vbc regulators on the power board, the low voltage DC power units must not be turned off without first carefully removing Vbc, and that this process -- at 5V/sec -- could take as much as 40secs. The global shutdown signal would be broadcast to the system on twisted pair(s) and photo-optically coupled to each power board.
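The ordering of the global shutdown can be laid out as a simple timeline. The sketch below uses the 5μsec and 5V/sec figures quoted above; the 200V bias in the example is only what 40secs at 5V/sec implies and is not a value stated at the meeting. The function names, and the assumption that the Vbc ramp starts as soon as the ±2V supplies are crowbarred, are ours.

```python
CROWBAR_DEADLINE_S = 5e-6   # +/-2V supplies off and crowbarred within 5 usec
VBC_RAMP_V_PER_S = 5.0      # rate at which Vbc is ramped down


def vbc_ramp_time_s(vbc_volts: float) -> float:
    """Time needed to bring Vbc from vbc_volts to 0 V at the ramp rate."""
    return abs(vbc_volts) / VBC_RAMP_V_PER_S


def shutdown_timeline(vbc_volts: float) -> list:
    """Ordered (time in s, action) steps after the global signal at t = 0."""
    t_vbc_zero = CROWBAR_DEADLINE_S + vbc_ramp_time_s(vbc_volts)
    return [
        (0.0, "global shutdown signal received on all power boards"),
        (CROWBAR_DEADLINE_S, "+/-2V regulators off, outputs crowbarred"),
        (CROWBAR_DEADLINE_S, "start ramping Vbc down (assumed start time)"),
        (t_vbc_zero, "Vbc at 0 V"),
        (t_vbc_zero, "LVDC power units and Vbc bias supplies may be turned off"),
    ]


# 200 V is inferred from 40 s x 5 V/s, not quoted in the minutes.
for t, action in shutdown_timeline(vbc_volts=200.0):
    print(f"t = {t:10.6f} s  {action}")
```

The point of the ordering is that the low voltage DC units are the last item: they power the Vbc regulators and so must stay on until Vbc has reached 0V.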

Comparing the two hardware protection proposals:*

The concept of limiting the latchup current using the resistor approach has appeal because of its relative simplicity, but in practice it is difficult to place the resistor where it will be effective:

. Placing a limiting resistor on the hybrid in series with the dVdd connection appears to be attacking the problem at its source; however this alone does not solve the problem: dropping the +2V on the chip without at the same time lowering the control signals only introduces another condition for latchup. Moreover, its implementation would be especially difficult now that most of the sensors are attached to these hybrids.

. Placing a series resistor at the output of the regulators degrades the performance of the regulators and is especially difficult given the range of currents required in different ports across the system. We take as a matter of principle that all power boards will be identical and not tailored to their specific load. One is attempting to guard against latchup currents in the 100mA to 500mA range. Limiting with a resistor in the +2V line alone does not provide protection: in latchup, the Vdd would sag towards Vss and on its way down be caught by the normally reverse biased rectifier at the output of the +2V supply. This rectifier is one of two devices (the other is across the -2V regulator output) required to guarantee fault-free startup of the regulators at power turnon. With little more than 0.6V drop across Vdd-Vss required to sustain latchup in a chip, the rectifier would provide an alternative path needed for a large latchup current to keep flowing. This points up the need for providing limiting in both Vdd and Vss. But because of the variability of the load on Vss (-300mA to -900mA), it is almost impossible -- even with the freedom to customize each port of the power board -- to dimension a series limiting resistor for the -2V output which will limit the current in latchup but not interfere with normal operation.

___________________________________________________________________________

* This comparison was not formally discussed at the meeting, but it is provided here for the sake of completeness. Piotr should feel free to rebut.

. Placing the resistor ahead of the regulator degrades the performance of the regulator, and it presents similar problems to those mentioned above. Plus, with the limiting resistor buried behind even more bypassing, there might well be enough stored energy in all the downstream bypassing to fuse the dVdd bond before the resistor's limiting action can be effective.

. There is one further consideration particular to the design of the VA chip. In the great majority of CMOS chips Vdd is operated at +5V (or, in more advanced technologies, at +3.3V) and Vss is connected to ground. The VA chip on the other hand has three power rails: Vdd at +2V, Ground, and Vss at -2V. If, due to resistive current limiting, the Vdd line of the chip sags towards Vss and actually drops below ground, the vertical PNP parasitic transistor associated with the large input P-FET is turned on and generates a latchup condition in this region of the chip. To prevent this, the Vdd rail should never be allowed to go lower than ground -- a condition which is difficult to guarantee with simple resistor limiting.

The sense-and-turnoff approach is somewhat more complicated than the resistor approach, but it is nonetheless feasible. Also ...

. It involves only additions to the power board; the hybrids and FEC signal board remain unchanged.

. It senses the latchup condition early in the process and snuffs it out promptly to prevent serious damage to the chip or wirebonds.

. It brings both rails to ground, and at the same rate, to avoid creating conditions which would otherwise provoke or prolong latchup.

Operational proposals:

Alan W. et al. expressed concern that, with global shutdown of the system, the amount of uptime on the system might rapidly approach zero. However, it was argued that a significant number of ions on the outside of the beampipe is sufficiently serious that one assumes RHIC will have knowledge of what is causing the condition and will take steps to rectify the situation as quickly as possible.

Dealing with shutdowns on particular ports requires that ...

a. we be able to recognize that protection has been invoked.

b. we be able to re-activate the port at a later time.

Gerrit proposed that the activation of protection be signalled by a bit in the data stream (this would require a significant change on the signal board). Others, including Bolek and Heinz, countered that the activation would be obvious enough in the data stream: with the ±2V off, all the chips' bias voltages, which are monitored on an event-by-event basis and reported in the data stream, would drop to midrange. This might not be noticed for 200μsecs or so, but this was thought not to be a great disadvantage -- once noticed and signalled by the slow control system, the data from the previous 200μsecs would be labelled as suspect.
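The labelling of suspect data can be sketched as follows. Event times are held in integer microseconds to keep the window comparison exact. The function name, the label strings, and the choice to mark events after detection as 'port off' are hypothetical; only the 200μsec window comes from the discussion above.

```python
SUSPECT_WINDOW_US = 200  # data from this long before detection is suspect


def label_events(event_times_us, detection_time_us, window_us=SUSPECT_WINDOW_US):
    """Label each event relative to the time the shutdown was noticed.

    Returns a list of (time_us, label), with label one of
    'good', 'suspect' or 'port off' (hypothetical label names).
    """
    labelled = []
    for t in event_times_us:
        if t > detection_time_us:
            label = "port off"
        elif t >= detection_time_us - window_us:
            label = "suspect"
        else:
            label = "good"
        labelled.append((t, label))
    return labelled


# Shutdown noticed at t = 1000 usec; four example events.
for t, label in label_events([500, 900, 950, 1100], detection_time_us=1000):
    print(f"event at {t:5d} usec: {label}")
```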

The suggestion was made by Bolek that, for reactivating a port, a signal be issued from the Xilinx -- this to allow specific, affected ports to be turned back on after the necessary elapsed time. A global reset would not allow appropriate differentiation between ports that failed some time ago and ports which failed recently. Bernie pointed out that this will involve messy changes on the signal board -- something we would like to avoid at this stage, but agreed to look into the feasibility of this Xilinx reset.

 

External needs:

For the local protection system ...

. A response from the slow controls system indicating that protection has been activated in a particular port.

. A response from the on-line system which selectively resets the protected port.

 

For the global protection system ...

. Suitably dimensioned scintillator detectors upstream and downstream of the IR, with output signals thresholded and timed to detect particles passing the IR at low angle in the clockwise and/or counter clockwise direction. The output from this detector feeds the signal generator in the counting room which sends a global set signal to all power boards to turn off and crowbar the ±2V supplies.

. A response from the on-line system to slowly ramp down Vbc in all ports.

. A response from the slow control system which, after Vbc has reached 0V, turns off all LVDC power units and Vbc bias supplies.

The meeting adjourned at 13:30 with instructions to further develop the sense-and-turnoff approach.