Recently, when I fired up my Challenge L, I was greeted with a power fault on the -5 V and -12 V rails, and the system shut down again.
Now, the power system on these machines is a combination of one (or more) OLS supplies that convert AC into 48VDC, and a lot of converter bricks that convert 48VDC into whatever is needed. This means that, unlike the PowerSeries before them, you don't need bus bars to supply some 200A @5V to the system. It also means there are a lot of DC-DC converters that can break: three per IP25 CPU board, one for every MC3, several 505 and 512 modules on the backplane, plus a handful of 1V5 regulators left and right. Oh, and the system controller has a 1V5 source and its own 5VDC and -14.5VDC converters.
All of this is controlled and monitored by the system controller. As the CHALLENGE™/Onyx™ Diagnostic Road Map puts it:
Quote: When the system is turned on, the power subsystem goes through a series of voltage checks before the boot process is allowed to start. Power is applied to the various system components in the following order: +/-5 V and +/-12 V power bricks (power for the SCSI drives in the deskside systems), 1.5 V and 3.3 V power bricks, 5 V and 12 V power bricks (power for the first internal SCSIBox in the rackmount systems), and 5 V and 12 V for external SCSI. This power sequencing is designed to prevent component damage due to incorrect or missing voltages, and to avoid placing a large transient demand on the voltage source.
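The sequencing described in the quote can be sketched roughly as follows. This is only an illustration of the idea (enable a stage, check it, stop at the first failure), not the system controller's actual firmware; the `enable` and `check` callbacks and the stage labels are hypothetical stand-ins.

```python
# Stage order as given in the Diagnostic Road Map quote.
# The labels are just descriptive strings for this sketch.
SEQUENCE = [
    "+/-5 V and +/-12 V bricks",
    "1.5 V and 3.3 V bricks",
    "5 V and 12 V bricks",
    "5 V and 12 V for external SCSI",
]


def power_up(enable, check, stages=SEQUENCE):
    """Enable each stage in order and verify it before moving on.

    enable(stage) and check(stage) are hypothetical callbacks.
    Returns (stages_that_came_up, failed_stage_or_None).
    """
    up = []
    for stage in stages:
        enable(stage)
        if not check(stage):
            # Stop here: later stages are never enabled.
            return up, stage
        up.append(stage)
    return up, None
```

Stopping at the first failed stage is what keeps downstream parts from ever seeing incorrect or missing voltages, and bringing stages up one at a time spreads the inrush load on the 48VDC source.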
The system controller has a display attached to show status of the power supplies. On a good day, it may show something like this:
I wondered in the past how come it can display the 5V value when there are quite a few independent 5V supplies in the system. Was it perhaps showing the one most out of spec? It turns out it's a little different, and you have to read between the lines of the Diagnostic Road Map to figure it out:
- The system controller has analogue and digital voltage monitoring capabilities.
- The values on the display of the system controller are the analogue values measured: 48VDC (the OLS), 5 and 12 VDC (the 512 module supplying the IO4), 1.5V (the converter on the system controller), and the -5 and -12V rails on the IO4 VCAM.
- All other boards like the CPU, MC3 and IO4 boards communicate POK-A errors to the system controller. These are OR-ed together: every failure is fatal. The actual values measured are not shown on the display, and the source of the POK-A error must be found using error LEDs on the various boards. Each DC-DC converter on every board also has measuring points.
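To make the two monitoring paths concrete, here is a minimal sketch of how they might combine into a single go/no-go decision. The class and function names are made up for illustration, and the 5% tolerance is an assumed placeholder, not a documented limit.

```python
from dataclasses import dataclass


@dataclass
class AnalogRail:
    name: str
    nominal: float   # volts, e.g. -12.0
    measured: float  # volts, as read by the system controller


def rail_in_spec(rail, tolerance=0.05):
    # tolerance is an assumption for this sketch, not an SGI figure
    return abs(rail.measured - rail.nominal) <= abs(rail.nominal) * tolerance


def power_ok(analog_rails, pok_a_faults):
    """analog_rails: the rails shown on the display.
    pok_a_faults: dict of board name -> True if that board reports a POK-A error.
    """
    # The POK-A lines are OR-ed together: any single fault is fatal,
    # and the controller cannot tell which board raised it.
    any_pok_fault = any(pok_a_faults.values())
    analog_ok = all(rail_in_spec(r) for r in analog_rails)
    return analog_ok and not any_pok_fault
```

Note that the OR-ed POK-A signal carries no voltage value and no board identity, which is why the error LEDs on the boards are needed to locate the culprit.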
Back to my system. The only source of -5V and -12V in the system is the IO4 VCAM. So I replaced the IO4 with a spare, which changed absolutely nothing. Damn. Fortunately, the VCAM is fitted with test points for all DC-DC converters, but they're not accessible from the outside when the card is in the system. So I wired up some extensions:
This confirmed that the voltages were all correct and in spec. The problem wasn't with the DC-DC converters, but with the environmental monitoring. This isn't very different from what the O3K / Fuel generation is known for, but unlike those you cannot disable monitoring on the Challenge/Onyx series (nor would you want to...).
Fortunately, I have a lot of spare parts, including two more system controllers. The first one didn't look promising:
This one probably came from my first Onyx RE2, which suffered a 48VDC short in the backplane at some point. I decided not to use that one. I had another one, but that only made things worse: now all voltages on the system controller display were out of spec, a little over 10% under voltage, and the system shut down again.
Fortunately, I have a known good Onyx IR, so I borrowed its system controller and this time the Challenge was happy. I decided to focus on the system controller that gave a 10% under voltage on all rails. It has measurement points:
This is (left to right): 5V_AUX, GND and -14.5V. I measured the 5V_AUX at 5.5V -- some 10% too high. Aha! Now we're getting closer.
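One plausible reason why a high 5V_AUX would make every displayed rail read low: if the controller's analogue measurements are referenced to 5V_AUX (or something derived from it), a reference that is too high scales all readings down by the same factor. That is an assumption on my part, not something the Diagnostic Road Map states. A rough sketch of the arithmetic:

```python
def displayed_voltage(true_voltage, actual_ref, nominal_ref=5.0):
    # Assumption: readings scale with nominal_ref / actual_ref,
    # as they would for a converter referenced to the 5V_AUX rail.
    return true_voltage * nominal_ref / actual_ref


def error_percent(actual_ref, nominal_ref=5.0):
    # Relative error of every displayed value, in percent.
    return (nominal_ref / actual_ref - 1.0) * 100.0


# With 5V_AUX at roughly 5.5 V, every rail would read about 9% low.
print(round(error_percent(5.5), 1))
```

That is in the same ballpark as the under voltage shown on the display, given that the 5.5V figure is itself only approximate.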
The regulators are LT1076HVCT parts. The left one is the 5VDC converter; the right one is configured as an inverter and puts out around -14V. A typical application of the LT1076 family looks like this:
The regulator will adjust Vsw until it has 2.21V on Vfb. So the ratio between R1 and R2 defines the output voltage. You can find R1 and R2 on the back of the PCB in the form of two SMD resistors. They are 2.8k and 2.21k. I removed them and they measured good. I soldered a wire to the Vfb pin of the LT1076 and it showed 2.21V, yet the output of the regulator was not 5V, but some 10% higher, so I decided to replace it. Unfortunately, these parts are obsolete, but I found some NOS parts on eBay. And this did the trick! 5V_AUX is back at 4.98V, and the system controller correctly measures analogue voltages again!
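As a sanity check on the resistor values, the expected output of the feedback divider can be worked out with the usual formula Vout = Vfb * (1 + R1/R2). This sketch assumes R1 (2.8k) sits between the output and Vfb, and R2 (2.21k) between Vfb and ground; the function name is just for illustration.

```python
def divider_output(r1_ohms, r2_ohms, v_fb=2.21):
    # The regulator holds Vfb at 2.21 V, so the output settles at
    # Vfb * (1 + R1/R2). Assumes R1 is the upper resistor of the divider.
    return v_fb * (1.0 + r1_ohms / r2_ohms)


# 2.8k over 2.21k should give almost exactly 5 V.
print(round(divider_output(2800.0, 2210.0), 2))  # 5.01
```

So with good resistors the design target is 5.01V, which fits the 4.98V measured after the repair much better than the 5.5V measured before it.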