Onyx: boot is incomplete, fault is no master
#21
RE: Onyx: boot is incomplete, fault is no master
(11-18-2021, 08:14 AM)jan-jaap Wrote:  Next up: my Challenge L. It throws a -12VDC over-voltage and refuses to power on. The only place where -12VDC exists in these is the IO4 VCAM. I replaced the IO4 but the fault remained. Meh. I think I'll replace the system controller next (this means rear side disassembly...), maybe the monitoring circuit is bad. Otherwise it would mean I have not one but two broken VCAMs in my hands, and I'd have to attach some test wires to measure the actual voltages that the system complains about.
Replaced the system controller, no change. I guess both VCAMs really are bad. The inverters on the VCAM make -12 and -5 VDC from the +12 VDC coming from a 512 power brick on the back of the backplane, and these things are (in my limited experience) by far the most common source of power faults. Fortunately the circuit isn't overly complicated, so I'll stop thread-jacking now and return when I have a write-up of the repair.

(11-18-2021, 07:53 PM)indigofan Wrote:  I have two RM8 boards, don't know if that would help you out... LMK.

Thanks, but RM8 boards go in an Onyx2. I'm looking for parts for the original Onyx IR.
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
11-19-2021, 09:14 AM
#22
RE: Onyx: boot is incomplete, fault is no master
Quote:This is life with an Onyx1: a simple memory fault escalates into 2 or 3 evenings of disassembly, cleaning and more damage, and only a good stock of spares saved the day.

After re-reading the manual entry on RAM interleaving a few times, Sherlock me finally understood that there is more than one H SIMM. So I followed jan-jaap's example up there, enjoyed multiple evenings of disassembling the machine, and cleaned a lot of SIMMs and SIMM slots on both the CPU and the MC3 board. And the good news is: eventually I did get rid of the memory error message. Banzai!

However, the SCACHE FAILED error still happens for both CPUs, and the machine still stops the boot process. But at least I am down to one fault. Sadly, I don't have a good stock of spares, so trying a different CPU board is out of the question. Back to reading the manuals for now.

(This post was last modified: 11-19-2021, 10:52 PM by capmilk.)
capmilk
O2

Trade Count: (0)
Posts: 10
Threads: 1
Joined: Nov 2021
Location: Germany
11-19-2021, 10:51 PM
#23
RE: Onyx: boot is incomplete, fault is no master
Not that I know anything, but a bad Fuel PIMM I am looking at has cache errors in spades when the CPU module is OVER VOLTAGE. I think there may be a relationship (then again, maybe not). If I were you, I'd take a hard look at any test points in the manual and measure live voltages on each CPU card to verify operation. Possibly the SCACHE does not run because of insufficient voltage or over-voltage.

I assume the CPU card(s) have their own VRMs and take bus voltage and make what they need. I'd test both the input voltage to the CPU card (if the manual tells you where the test points are located) and the VRM output voltage made by the CPU cards. Since you have more than one CPU board installed (I assume from your plural grammar), I'd assume the issue may not be the internal VRMs but the bus-provided power. If you have multiple CPUs on a single board, then I'd worry about internal VRM voltage as well.
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
11-20-2021, 03:40 AM
#24
RE: Onyx: boot is incomplete, fault is no master
Just a quick update: I let the machine sit in its corner for the remainder of last year, since I was not very confident about diving into this whole electricity stuff.
Now it's a new year, and while I still don't have toooo much confidence here, I will get back to diagnosing and hopefully fixing the Onyx before its 30th birthday next year.

capmilk
01-02-2022, 07:21 PM
#25
RE: Onyx: boot is incomplete, fault is no master
(11-20-2021, 03:40 AM)weblacky Wrote:  Not that I know anything, but a bad Fuel PIMM I am looking at has cache errors in spades when the CPU module is OVER VOLTAGE. I think there may be a relationship (then again, maybe not). If I were you, I'd take a hard look at any test points in the manual and measure live voltages on each CPU card to verify operation. Possibly the SCACHE does not run because of insufficient voltage or over-voltage.
I wish I had something insightful to say about your case, really.

IIRC, for the Fuel (and MIPS/SGI in general), there's the CPU core speed, the SysAD speed, and the cache speed. The SysAD speed is more or less fixed, the core speed is a multiple of it and the cache speed is the core speed divided by 2.5 or 3 usually.

That's all good and well, but I have the nagging feeling that in this case the dividers are stored in NVRAM on the mainboard. There's got to be a good reason for that, because otherwise it's insane -- with SysAD more or less fixed, and that configuration on the CPU module, basically any CPU could run on any mainboard. We know that's not true :(

OTOH, if the dividers *are* stored in the mainboard and you put a 600MHz module on a mainboard configured for 700MHz, you're hot-clocking your cache by ~17% and cache doesn't like that. Not by more than a few percent for sure.
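To put numbers on that, here is a minimal Python sketch of the arithmetic. It assumes the mainboard's stored configuration ends up driving the core at 700MHz and that the cache clock is simply core speed divided by the divider. The function names are made up for illustration, and the 2.5 and 3 dividers are just the typical values mentioned above, not values read from any real module or NVRAM.

```python
def cache_clock_mhz(core_mhz: float, divider: float) -> float:
    """Cache clock under the assumption cache = core / divider."""
    return core_mhz / divider


def overclock_percent(rated_mhz: float, configured_mhz: float) -> float:
    """How far above its rated speed a module runs, in percent."""
    return (configured_mhz - rated_mhz) / rated_mhz * 100.0


rated = 600.0        # speed the CPU module was built for
configured = 700.0   # speed the mainboard is configured for (assumed)

print(f"overclock: {overclock_percent(rated, configured):.1f}%")

# The percentage is the same for the cache, whatever the divider is,
# because the cache clock scales linearly with the core clock.
for divider in (2.5, 3.0):
    intended = cache_clock_mhz(rated, divider)
    actual = cache_clock_mhz(configured, divider)
    print(f"divider {divider}: cache {intended:.1f} MHz -> {actual:.1f} MHz")
```

With a divider of 2.5 the cache would be pushed from 240MHz to 280MHz, which is the same ~17% as the core.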

The same logic is used on bigger iron but there it's usually less of a problem -- there's usually another node to be the boot master and from there you can salvage the mis-configured node, even if it takes half an hour to POST (this can happen on Onyx2, really). Origin 200 can be a real pain: if you screw up one node you need to find a second, NUMA-link them and use the working node to salvage the mis-configured one. And then they shrunk the Origin to a single CPU workstation and that's where you're stuck.
 
I'm pretty sure there must have been L2 software inside SGI to deal with this -- otherwise why bother with that port at all?? But so far nobody figured it out and your best option may be to buy/borrow a 700MHz CPU, if only to downclock your mainboard.

(11-20-2021, 03:40 AM)weblacky Wrote:  I assume the CPU card(s) have their own VRMs and take bus voltage and make what they need. I'd test both the input voltage to the CPU card (if the manual tells you where the test points are located) and the VRM output voltage made by the CPU cards. Since you have more than one CPU board installed (I assume from your plural grammar), I'd assume the issue may not be the internal VRMs but the bus-provided power. If you have multiple CPUs on a single board, then I'd worry about internal VRM voltage as well.

In the Challenge/Onyx, there are one (deskside) or multiple (rack) OLSs. Each OLS puts out 48VDC towards the backplane. Most boards (CPU, memory) have their own converters from 48VDC to whatever they need. The IO4 has converters, but depends on a 512 brick on the back of the backplane as well. Power for graphics is supplied by 505 or 303 bricks on the back of the backplane.
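The dependencies described above can be sketched as a small lookup table, which helps when deciding which supply to measure first for a given fault. This is only a reading of the paragraph, not a wiring diagram: the labels are informal, the IO4's own converters are assumed to run from the 48VDC like the other boards, and how the bricks themselves are fed is left out because it is not covered here.

```python
# Direct power dependencies, as read from the description above.
# Keys are consumers, values are the supplies they rely on.
DEPENDS_ON = {
    "CPU board": ["OLS 48VDC"],
    "memory board": ["OLS 48VDC"],
    "IO4": ["OLS 48VDC", "512 brick"],   # 48VDC feed for IO4 is an assumption
    "graphics": ["505 or 303 bricks"],
}


def affected_by(supply: str) -> list[str]:
    """Return the consumers that directly depend on the given supply."""
    return sorted(c for c, supplies in DEPENDS_ON.items() if supply in supplies)


def supplies_for(consumer: str) -> list[str]:
    """Return the supplies worth measuring when a consumer reports a fault."""
    return DEPENDS_ON.get(consumer, [])


print(affected_by("512 brick"))
print(supplies_for("IO4"))
```

For a fault reported on the IO4, this points at both the 48VDC feed and the 512 brick as things to check.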

I wrote up my experience diagnosing this system HERE. This assumes you have already digested the chapter about the power system in the CHALLENGE™/Onyx™ Diagnostic Road Map.
jan-jaap
01-02-2022, 08:02 PM

