Onyx: boot is incomplete, fault is no master
#1
Hello there,

after about 20 years of not touching an SGI, fate has decided to once again spark my interest by giving me an Onyx. Impossible to resist. And of course, the troubles started right away...

I was able to boot the long neglected machine exactly once. It complained that there was no drive at the expected SCSI id and dropped me into the maintenance menu. So I looked at the hinv, found the ids of two existing drives and tried to see if any of them had an OS installed on it. Tried to figure out the boot command from memory. I guess you all know where this is headed.
After trying a few combinations of busses, ids, sashes and fxs, I finally managed to type something the box really did not like, ran into an illegal magic number exception and the system asked for the input of any key to reboot. Ever since that reboot, I have not seen any GUI, because the boot arbitration procedure does not complete anymore. Result is always "boot is incomplete, fault is no master". At least that is what the LCD tells me.

This, of course, is where I need your help. Can you tell me what's wrong here and how to fix it? Did my CPU board survive one startup and then choose to fry itself when it saw my sad attempts at getting the system to boot?

Any help would be appreciated. I'll let the system cool down for a while and will try again, but I fear the worst.

   

[Image: onyx.png] [Image: indigo.png] [Image: o2.png] [Image: indy.png]
(This post was last modified: 11-08-2021, 10:32 PM by capmilk.)
capmilk
O2

Trade Count: (0)
Posts: 10
Threads: 1
Joined: Nov 2021
Location: Germany
11-08-2021, 09:18 PM
#2
RE: Onyx: boot is incomplete, fault is no master
Hello and welcome!

Sorry to hear about your Onyx woes. If you haven’t already, download a copy of the Challenge/Onyx Diagnostic Roadmap. This is an invaluable reference for troubleshooting these machines.

Diagnostic Manual (Dropbox link)

I took a look at the manual and it suggests that none of the CPUs are responding during the boot process. This could mean a few things up to and including a bad CPU board. However, since it was working prior, it’s always worth pulling the board and reseating it. There’s a good chance that will help clear the issue.

Hopefully the solution is a simple one!

Onyx  Vault L  Crimson  Indigo  Personaliris  Octane2  1600SW   Indigo2 R10000/IMPACT  Indigo2  Indy  Challenge S  Tezro Rack
CB_HK
Crimson

Trade Count: (7)
Posts: 231
Threads: 43
Joined: May 2018
Location: Las Vegas, NV
11-09-2021, 04:46 PM
#3
RE: Onyx: boot is incomplete, fault is no master
Hello and thank you for the link, looks like an excellent source of information!

Sadly, simply reseating the CPU board did not solve the issue, but I used the opportunity to blow out some of the dust which had accumulated in the cooling fins and gave the board a long, hard stare. It is a dual 150 MHz board, so if I need to replace it, at least I can only trade up.

The first thing I checked against the diagnostic manual was the status LEDs on the CPU board. They stop at the following values when the LCD shows "boot is incomplete":

top row of LEDs:     XXXX00  (15) Arbitrating for a bootmaster
bottom row of LEDs:  XX000X  (35) Loading IO4 PROM
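
The numbers in parentheses can be recovered from the LED patterns if the leftmost LED is read as the least significant bit. That bit order is an assumption; it simply happens to reproduce both values quoted above (15 and 35). A minimal Python sketch, where the helper names and the two-entry lookup table are purely illustrative:

```python
# Sketch: turn a row of CPU-board status LEDs into its numeric code.
# Assumptions: "X" = lit, "0" = dark, leftmost LED = least significant bit.
# Only the two codes quoted in this thread are listed here; the full table
# lives in the Challenge/Onyx Diagnostic Roadmap.
KNOWN_CODES = {
    15: "Arbitrating for a bootmaster",
    35: "Loading IO4 PROM",
}


def decode_leds(pattern, lit="X"):
    """Return the integer encoded by an LED pattern such as 'XXXX00'."""
    return sum(1 << bit for bit, led in enumerate(pattern) if led == lit)


def describe(pattern):
    """Return (code, description) for an LED pattern."""
    code = decode_leds(pattern)
    return code, KNOWN_CODES.get(code, "unknown, look it up in the manual")


if __name__ == "__main__":
    for row in ("XXXX00", "XX000X"):
        print(row, describe(row))
```

Typing in the pattern exactly as it appears on the board saves doing the binary conversion by hand each time the boot stalls at a different step.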

Feels just like opening the bonnet of my car, staring at the engine and mumbling "Maybe it fails to arbitrate for a bootmaster". Guess I'll study that diagnostic manual some more.

capmilk
O2

Trade Count: (0)
Posts: 10
Threads: 1
Joined: Nov 2021
Location: Germany
11-09-2021, 08:28 PM
#4
RE: Onyx: boot is incomplete, fault is no master
Hmm, I swear I replied to this yesterday, but I guess something went wrong.

Anyway, you've got the all-important Diagnostics Roadmap. A few random thoughts:

  1. Does it even detect the CPUs? Normally, you've got something like 'B+++' in the sysctrlr display, which means '4 CPUs, #0 is the Boot Master'.
  2. Did it by chance disable both CPUs? I'm pretty sure you can re-enable them using the system controller.
  3. The system controller can be used to override or debug the arbitration process. It's in the Diagnostics Roadmap.
  4. You say you dusted the CPUs, but I'd pull the L2 cache SIMM modules near them, clean the contact edges with pure alcohol and re-seat them. A failed cache test is bound to disable a CPU.
  5. If your Onyx isn't one of the very first, you've got a DB9 serial port labeled 'System Controller' or SSE or similar in the bottom right corner of the front panel, under the drive bays. It puts out a ton of debug information during the boot process, the summary of which is displayed on the sysctrlr LCD. The serial port uses the old SGI 4D pinout, not the 'standard' PC RS232 pinout. It's also male <-> female inverted. You'll have to wire up a custom null modem cable and attach your favorite serial terminal program, 9600-N-8-1. You can use the same cable afterwards to access the serial console of the system.
  6. The system ID is stored in two locations in the system: a Dallas chip on the system controller module (on the back side of the backplane), and a different Dallas chip on the primary IO4. The built-in batteries of these things must have run out by now. I'm not quite sure what happens if the system cannot determine a system serial number and its entire stored configuration is randomized, but if it's fatal it will probably complain about it on the SSE port.
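
For the terminal side of item 5, the 9600-N-8-1 setting can also be applied with Python's standard `termios` module on a Unix host instead of a terminal program. This is only a sketch: the device path `/dev/ttyUSB0` and both helper names are assumptions, and it says nothing about the cable wiring, which still has to follow the SGI pinout.

```python
import os
import termios


def to_9600_8n1(attrs):
    """Return a copy of a termios attribute list set to raw 9600-N-8-1."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cc = list(cc)
    # Input: no software flow control, no CR/LF translation.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL | termios.INLCR)
    # Output: no post-processing.
    oflag &= ~termios.OPOST
    # Local: no line editing, echo, or signal characters.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    # Control: 8 data bits, no parity, 1 stop bit, ignore modem lines.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    # Block until at least one byte arrives.
    cc[termios.VMIN] = 1
    cc[termios.VTIME] = 0
    return [iflag, oflag, cflag, lflag, termios.B9600, termios.B9600, cc]


def dump_console(device="/dev/ttyUSB0"):
    """Print everything the port sends; stop with Ctrl-C.

    The default device path is a placeholder for whatever adapter is in use.
    """
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    try:
        settings = to_9600_8n1(termios.tcgetattr(fd))
        termios.tcsetattr(fd, termios.TCSANOW, settings)
        while True:
            data = os.read(fd, 256)
            print(data.decode("ascii", errors="replace"), end="", flush=True)
    finally:
        os.close(fd)
```

Calling `dump_console()` before powering on the machine and redirecting the output to a file gives a boot log that can be pasted into the thread.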
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
11-10-2021, 09:40 AM
#5
RE: Onyx: boot is incomplete, fault is no master
Thanks to you, too. Those were awesome tips which helped me quite a bit. Here's where I am at:

The machine actually first shows B+ while testing miscellaneous bits and pieces and then shows SCACHE FAILED ever so briefly, before changing to +B and starting the same procedure from scratch. Same error after that. I only just found it in the video I recorded of the boot sequence when I went through it in slow motion to pinpoint when this change might happen. Thanks for pointing me to it!
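
When stepping through a recording like this, it can help to write down each status string and track where the boot master sits. The sketch below assumes one character per CPU slot, with 'B' marking the boot master and '+' marking any other responding CPU, as described for 'B+++' earlier in the thread. Any other status characters the system controller might show are not handled, and the function names are made up for illustration.

```python
def parse_cpu_status(display):
    """Return (cpu_slots, boot_master_index) for a string such as 'B+++'.

    boot_master_index is None when no 'B' is present.
    """
    slots = display.strip()
    master = slots.find("B")
    return len(slots), (master if master != -1 else None)


def master_sequence(readings):
    """Return boot-master indices in the order seen, dropping repeats."""
    sequence = []
    for reading in readings:
        _, master = parse_cpu_status(reading)
        if not sequence or sequence[-1] != master:
            sequence.append(master)
    return sequence


if __name__ == "__main__":
    # Readings noted from the boot video: first 'B+', later '+B'.
    print(master_sequence(["B+", "B+", "+B"]))
```

A sequence that walks from slot 0 to slot 1 and then stops would fit the picture of each CPU in turn failing its cache test until none is left to act as master, though that reading is a guess until the serial output confirms it.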

So I will continue with cleaning the L2 RAM contacts as soon as I figure out how to get them out. Since I don't want to yank 30 year old plastic parts too forcefully, I have not yet understood what I need to do to release them from their sockets...

capmilk
O2

Trade Count: (0)
Posts: 10
Threads: 1
Joined: Nov 2021
Location: Germany
11-10-2021, 10:45 PM
#6
RE: Onyx: boot is incomplete, fault is no master
I happened to have a CPU board handy, so I had a quick look.

You have to pull the tabs on the SIMM sockets up. It can take considerable force. The same type of socket is used on the MC3 memory board, just with bigger SIMMs. I've broken off tabs on these in the past, but you have to do something if the system starts to disable banks of memory...

[Image: IMG_7219_sm.jpg]
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
11-11-2021, 08:59 AM
#7
RE: Onyx: boot is incomplete, fault is no master
@capmilk: I like your avatar.

and: Welcome to the gang!

SGI - the legend will never die!!

Indy Indigo Crimson Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT O2 O2 Octane Octane2 Octane2 Tezro
(This post was last modified: 11-11-2021, 09:11 AM by Geoman.)
Geoman
Crimson to Tezro

Trade Count: (0)
Posts: 162
Threads: 13
Joined: May 2018
Location: Germany
11-11-2021, 09:11 AM
#8
RE: Onyx: boot is incomplete, fault is no master
Thanks once again for your in-depth tips and kind words.

Thankfully, I managed to get the cache RAMs out without breaking any of the levers. Cleaning the contacts resulted in quite a bit of grime and very shiny edges, but sadly nothing changed in the boot procedure. Both caches still fail their tests.

I'll see if I can fabricate a serial cable to check if the console contains more useful information, but I am afraid that I might be in over my head here.

capmilk
O2

Trade Count: (0)
Posts: 10
Threads: 1
Joined: Nov 2021
Location: Germany
11-11-2021, 08:32 PM
#9
RE: Onyx: boot is incomplete, fault is no master
Don’t fear if you’re running into roadblocks. If there’s anything I’ve learned from being part of this community it’s that we are all very driven to help solve these issues. I won’t speak for anyone else, but it really makes me happy to see a system rescued and running.

Hang in there! We’ll try to help you get it up and running yet!

Onyx  Vault L  Crimson  Indigo  Personaliris  Octane2  1600SW   Indigo2 R10000/IMPACT  Indigo2  Indy  Challenge S  Tezro Rack
CB_HK
Crimson

Trade Count: (7)
Posts: 231
Threads: 43
Joined: May 2018
Location: Las Vegas, NV
11-12-2021, 03:20 AM
#10
RE: Onyx: boot is incomplete, fault is no master
Curious. I'm looking at Ian's site: http://www.sgidepot.co.uk/chalonyxdiag/

Chapter 2 talks about power, and I'm a power guy who loves to blame power, so... power. If there is no CPU arbitration, maybe there's bad power to the CPU? It claims there are red LEDs on the top and bottom side of the CPU board (assuming I'm looking at what you have). Are any red LEDs on the CPU board lit? Have you checked the power at the PSU terminals with a multimeter while the machine is on, to see if it is actually good?

This chapter talks about troubleshooting...they get really specific. http://www.sgidepot.co.uk/chalonyxdiag/diagprocs.html

The one thing that caught my eye was that once a CPU is disabled, it will stay disabled through power cycles. Does this Onyx model only have one CPU? And what happens when it's disabled... this symptom? I'm guessing.

If you have a disabled CPU, you have to perform a special procedure to re-enable it. Read Section 6.3.3.4 at the link above for the instructions.

Otherwise I guess console boot output is a must then...
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
11-12-2021, 06:35 AM

