Here's some log output from an Onyx2 with an IO6G problem. This wasn't with the heaviest of diag settings, but anyway:
Code:
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
1A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000: alive!
1A 000: *** Diag level set to None (2)
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000: alive!
2A 000:
2A 000:
2A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
2A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000: *** Diag level set to None (2)
1A 000: Testing/Initializing memory ............... DONE
1B 000: Testing/Initializing memory ............... DONE
2A 000: Testing/Initializing memory ............... DONE
2B 000: Testing/Initializing memory ............... DONE
1A 000: Copying PROM code to memory ............... DONE
2A 000: Copying PROM code to memory ............... DONE
1A 000: Discovering local IO ...................... WARNING: xbow_base: 0x920000000
1A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: DONE
2A 000: Discovering local IO ...................... WARNING: xbow_base: 0x920000000
1A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: Found 3 objects (2 hubs, 1 routers) in 66354 usec
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000: alive!
2A 000: DONE
2A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: Found 3 objects (2 hubs, 1 routers) in 1781 usec
2A 000: Waiting for peers to complete discovery.... DONE
2A 000: Recognized 390 MHz midplane
2A 000: *** Global master /hw/module/1/slot/n2 does not have a console
2A 000: Global master is /hw/module/1/slot/n2
1A 000: Waiting for peers to complete discovery.... DONE
1A 000: Recognized 390 MHz midplane
1A 000: *** Global master /hw/module/1/slot/n2 does not have a console
1A 000: Global master is /hw/module/1/slot/n2
2A 0001A 0Testing/Initializing all memory ........... DONE
2A 001:Testing/Initializing all memory ........... DONE
1A 000:Checking partitioning information ......... DONE
1A 000: *** Partition master /hw/module/1/slot/n2 does not have a console
2A 001:Checking partitioning information ......... DONE
2A 001: *** Partition master /hw/module/1/slot/n2 does not have a console
1A 000: nic_read_mfg: invalid crc16 reading redirection map page 3
1B 000: Local slave entering slave loop
1A 000:Local master entering slave loop
2B 001: Local slave entering slave loop
2A 001:*** No console found. Searching for console...
2A 001: *** No console found. You need a console to proceed.
2A 001: *** To recover: Add a BASEIO board and reset.
2A 001:
2A 001: *** Entering POD mode on node 1
2A 001: POD MSC Cac>
Comparing this to your results, the first thing I notice is this:
Code:
MSC VER 3.0
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000: *** early return from xbow_init: master 0x0, link 0xa
Node 1 CPU A is having an Xbow init problem. It could be with the node, or the backplane, or the interconnect.
If it is with the node, it should move with the node if you swap them (since you have two nodes). I'm going to assume you tried this.
If it is with the interconnect or the backplane, the error will be unchanged.
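If you capture the console output before and after swapping the two nodes, the comparison can be scripted. What follows is only a rough sketch: I'm reading the line prefix (e.g. `1A 000:`) as node slot, CPU letter, and a three-digit node id, which is a guess based on the logs above, and the names `failing_slots` and `swap_verdict` are made up for illustration.

```python
import re

# Assumed prefix layout, inferred from the logs above: "<slot><cpu> <id>: <message>"
PREFIX = re.compile(r"^(\d+)([AB]) (\d{3}):\s*(.*)$")


def failing_slots(log_text, needle="early return from xbow_init"):
    """Return the set of (slot, cpu) pairs whose log lines contain `needle`."""
    hits = set()
    for line in log_text.splitlines():
        m = PREFIX.match(line.strip())
        if m and needle in m.group(4):
            hits.add((int(m.group(1)), m.group(2)))
    return hits


def swap_verdict(before, after):
    """Compare the faulting slots before and after swapping the two nodeboards."""
    slots_before = {slot for slot, _cpu in before}
    slots_after = {slot for slot, _cpu in after}
    if not slots_before or not slots_after:
        return "inconclusive"
    if slots_before == slots_after:
        # The error stayed with the slot, so it did not follow the board.
        return "backplane or interconnect suspect"
    # The error moved to the other slot, i.e. it followed the board.
    return "nodeboard suspect"


before = failing_slots(
    "MSC VER 3.0\n"
    "1A 000: Starting PROM Boot process\n"
    "2A 000: Starting PROM Boot process\n"
    "1A 000: *** early return from xbow_init: master 0x0, link 0xa\n"
)
# Hypothetical post-swap capture in which the same message still comes from slot 1.
after = failing_slots("1A 000: *** early return from xbow_init: master 0x0, link 0xa\n")
print(swap_verdict(before, after))
```

Lines that don't match the prefix (the `MSC VER` banner, or lines where two CPUs wrote over each other) are simply skipped, so treat the result as a hint and still read the raw log.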
FWIW: I've seen some pretty dirty Onyx2s. The amount of airflow in the system will blow the bigger dust bunnies out, but the smaller dust particles can still leave a film of powder that looks almost like laser printer toner. Another problem is the foam plugs on the ends of the baffles installed in empty slots. These disintegrate with age and leave crap on the connector pads of the backplane (where the compression connectors mate). You have to be careful with that, because if you upgrade a system by putting something in a slot that used to be empty, you may end up with that crap in the compression connector of the new board, and then you're screwed. This happened to me when I installed a PCI cage in an Octane.
Unlike the compression connectors, the contact pads on the backplane can be cleaned, using pure alcohol and a lint-free cloth.
You also need to reflect on how the system was when you got it, and what this might mean. Did it have nodeboards in it? If not, could it be the system broke at some point and was used as a parts donor for another system?
Oh, you're aware that the backplane needs to be re-jumpered if you take out 180MHz nodes and replace them with anything other than 180MHz nodes? These two jumpers are set correctly, right? I assume this would be more of a problem if you try to run a 180MHz node on a backplane jumpered for the higher link speed of the newer (faster) nodes, not the other way around.