Onyx2 Deskside Midplane Capacitor/Resistor Identification
#11
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
I don't mean to be a pest about it, but are we certain that M8N2B (in my first reply after Jan-Jaap and Weblacky posted) isn't supposed to be populated? Unlike the other unpopulated spots, I'm getting continuity on my multimeter between the pads. The positive side appears to be connected to ground as of now, unlike the others as well. Could it be a missing coupling cap like Weblacky mentioned as a possibility?

Personaliris Indigo Indigo2 Indy Onyx2 Origin 200 Origin Vault O2 Octane2 (VW 320) (VW 540) (VW 550) Fuel Tezro Tezro Rack Origin 350 Onyx4 Altix 350 (Prism Rackmount)
(This post was last modified: 09-04-2019, 08:40 PM by kaigan.)
kaigan
Site Admin and SGI Tinkerer

Trade Count: (2)
Posts: 262
Threads: 31
Joined: May 2019
Location: Omaha, NE
09-04-2019, 08:27 PM
#12
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
(09-04-2019, 08:27 PM)kaigan Wrote:  I don't mean to be a pest about it, but are we certain that M8N2B (in my first reply after Jan-Jaap and Weblacky posted) isn't supposed to be populated? Unlike the other unpopulated spots, I'm getting continuity on my multimeter between the pads. The positive side appears to be connected to ground as of now, unlike the others as well. Could it be a missing coupling cap like Weblacky mentioned as a possibility?

Excuses, I overlooked that one. Anyway, this one was *not* empty:


[Attached image]
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
09-04-2019, 08:58 PM
#13
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
(09-04-2019, 08:58 PM)jan-jaap Wrote:  Excuses, I overlooked that one. Anyway, this one was *not* empty:

No worries at all! Again, I really appreciate your assistance. As I mentioned, these pads seemed a little different (physically and electrically), and it must have just been a clean break from the board. I didn't see anything down in the fan assembly, but maybe it's still hiding in the chassis somewhere.

I'll get it replaced and then run some diagnostics to see if we're still running into errors.

Thanks again!

kaigan
09-04-2019, 09:45 PM
#14
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
Good spotting! If one pad is definitely ground (a very low or zero-ohm reading to ground), then it's a missing decoupling tantalum capacitor (or a similar part meant to take voltage ramp-up and ringing out of a signal). It could be as simple as that. Some advice: buy a couple. Shipping is often more than the part, so make it worth your time.

While I have no idea what part of the circuit that is, I have heard that ringing or "dirty" signals from a lack of decoupling capacitors can definitely prevent ICs from starting or getting a good signal, which causes issues. So it's very much worth trying.

So well spotted, I'm sure you'll keep us up to date.
(This post was last modified: 09-04-2019, 10:06 PM by weblacky.)
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
09-04-2019, 10:05 PM
#15
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
I found a capacitor locally that fit the bill. However, it doesn't seem to have fixed my issues.

I left the Onyx2 running in heavy diagnostics mode with verbosity for three hours. One node board installed (believed working, from Raion), with minimal RAM. Everything else, save the power components, null router, and MSC/CD-ROM was removed. This was the output:

Code:
2A 000: Starting PROM Boot process
2A 000: Master link 9. Clearing other arb registers
2A 000: Check_master: link 9 is master
2A 000: Running xbow_sanity diag (Xbow address = 0x9200000000000000)
2A 000: Check_master: link 9 is master
2A 000: Running bridge_sanity diag (Bridge base = 0x920000000f000000)
2A 000:
2A 000: *** General Exception on node 0
2A 000: *** EPC: 0xc00000001fc05be4 (0xc00000001fc05be4)
2A 000: *** Press ENTER to continue.
2B 000:
2B 000: *** General Exception on node 0
2B 000: *** EPC: 0xc00000001fc586a0 (0xc00000001fc586a0)
2B 000: *** Press ENTER to continue.

Not a lot for three hours, it seems. The MSC display read "P 53M 1"
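For anyone comparing several of these console logs, a small script can split the output per CPU. This is only a sketch: the prefix layout (slot digit, CPU letter, a three-digit field, then a colon) is inferred from the logs in this thread rather than from SGI documentation, and `parse_line` and `group_by_cpu` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed prefix layout, inferred from the logs in this thread:
#   <slot digit><CPU letter> <three-digit field>: <message>
LINE_RE = re.compile(r"^(\d)([AB]) (\d{3}):\s?(.*)$")


def parse_line(line):
    """Return (slot, cpu, field, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    slot, cpu, field, message = m.groups()
    return int(slot), cpu, field, message


def group_by_cpu(lines):
    """Group non-empty messages by CPU label such as "2A"; skip unparseable lines."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        slot, cpu, _field, message = parsed
        if message.strip():
            grouped[f"{slot}{cpu}"].append(message.strip())
    return dict(grouped)


# A few lines copied from the log above.
sample = [
    "2A 000: Starting PROM Boot process",
    "2A 000: Running bridge_sanity diag (Bridge base = 0x920000000f000000)",
    "2A 000:",
    "2A 000: *** General Exception on node 0",
    "2B 000:",
    "2B 000: *** General Exception on node 0",
]

for cpu, messages in group_by_cpu(sample).items():
    print(cpu, messages)
```

Lines where two CPUs wrote over each other on the serial console will not match the pattern and are skipped, so the grouped output can be incomplete.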

It looks like something is halting the system fairly early on in the boot process. I do get different results with normal diagnostics and two node boards, including early fails to DEX mode, PROM boots that will continue until cache checks and then halt, and a number of other errors. I'll get logs up for some of those other scenarios soon.

I have a hard time believing that I have four faulty node boards, especially since Raion was pretty confident that the two I purchased from him were last known working as part of an Origin 2000. Whenever the system fails into DEX mode, running a "why" command always indicates an issue with the bus. (Again, I'll get logs for analysis soon. The system doesn't seem to want to get to DEX mode right now.)

I'm thinking that there is likely some other issue with the midplane, possibly related to one of the Xbows. If I can find another Onyx2 deskside midplane for sale at a reasonable price, perhaps it would be best to pick it up and set this one aside for later repair attempts. That said, until I can find one, I'm still good to pursue repairing this one now, especially with such awesome people around to assist. :)

Anyway, more logs and scenarios coming as I have more time to test. Thank you all again!

(This post was last modified: 09-06-2019, 01:16 AM by kaigan.)
kaigan
09-06-2019, 12:51 AM
#16
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
Here's a log example from the system with a second node board installed. It dropped into DEX mode, so I ran a "why" command, which is indicating a data bus error.

Code:
MSC VER 3.0
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000: *** early return from xbow_init: master 0x0, link 0xa
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.156  built 11:27:56 AM Nov 18, 2003
1A 000: Local master CPU A revision: e35
2A 000: Check_master: link 9 is master
2A 000: Check_master: link 9 is master
2A 000:
2A 000: *** General Exception on node 0
2A 000: *** EPC: 0xc00000001fc05be4 (0xc00000001fc05be4)
2A 000: *** Press ENTER to continue.
2A 000: POD MSC Dex> why
2A 000:  EPC    : 0xc00000001fc05be4 (0xc00000001fc05be4)
2A 000:  ERREPC : 0xffffffffbfc00ee0 (0xc00000001fc00ee0)
2A 000:  CACERR : 0x0000000000000000
2A 000:  Status : 0x0000000024407c80
2A 000:  BadVA  : 0x0000000000000000 (0x0)
2A 000:  RA     : 0xc00000001fc1ad0c (0xc00000001fc1ad0c)
2A 000:  SP     : 0xa800000000103390
2A 000:  A0     : 0x920000000f0000b4
2A 000:  Cause  : 0x000000000000001c (INT:-------- <Data Bus Err>)
2A 000:  BusAdr : 0x920000000f0000b4
2A 000:  Reason : 242 (Unexpected General Exception.)
2A 000:  POD mode was called from: 0xc00000001fc02808 (0xc00000001fc02808)
2A 000: POD MSC Dex>
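
The "Data Bus Err" label in the output above can be cross-checked by decoding the Cause value by hand. The sketch below assumes the standard MIPS layout, with ExcCode in bits 6..2 of the Cause register. It also assumes the Bridge base printed by the bridge_sanity line in the earlier single-node log applies to this run, which the thread does not confirm.

```python
# Subset of ExcCode values from the MIPS Cause register definition.
EXC_CODES = {
    0: "Int (interrupt)",
    4: "AdEL (address error on load/fetch)",
    5: "AdES (address error on store)",
    6: "IBE (instruction bus error)",
    7: "DBE (data bus error)",
}


def exc_code(cause):
    """Extract the ExcCode field, bits 6..2 of the Cause register."""
    return (cause >> 2) & 0x1F


cause = 0x000000000000001C        # "Cause" from the "why" output
bus_adr = 0x920000000F0000B4      # "BusAdr" from the "why" output
bridge_base = 0x920000000F000000  # "Bridge base" from the earlier log (assumed unchanged)

code = exc_code(cause)
print(code, EXC_CODES.get(code, "unknown"))  # prints: 7 DBE (data bus error)
print(hex(bus_adr - bridge_base))            # prints: 0xb4
```

If the assumption about the Bridge base holds, the faulting access was a data load or store at offset 0xb4 into the Bridge register space, which fits the exception appearing right after the bridge_sanity diag starts in the earlier log.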

More to follow when I can provide it. As always, I appreciate any thoughts folks can provide.

kaigan
09-06-2019, 01:26 PM
#17
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
Here's some log output from an Onyx2 with an IO6G problem. This wasn't with the heaviest of diag settings ;) but anyway:

Code:
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.156  built 11:27:56 AM Nov 18, 2003
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
1A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000:  alive!
1A 000: *** Diag level set to None (2)
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000:  alive!
2A 000:
2A 000:
2A 000: IP27 PROM SGI Version 6.156  built 11:27:56 AM Nov 18, 2003
2A 000: *** Warning: MSC debug (dbg) switches are non-zero
2A 000: *** Diag level set to None (2)
1A 000: Testing/Initializing memory ...............             DONE
1B 000: Testing/Initializing memory ...............             DONE
2A 000: Testing/Initializing memory ...............             DONE
2B 000: Testing/Initializing memory ...............             DONE
1A 000: Copying PROM code to memory ...............             DONE
2A 000: Copying PROM code to memory ...............             DONE
1A 000: Discovering local IO ......................             WARNING: xbow_base: 0x920000000
1A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: DONE
2A 000: Discovering local IO ......................             WARNING: xbow_base: 0x920000000
1A 000: Discovering NUMAlink connectivity .........             DONE
2A 000: 0000000 link: 15 Widget present, but link not alive!
1A 000: Found 3 objects (2 hubs, 1 routers) in 66354 usec
2A 000: WARNING: xbow_base: 0x9200000000000000 link: 15 Widget present, but link not
2A 000:  alive!
2A 000: DONE
2A 000: Discovering NUMAlink connectivity .........             DONE
2A 000: Found 3 objects (2 hubs, 1 routers) in 1781 usec
2A 000: Waiting for peers to complete discovery....             DONE
2A 000: Recognized 390 MHz midplane
2A 000: *** Global master /hw/module/1/slot/n2 does not have a console
2A 000: Global master is /hw/module/1/slot/n2
1A 000: Waiting for peers to complete discovery....             DONE
1A 000: Recognized 390 MHz midplane
1A 000: *** Global master /hw/module/1/slot/n2 does not have a console
1A 000: Global master is /hw/module/1/slot/n2
2A 0001A 0Testing/Initializing all memory ...........           DONE
2A 001:Testing/Initializing all memory ...........              DONE
1A 000:Checking partitioning information .........              DONE
1A 000: *** Partition master /hw/module/1/slot/n2 does not have a console
2A 001:Checking partitioning information .........              DONE
2A 001: *** Partition master /hw/module/1/slot/n2 does not have a console
1A 000: nic_read_mfg: invalid crc16 reading redirection map page 3
1B 000: Local slave entering slave loop
1A 000:Local master entering slave loop
2B 001: Local slave entering slave loop
2A 001:*** No console found. Searching for console...
2A 001: *** No console found. You need a console to proceed.
2A 001: *** To recover: Add a BASEIO board and reset.
2A 001:
2A 001: *** Entering POD mode on node 1
2A 001: POD MSC Cac>

Compare this to your results, and the first thing I notice is this:

Code:
MSC VER 3.0
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
1A 000: *** early return from xbow_init: master 0x0, link 0xa

Node 1 CPU A is having an Xbow init problem. It could be with the node, or the backplane, or the interconnect.

If it is with the node, it should move with the node if you swap them (since you have two nodes). I'm going to assume you tried this.
If it is with the interconnect or the backplane, the error will be unchanged.
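
The swap test above boils down to a small decision rule. Here is a minimal sketch of it; `interpret_swap` is a hypothetical helper, the slot names follow the n1/n2 naming seen in the logs, and the "error gone" branch is an added assumption that the post itself does not cover.

```python
def interpret_swap(error_slot_before, error_slot_after):
    """Interpret a swap test in which two node boards trade slots.

    Each argument is the slot that reported the error (for example "n1"),
    or None if no error was seen in that run.
    """
    if error_slot_before is None:
        return "no error to isolate"
    if error_slot_after is None:
        # Assumption, not from the post: reseating alone may have fixed a bad contact.
        return "error gone after swap: suspect seating or contacts"
    if error_slot_after == error_slot_before:
        return "error stayed with the slot: suspect backplane or interconnect"
    return "error moved with the board: suspect the node board"


print(interpret_swap("n1", "n1"))
print(interpret_swap("n1", "n2"))
```

Intermittent faults can mislead a single swap, so it is worth repeating each configuration over a few resets before drawing a conclusion.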

FWIW: I've seen some pretty dirty Onyx2s. The amount of airflow in the system will blow the bigger dust bunnies out, but the smaller dust particles can still leave a film of powder that looks almost like laser printer toner. Another problem is the foam plugs on the ends of the baffles installed in empty slots. These disintegrate with age and leave crap on the connector pads of the backplane (where the compression connectors mate). You have to be careful with that, because if you upgrade a system with something in a slot that used to be empty, you may end up with that crap in the compression connector of the new board, and then you're screwed. This happened to me when I installed a PCI cage in an Octane.

Unlike the compression connectors, the contact pads on the backplane can be cleaned, using pure alcohol and a lint-free cloth.

You also need to reflect on how the system was when you got it, and what this might mean. Did it have nodeboards in it? If not, could it be the system broke at some point and was used as a parts donor for another system?

Oh, you're aware that the backplane needs to be re-jumpered if you take out 180MHz nodes and replace them with anything other than 180MHz nodes? These two jumpers are set correctly, right? I assume this would be more of a problem if you try to run a 180MHz node on a backplane jumpered for the higher link speed of the newer (faster) nodes, not the other way around.
jan-jaap
09-06-2019, 02:10 PM
#18
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
I actually didn't catch the Xbow message in that log! That said, that's the first time I've seen it across several resets.

I have cleaned the contact pads with contact cleaner. The midplane did have a place where one of the foam pads had disintegrated and I cleaned that up as well. The only card in there at the moment is the IO6G, and I've cleaned its contact pad, too.

The system did have two node boards installed in it when I purchased it, along with the full graphics set. From what I know, the system was still in fairly recent use until it was obtained by the recycler. Of course, since then, it was shipped, likely roughly handled, and set on its side for a good length of time, any of which could have caused issues.

The midplane is jumpered correctly, yes. The boards that came with it were dual 400 MHz, as are the ones I have from Raion.

Since it's still only minimally put together, I can easily remove the node boards and the midplane again. I'll try giving the midplane a thorough cleaning with an air duster can and some electronics cleaner. Maybe I'll get lucky. :)

kaigan
09-06-2019, 08:01 PM
#19
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
I'm personally pretty confident at this point that there's an issue with the midplane itself. I've removed it, cleaned it, reinstalled everything and the errors persist. Errors relating to the Xbow are showing up more frequently as well. I'm going to go ahead and put up a wanted post here on the forums and hopefully someone will eventually have one available.

If anyone has any other ideas, I'll very happily try them, but it seems likely enough to me that I have an Xbow/bus issue that I'm willing to start looking for replacement hardware.

Thanks again for the help, everyone!

kaigan
09-12-2019, 06:47 PM
#20
RE: Onyx2 Deskside Midplane Capacitor/Resistor Identification
(09-12-2019, 06:47 PM)kaigan Wrote:  I'm personally pretty confident at this point that there's an issue with the midplane itself. I've removed it, cleaned it, reinstalled everything and the errors persist. Errors relating to the Xbow are showing up more frequently as well. I'm going to go ahead and put up a wanted post here on the forums and hopefully someone will eventually have one available.

If anyone has any other ideas, I'll very happily try them, but it seems likely enough to me that I have an Xbow/bus issue that I'm willing to start looking for replacement hardware.

Thanks again for the help, everyone!

Try Mashek Systems. I'm sure they'll have one for you.
Irinikus
Hardware Connoisseur

Trade Count: (0)
Posts: 3,475
Threads: 319
Joined: Dec 2017
Location: South Africa
09-12-2019, 07:13 PM

