Onyx Panic
#1
I'm having a problem with my Onyx RE². I've been dealing with memory issues and intermittently dropping a CPU, but it always booted into IRIX. Lately I'd been getting a panic once in a while, but now it happens every time I start the system up. Can anyone decode this and tell me where the fault lies? Also, the system controller is displaying BD++.


Attached Files Image(s)
       
(This post was last modified: 02-11-2022, 02:17 AM by kshuff.)
kshuff
Octane

Trade Count: (3)
Posts: 60
Threads: 26
Joined: Dec 2017
Location: Eastern PA
02-11-2022, 02:15 AM
#2
RE: Onyx Panic
Reading back through my logs from when I had a major issue with my Onyx, it appears your issue is similar, if not exactly the same. I had to replace the EPC chip on the IO4 (and later the D chip when parity issues came up). Your panic string is 99% the same as what I had when my EPC went. Have any spare IO4s handy? A swap might identify the issue quickly.

Onyx  Vault L  Crimson  Indigo  Personaliris  Octane2  1600SW   Indigo2 R10000/IMPACT  Indigo2  Indy  Challenge S  Tezro Rack
CB_HK
Crimson

Trade Count: (7)
Posts: 231
Threads: 43
Joined: May 2018
Location: Las Vegas, NV
02-11-2022, 06:23 AM
#3
RE: Onyx Panic
(02-11-2022, 06:23 AM)CB_HK Wrote:  Reading back through my logs from when I had a major issue with my Onyx, it appears your issue is similar, if not exactly the same. I had to replace the EPC chip on the IO4 (and later the D chip when parity issues came up). Your panic string is 99% the same as what I had when my EPC went. Have any spare IO4s handy? A swap might identify the issue quickly.
I'm very interested in hearing what you did there. Could you please post a picture highlighting which chip this is? Is it related to the row of ICs between each bank of memory modules?
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
02-11-2022, 06:29 AM
#4
RE: Onyx Panic
The EPC is on the second row from the rear of the IO4. You have to remove the VCAM to get to it easily.

[Image: 56857DCF-A906-4420-971A-BEB330FF5F69.jpeg?m=1644563545]

[Image: EF15805B-13DB-4281-A371-36E977F77276.jpeg?m=1644563552]

I used my PGA puller to remove a spare EPC from another IO4 and installed it on the faulty board. From reading the Onyx diagnostic roadmap, the errors pointed to the EPC not functioning correctly. Once I swapped the chip, the system stopped panicking and booted normally. About a week later I hit a separate issue with the D chip (also on the IO4, in the first row from the rear): it wasn't routing signals properly, which showed up as a parity error, either as interesting glitches in IRIX or as a panic on boot. I pulled and replaced that chip as well, and my Onyx has been solid ever since. I traced that issue through the diagnostic manual too, and the panic it gave on boot made it clear the D chip was at fault, which helped a good bit.

CB_HK
02-11-2022, 07:18 AM
#5
RE: Onyx Panic
(02-11-2022, 02:15 AM)kshuff Wrote:  Also the system controller is displaying BD++

That's because you've got a disabled CPU.
  • B = Boot master
  • D = Disabled
  • + = more CPUs

My Challenge L says 'B+++++++++++' :)
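
For anyone who wants to tally one of these strings, the legend above can be sketched in a few lines of Python. This is only an illustration: it assumes the display shows one character per CPU, which is an assumption and not something confirmed in this thread, and the function name and result keys are made up for the example.

```python
from collections import Counter

# Meanings taken from the legend above. Any other character is
# reported as unknown rather than guessed at.
LEGEND = {
    "B": "boot master",
    "D": "disabled",
    "+": "additional CPU",
}


def summarize_cpu_status(status: str) -> dict:
    """Tally a system controller CPU string such as 'BD++'.

    Assumes one character per CPU (hypothetical reading of the display).
    """
    counts = Counter(status)
    unknown = sorted(ch for ch in counts if ch not in LEGEND)
    return {
        "total": len(status),          # every character, known or not
        "boot_master": counts["B"],
        "disabled": counts["D"],
        "other_enabled": counts["+"],
        "unknown": unknown,
    }


print(summarize_cpu_status("BD++"))
```

Under that assumption, BD++ comes out as four CPUs with one of them disabled, which matches the CPU that keeps dropping out.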
(This post was last modified: 02-11-2022, 10:44 AM by jan-jaap.)
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
02-11-2022, 10:19 AM
#6
RE: Onyx Panic
(02-11-2022, 07:18 AM)CB_HK Wrote:  The EPC is on the second row from the rear of the IO4. You have to remove the VCAM to get to it easily. [...] I pulled and replaced that chip as well and my Onyx has been solid ever since. I also traced that issue through the Diagnostic manual and the panic it gave on boot made it clear the D chip was having issues so that helped a good bit.

No dice, Colin. I have three known-good IO4s and just finished swapping one in: same exact panic. I did fix the memory issue, though; I now have 896 MB again.
kshuff
02-11-2022, 08:27 PM
#7
RE: Onyx Panic
Again, not that I know much, but I've been looking through this cool Onyx diagnostics guide: https://sgidepot.co.uk/onyx/sgi_onyx_dia...45-030.pdf

Section 2-8 says this error (myRequest addr ebus) is a hardware state error and involves the A chip. It doesn't say much about why access might not be granted, but the diagram shows the ebus addr line going solely to the A chip first, then on to a parallel CC chip bus that gives SRAM & CPU access.

So if you're getting NOTHING, and CB_HK has already mentioned having to replace a chip before (pun not intended), I guess I'd start with this "A" chip, as it's the first access point for the whole card. And you're getting nothing anyway.

Perhaps this is just a cracked solder joint and a reflow or rebake is all that’s needed (instead of a replacement chip)?  Or perhaps the tracks/connector for these ebus addr lines are shorted by a grounding cap or corroded or something?
(This post was last modified: 02-11-2022, 09:21 PM by weblacky.)
weblacky
02-11-2022, 09:17 PM

