IRIX Network Forums
Onyx Panic - Printable Version




Onyx Panic - kshuff - 02-11-2022

I'm having a problem with my Onyx RE². I've been dealing with memory issues and a CPU that drops out intermittently, but it always booted into IRIX. Lately I'd get a panic once in a while, but now it happens every time I start the system up. Can anyone decode this and tell where the fault lies? Also the system controller is displaying BD++.


RE: Onyx Panic - CB_HK - 02-11-2022

Reading back through my logs when I had a major issue with my Onyx it appears your issue is similar if not exactly the same. I had to replace the EPC chip on the IO4 (and later the D Chip when parity issues came up). Your panic string is 99% the same as what I had when my EPC went. Have any spare IO4s handy? A swap might identify the issue quickly.


RE: Onyx Panic - weblacky - 02-11-2022

(02-11-2022, 06:23 AM)CB_HK Wrote:  Reading back through my logs when I had a major issue with my Onyx it appears your issue is similar if not exactly the same. I had to replace the EPC chip on the IO4 (and later the D Chip when parity issues came up). Your panic string is 99% the same as what I had when my EPC went. Have any spare IO4s handy? A swap might identify the issue quickly.
I'm very interested in hearing more about what you did there. Could you please post a picture highlighting which chip this is? Is it related to the row of ICs between each bank of memory modules?


RE: Onyx Panic - CB_HK - 02-11-2022

The EPC is on the second row from the rear of the IO4. You have to remove the VCAM to get to it easily.

[Image: 56857DCF-A906-4420-971A-BEB330FF5F69.jpeg?m=1644563545]

[Image: EF15805B-13DB-4281-A371-36E977F77276.jpeg?m=1644563552]

I used my PGA puller to remove a spare EPC I had from another IO4 and installed it on the faulty board. From reading the Onyx diagnostic roadmap the errors were pointing to an issue with the EPC not functioning correctly. Once I swapped the chip the system stopped panicking and was able to boot normally. About a week later I suffered a separate issue with the D chip (also on the IO4, in the first row from the rear) not routing signals properly and manifesting a parity error that would show up either in IRIX with interesting glitches or on boot with a panic. I pulled and replaced that chip as well and my Onyx has been solid ever since. I also traced that issue through the Diagnostic manual and the panic it gave on boot made it clear the D chip was having issues so that helped a good bit.


RE: Onyx Panic - jan-jaap - 02-11-2022

(02-11-2022, 02:15 AM)kshuff Wrote:  Also the system controller is displaying BD++

That's because you've got a disabled CPU.

My Challenge L says 'B+++++++++++' :)


RE: Onyx Panic - kshuff - 02-11-2022

(02-11-2022, 07:18 AM)CB_HK Wrote:  The EPC is on the second row from the rear of the IO4. You have to remove the VCAM to get to it easily. [...] I pulled and replaced that chip as well and my Onyx has been solid ever since. I also traced that issue through the Diagnostic manual and the panic it gave on boot made it clear the D chip was having issues so that helped a good bit.

No dice, Colin. I have three known-good IO4s and just finished swapping the board: same exact panic. I did fix the memory issue, though, and I now have 896 MB again.


RE: Onyx Panic - weblacky - 02-11-2022

Again, not that I know much, but I've been looking through this cool Onyx diagnostics guide: https://sgidepot.co.uk/onyx/sgi_onyx_diagrm_108-7045-030.pdf

Section 2-8 says this error, myRequest addr ebus, is a hardware state error that involves the A chip. It doesn’t say much about why access might not be granted, but the diagram shows that the ebus addr line goes solely to the A chip first, then proceeds to a parallel CC chip bus that gives SRAM and CPU access.

So if you’re getting NOTHING, and CB_HK has already stated having to replace a chip before (pun not intended), I guess I’d start with this “A” chip, as it’s the first access point for the whole card. And you get nothing anyway.

Perhaps this is just a cracked solder joint and a reflow or rebake is all that’s needed (instead of a replacement chip)?  Or perhaps the tracks/connector for these ebus addr lines are shorted by a grounding cap or corroded or something?