Follow me down the rabbit hole of Fuel PIMM repair.
#21
RE: Follow me down the rabbit hole of Fuel PIMM repair.
Yo,
Okay, after more digging, I'm looking for a member who understands how the R14000 and R16000 MIPS CPUs connect to their SRAM (secondary cache memory).


Going through the errors I got, there's a tight pattern. I'm trying to find real documentation but have only found some for other MIPS processors. Those suggest there MAY be some kind of external multiplexer for interacting with the external cache (i.e. it's not a direct trace from CPU to SRAM). Can anyone give more info on this? To me this feels like a bad solder joint/connection, or a damaged IC that sits between the CPU's signalling and the actual addressing of the external CACHE. If so, I might find a bad IC or reflow something and see if the communication lines clear up... it doesn't seem that random... but I have no info on how the CPU couples to the secondary SRAM. Anyone know?

Patterns from the log above:

Code:
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002008 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000009 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000200 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000010 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000200 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000020 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000020 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000200 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000080 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000020 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000200 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000200 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000020 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000020 (Way 0)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000000 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000010 (Way 0)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000001000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000002000 (Way 0)
   Address      : 0x0000000000000800 (Way 0)
   Address      : 0x0000000000000400 (Way 0)
   Address      : 0x0000000000000100 (Way 0)
   Address      : 0x0000000000000001 (Way 1)
   Address      : 0x0000000000000001 (Way 1)


   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 10  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 20  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 20  0000002000000000 0000000000000000  000
   Syndrome     : 10  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 10  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 20  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 10  0000002000000000 0000000000000000  000
   Syndrome     : 50  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 10  0000000040000000 0000000000000000  000
   Syndrome     : 00  4000000040000000 0000000000000000  000
   Syndrome     : 00  0000008000000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  000008a000000000 0000000000000000  000
   Syndrome     : 20  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 20  0000082000000000 0000000000000000  000
   Syndrome     : 10  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 10  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 10  0000002000000000 0000000000000000  000
   Syndrome     : 70  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000082000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 00  0000002000000000 0000000000000000  000
   Syndrome     : 30  0000002000000000 0000000000000000  000
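Side note for anyone decoding these by hand: the two 16-digit data fields in each Syndrome line can be turned into SCData bit numbers and tallied. This assumes the first field covers SCData<127:64> and the second SCData<63:0>. That's inferred from the PROM's own "Failing Bits" lines further down the thread (0x0000002000000000 in the first field is reported as SCData<101>, 0x0000000040000000 as SCData<94>), not from any SGI documentation. A minimal Python sketch, fed with a few lines from the list above:

```python
from collections import Counter


def decode_syndrome(hi_word: str, lo_word: str) -> list[int]:
    """Return the SCData bit numbers set in one syndrome's two data fields.

    Assumption: hi_word covers SCData<127:64> and lo_word covers SCData<63:0>,
    inferred from the 'Failing Bits' lines printed by the PROM diagnostic.
    """
    value = (int(hi_word, 16) << 64) | int(lo_word, 16)
    return [bit for bit in range(128) if (value >> bit) & 1]


def tally(lines: list[str]) -> Counter:
    """Count how often each SCData bit appears across 'Syndrome : ...' lines."""
    counts = Counter()
    for line in lines:
        # Fields after the colon: offset, data (high), data (low), ECC.
        _offset, hi, lo, _ecc = line.split(":", 1)[1].split()
        counts.update(decode_syndrome(hi, lo))
    return counts


if __name__ == "__main__":
    sample = [
        "   Syndrome     : 70  0000002000000000 0000000000000000  000",
        "   Syndrome     : 00  0000000040000000 0000000000000000  000",
        "   Syndrome     : 00  0000082000000000 0000000000000000  000",
        "   Syndrome     : 00  4000000040000000 0000000000000000  000",
        "   Syndrome     : 00  000008a000000000 0000000000000000  000",
    ]
    for bit, n in sorted(tally(sample).items()):
        print(f"SCData<{bit}>: {n} hit(s)")
```

Run over the whole list above, this only ever reports SCData bits 94, 101, 103, 107 and 126, which are the same five bits behind the pin list in post #23.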


SRAM on this PIMM is a 0436A4CBLBB
Found the datasheet at: https://www.datasheetarchive.com/pdf/dow...436A4CBLBB

This PIMM has a huge number of test points, so I guess I can try to locate the SRAM pins and trace them back to something?
(This post was last modified: 12-16-2021, 09:44 AM by weblacky.)
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
12-16-2021, 09:28 AM
#22
(12-16-2021, 09:28 AM)weblacky Wrote:  Going through the errors I got, there's a tight pattern. I'm trying to find real documentation but have only found some for other MIPS processors. Those suggest there MAY be some kind of external multiplexer for interacting with the external cache (i.e. it's not a direct trace from CPU to SRAM). Can anyone give more info on this? To me this feels like a bad solder joint/connection, or a damaged IC that sits between the CPU's signalling and the actual addressing of the external CACHE. If so, I might find a bad IC or reflow something and see if the communication lines clear up... it doesn't seem that random... but I have no info on how the CPU couples to the secondary SRAM. Anyone know?

As I haven't seen the R14k/R16k user manuals, I'm not 100% sure the scache interface is the same as on the R10k/R12k, but as a starting point...

look in the R10k user manual:

https://usermanual.wiki/Document/0072490001.2505313346

There's a diagram of how the scache is connected on page 38, and the signal descriptions are on pages 69/70.

Challenge L Indy Indigo2 Indigo2 R10000/IMPACT O2 Octane Origin2000 Deskside Fuel Challenge S Origin 200 Origin 2000 Rack Origin 2000 Rack Origin 3200 Origin 350
fleedwood
O2

Trade Count: (0)
Posts: 24
Threads: 1
Joined: Dec 2020
Location: Germany
12-16-2021, 09:44 AM
#23
A unique sort also shows that only two of the SRAM modules are affected (the names match the silkscreen labels on the PIMM... I love that about SGI... they give the real location on the PCB!):

Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
Asterix R14K CPU C3D1 [Pin AA27] SRAM C8E6 [Pin H1]
Asterix R14K CPU C3D1 [Pin AD30] SRAM C8E6 [Pin H3]
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
Asterix R14K CPU C3D1 [Pin AK30] SRAM C8E6 [Pin D1]
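Those five lines can also be kept as a small lookup table keyed by SCData bit (the bit numbers come from the "Failing Bits" entries in the PROM log), which makes the two-package grouping explicit and doubles as a checklist of pads to probe when backtracing. A minimal Python sketch; the table is copied from the log, and the helper name is made up for illustration:

```python
from collections import defaultdict

# SCData bit -> (CPU C3D1 pin, SRAM package, SRAM pin), copied from the
# 'Failing Bits' lines of the PROM secondary-cache diagnostic.
FAILING_BITS = {
    94: ("AK1", "C8B7", "D1"),
    101: ("AG33", "C8E6", "H8"),
    103: ("AD30", "C8E6", "H3"),
    107: ("AA27", "C8E6", "H1"),
    126: ("AK30", "C8E6", "D1"),
}


def group_by_sram(bits: dict[int, tuple[str, str, str]]) -> dict[str, list[tuple[int, str, str]]]:
    """Group failing bits by SRAM package: {sram: [(bit, cpu_pin, sram_pin), ...]}."""
    groups = defaultdict(list)
    for bit, (cpu_pin, sram, sram_pin) in sorted(bits.items()):
        groups[sram].append((bit, cpu_pin, sram_pin))
    return dict(groups)


if __name__ == "__main__":
    for sram, entries in group_by_sram(FAILING_BITS).items():
        print(f"SRAM {sram}:")
        for bit, cpu_pin, sram_pin in entries:
            print(f"  SCData<{bit}>  CPU pin {cpu_pin:<4}  ->  SRAM pin {sram_pin}")
```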

My hunch is still that the problem is in these connections and NOT in the SRAM modules themselves.

fleedwood, I looked at the pages you recommended, but they didn't really give me the connection info I was looking for. Perhaps they are directly connected, but I need more connection-level detail. I do plan to do a microscope inspection of the underside for scratches and whatnot. It'd be nice to find a few ICs that (somehow) control banks or groups of the SRAM, to give me a place to start looking!
weblacky
12-16-2021, 09:56 AM
#24
I've taken a quick look under the PIMM, and there appear to be small SMD arrays (maybe higher-wattage resistor arrays) around the chips, between the cache chips and the CPU. That's at least a starting point: find the pins referred to on the cache chips, then trace them back. It's possible there's a cracked solder joint, a bad (out-of-spec) protection resistor in one of the arrays, or something else hindering communication.

My gut still says something is interfering, rather than something inside the cache or the CPU being broken. The pattern of disruption is just so small that I'd believe it falls on a single line, or maybe two. I didn't immediately see anything that looks like a multiplexer, or anything else positioned between them for communication.

I'll update when I have more.
weblacky
12-17-2021, 03:14 AM
#25
Well... I tried some stuff, but I'm afraid I'm at a standstill for now.

I assumed the SRAM labels match the ones on the PCB silkscreen, and that the pin designations use the address matrix on the underside of the PCB (BGA SRAM test pads):

   

The only issue I have is the PROCESSOR SIDE... I cannot figure out the pin addressing on the CPU side (nor have I found ANY online docs to help me). I was hoping that the CPU used some kind of bus multiplexer to address the SRAM, with components sitting between the CPU and the SRAM BGA chips. I cannot readily find any!!!! Curve tracer measurements show the same zener-diode signature between the SRAM pins and the pins defined as GROUND (so the BGA chips' pin interfaces appear to have their diode protection in place).

But I cannot make heads or tails of the CPU side. If I KNEW my Fluke 189's continuity test voltage, then I'd think about trying to FIND the corresponding pins among the test points on the underside... but it would need to be around 1V or less to be within my comfort zone.

I was hoping to find a protection diode or address multiplexing circuit that might have an issue with these specific lines:


      Asterix R14K  CPU  C3D1 [Pin  AK1]  SRAM C8B7  [Pin D1]
      Asterix R14K  CPU  C3D1 [Pin AA27]  SRAM C8E6  [Pin H1]
      Asterix R14K  CPU  C3D1 [Pin AD30]  SRAM C8E6  [Pin H3]
      Asterix R14K  CPU  C3D1 [Pin AG33]  SRAM C8E6  [Pin H8]
      Asterix R14K  CPU  C3D1 [Pin AK30]  SRAM C8E6  [Pin D1]

But nothing so far... I also ASSUME this isn't a config issue: while the board IS currently configured for an R16000 700 MHz CPU, this error correctly IDs the R14000 600 MHz CPU I'm repairing... so I assume it knows its own SRAM config.

I feel like I'm very close... the sad part is that, since I don't KNOW what the PIMM 5V rail was for, the fact that it sat at 6.25V for a while may have actually damaged something I can't replace.

Any docs, hints, or external help would be appreciated.
weblacky
01-01-2022, 12:19 AM
#26
Hi All,
So there's some UNCONFIRMED movement on this topic, but it needs to be posted. I got a private message from "schimmi" suggesting this issue MAY BE the result of an incorrect board/CPU configuration and NOT damaged SRAM from the overvoltage.

Schimmi says they've seen errors like this come up when trying PIMMs in mainboards that are set for HIGHER-speed CPUs. There's also the possibility of a pinout difference in the SRAM interface between the 600 MHz and 700 MHz CPUs, given they aren't the same family (different R revisions). But I'm unsure of that.

So, I'm not going to flash my working Fuel board to test this repaired CPU, but I MAY get the opportunity to repair future Fuel boards that are already configured for a 600 MHz CPU, so I'll keep adding to this topic.
weblacky
08-18-2022, 04:58 AM
#27
Hi Weblacky,

given that you have a mainboard expecting a 700 MHz CPU with a 600 MHz CPU installed, could you see whether you can boot into POD/CAC mode (i.e. set the L1 debug register to 0x10d)?

If you can get into POD/CAC, then this would confirm that the machine always starts at a slower clock speed and then, based on reading some "register" somewhere, switches to the higher operating speed.

If you can get into POD/CAC, then you might be able to set the mainboard to a lower speed without having to boot into IRIX and use the "flash" command.

This would help confirm, once and for all, whether it is possible to set the board to a lower speed without having to put in a faster CPU.

Cheers from Oz,


jwhat/John.
jwhat
Octane/O350/Fuel User

Trade Count: (0)
Posts: 513
Threads: 29
Joined: Jul 2018
Location: Australia
08-18-2022, 10:10 AM
#28
Hi Jwhat,
I think I already tried this. I can't be 100% certain because it's been pretty much a year now. I just looked back through the L1 logs I saved when I was dealing with the Fuel. I'm pretty sure I had debug set every time I did this, but again, it's been a while.

What happens is that the SRAM cache diagnostic fails, which is literally the second or third thing the PROM does once it's bootstrapped from L1. After the failure, the firmware immediately says it's going to attempt to clear the cache with a power cycle, and there's nothing I can do about it. It then soft-resets, you end up back at the L1 auto power-on prompt, and it does it again and again and again in a loop. So from what I can see, even with the debug switch enabled it will not stop at POD, because it doesn't get far enough in the code to offer it.

My assumption is that the PROM runs on the main CPU, since obviously it's a bootstrapping process. Because the cache fails its diagnostic, the PROM doesn't believe it can reliably run any code without a working cache. So my initial thought was that such an error is catastrophic to the stable running of the PROM, and there's no diagnostic to offer on what would be an unstable platform.

I’m pretty sure this is what I experienced looking back through my notes.

Although I think this is what we expected. I thought the previous idea wasn't so much CAC mode as using the primitive L1 memory commands to actually change the shadowed PROM variables before bootstrap?

At least that's what I thought we were close to before? I know we had already jointly seen the structure that holds these variables in memory. But nobody had the guts to attempt an in-memory change of the structure. I understand, because I refused to do it as well, given that I don't have a sacrificial system.

But I mostly remember us getting close to simply changing the in-memory representation of the struct that carried the three variables we wanted to change.
weblacky
08-18-2022, 10:32 AM
#29
Hi Weblacky,

Yes, we saw the structure, and I got as far as finding potential POD/CAC commands that could write back to memory:

Also as per boot diags: copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648

You can read the PROM via POD/CAC directly using address: 0x9000000018000000
So it may be possible to also write directly to these memory locations using the corresponding store commands:

>> A 000 001c01: Store byte: sb ADDR [VAL [COUNT]]
>> A 000 001c01: Store half-word: sh ADDR [VAL [COUNT]]
>> A 000 001c01: Store word: sw ADDR [VAL [COUNT]]
>> A 000 001c01: Store double-word: sd ADDR [VAL [COUNT]]
>> A 000 001c01: Store and verify: sdv ADDR VAL
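If anyone does try this, the offset arithmetic between the PROM image and its RAM copy is simple, assuming the copy in the boot message is a straight linear copy with no relocation or decompression (unverified). A minimal Python sketch; `prom_to_ram` and `sd_command` are hypothetical helpers, and the exact hex format POD accepts for `sd` is an assumption, so check it against POD's own help before typing anything in:

```python
# Numbers taken from the PROM boot message:
#   Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648
PROM_BASE = 0x9000000018000000
RAM_BASE = 0x9600000001A00000
COPY_LEN = 0x168648


def prom_to_ram(prom_addr: int) -> int:
    """Map a PROM address to its copy in RAM (assumes a straight linear copy)."""
    offset = prom_addr - PROM_BASE
    if not 0 <= offset < COPY_LEN:
        raise ValueError(f"0x{prom_addr:016x} is outside the copied PROM image")
    return RAM_BASE + offset


def sd_command(addr: int, value: int) -> str:
    """Build the text of a POD 'sd ADDR VAL' line (hex format is an assumption)."""
    return f"sd 0x{addr:016x} 0x{value:016x}"


if __name__ == "__main__":
    last = prom_to_ram(PROM_BASE + COPY_LEN - 1)
    print(f"RAM copy: 0x{RAM_BASE:016x} .. 0x{last:016x}")
    # Hypothetical example only: this prints a command, it does not send it anywhere.
    print(sd_command(prom_to_ram(PROM_BASE + 0x100), 0))
```

With the numbers from the boot message, the RAM copy should end at 0x9600000001b68647.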

Interestingly looking at your log:

>> Copying PROM code to memory ............... DONE

So it looks like your machine does get past the point of "copying PROM code to memory ...", so maybe you could halt the boot process at that point...

If there were a speed problem, would it even report copying PROM code to memory??


Cheers from Oz,

jwhat/John
(This post was last modified: 08-18-2022, 12:39 PM by jwhat.)
jwhat
08-18-2022, 12:33 PM
#30
(08-18-2022, 12:33 PM)jwhat Wrote:  Hi Weblacky,

Yes, we saw the structure, and I got as far as finding potential POD/CAC commands that could write back to memory:

Also as per boot diags: copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x168648
You can read the PROM via POD/CAC directly using address: 0x9000000018000000
So it may be possible to also write directly to these memory locations using the corresponding store commands:

>> A 000 001c01: Store byte: sb ADDR [VAL [COUNT]]
>> A 000 001c01: Store half-word: sh ADDR [VAL [COUNT]]
>> A 000 001c01: Store word: sw ADDR [VAL [COUNT]]
>> A 000 001c01: Store double-word: sd ADDR [VAL [COUNT]]
>> A 000 001c01: Store and verify: sdv ADDR VAL

I gather, based on your logs, that you don't even get to the point of "copying PROM code to memory ..." as it fails as soon as it tries to access memory...

If your problem really is due to a speed mismatch, then it is 100% confirmed that you cannot change the speed except by putting in a faster CPU module, as per Hamei's assertion way back when...

Cheers from Oz,

jwhat/John


Hi Jwhat,
I do get to the "Copying PROM code to memory" line. The cache failure happens on the very next step (testing the cache), so I guess the PROM hasn't really started yet.
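One more observation from the log below: the Syndrome data fields look like nothing more than Expected XOR Received, and in every entry shown each flipped bit is a 0 that was read back as 1, never the reverse. That may be a useful clue when probing those lines. A minimal Python sketch that checks both points on the first data field of several entries (reporting bits as SCData<64 + n> assumes the first field is SCData<127:64>, which matches the Failing Bits lines):

```python
def compare(expected: str, received: str) -> tuple[int, dict[int, str]]:
    """Return (syndrome, flips) for one 64-bit data field.

    syndrome is expected XOR received; flips maps each differing bit
    (0 = least significant) to '0->1' or '1->0'.
    """
    exp, rcv = int(expected, 16), int(received, 16)
    syndrome = exp ^ rcv
    flips = {}
    for bit in range(64):
        if (syndrome >> bit) & 1:
            flips[bit] = "0->1" if (rcv >> bit) & 1 else "1->0"
    return syndrome, flips


# (Expected, Received) first data fields copied from failure entries in the log.
ENTRIES = [
    ("5555555555555555", "5555557555555555"),
    ("cccccccccccccccc", "ccccdceccccccccc"),
    ("0f0f0f0f0f0f0f0f", "0f0f0f0f4f0f0f0f"),
    ("0000000000000000", "0000000040000000"),
    ("aaaaaaaaaaaaaaaa", "eaaaaaaaeaaaaaaa"),
    ("5555555555555555", "555555d555555555"),
    ("5555555555555555", "55555d7555555555"),
]

if __name__ == "__main__":
    for exp, rcv in ENTRIES:
        syndrome, flips = compare(exp, rcv)
        # Assumes the first data field is SCData<127:64>.
        bits = ", ".join(f"SCData<{64 + b}> {d}" for b, d in sorted(flips.items()))
        print(f"syndrome {syndrome:016x}: {bits}")
```

The first entry, for example, prints `syndrome 0000002000000000: SCData<101> 0->1`, which lines up with the first failure in the log.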

Code:
IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
Running in DDR mode
Testing/Initializing memory ...............        DONE
Copying PROM code to memory ...............        DONE



SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa8000000000037f0 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 70  5555555555555555 0000000000000000  155
   Received     : 70  5555557555555555 0000000000000000  155
   Syndrome     : 70  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003ab0 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 30  cccccccccccccccc 0000000000000000  0cc
   Received     : 30  ccccdceccccccccc 0000000000000000  0cc
   Syndrome     : 30  0000102000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<108>
       Asterix R14K   CPU  C3D1 [Pin AC27]   SRAM C8E6  [Pin K2]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003790 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 10  0f0f0f0f0f0f0f0f 0000000000000000  30f
   Received     : 10  0f0f0f0f4f0f0f0f 0000000000000000  30f
   Syndrome     : 10  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base Address: DATA
   Failure      : Brother Double Word Not Zero
   Address      : 0x0000000000000008 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  0000000000000000 5555555555555555  155
   Received     : 00  0000000040000000 5555555555555555  155
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  eaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  4000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]
     SCData<126>
       Asterix R14K   CPU  C3D1 [Pin AK30]   SRAM C8E6  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000002000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  aaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000001000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  eaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  4000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]
     SCData<126>
       Asterix R14K   CPU  C3D1 [Pin AK30]   SRAM C8E6  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base + Walk/Inv: DATA
   Failure      : Brother Double Word Not Zero
   Address      : 0x0000000000000009 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  0000000000000000 0000000000000000  000
   Received     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Short March: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000100 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  555555d555555555 0000000000000000  155
   Syndrome     : 00  0000008000000000 0000000000000000  000

   Failing Bits
     SCData<103>
       Asterix R14K   CPU  C3D1 [Pin AD30]   SRAM C8E6  [Pin H3]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003810 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 10  5555555555555555 0000000000000000  155
   Received     : 10  55555d7555555555 0000000000000000  155
   Syndrome     : 10  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000002000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  5555557555555555 0000000000000000  155
   Syndrome     : 00  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000001000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000800 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555df555555555 0000000000000000  155
   Syndrome     : 00  000008a000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<103>
       Asterix R14K   CPU  C3D1 [Pin AD30]   SRAM C8E6  [Pin H3]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000400 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base + Walk/Inv: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000001 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  5555557555555555 0000000000000000  155
   Syndrome     : 00  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Short March: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000001 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCDat
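Reading the entries above by hand: the "Syndrome" column is the XOR of Expected and Received (e.g. 5555555555555555 XOR 55555d7555555555 = 0000082000000000). In every entry in this excerpt the failing bits read 1 where 0 was written. Below is a small sketch that decodes the data words into SCData<n> numbers and tallies them across entries, so the clustering on a few lines is easy to see. It rests on one assumption inferred from this log, not from SGI documentation: the first printed 64-bit word holds SCData<127..64> and the second holds SCData<63..0>. That ordering reproduces every "Failing Bits" line in this excerpt. The pin map is copied from the log. The function and variable names are hypothetical.

```python
from collections import Counter

# Pin locations copied from the "Failing Bits" lines of the log above.
PIN_MAP = {
    94: "CPU C3D1 AK1 / SRAM C8B7 D1",
    101: "CPU C3D1 AG33 / SRAM C8E6 H8",
    103: "CPU C3D1 AD30 / SRAM C8E6 H3",
    107: "CPU C3D1 AA27 / SRAM C8E6 H1",
    126: "CPU C3D1 AK30 / SRAM C8E6 D1",
}


def failing_bits(expected: str, received: str) -> list[tuple[int, str]]:
    """Return (SCData bit, direction) for every bit that miscompared.

    `expected`/`received` are the two 16-hex-digit data words as printed,
    e.g. "5555555555555555 0000000000000000".
    ASSUMPTION (inferred from this log only): first word = SCData<127..64>,
    second word = SCData<63..0>, so joining them gives one 128-bit value.
    """
    exp = int("".join(expected.split()), 16)
    rcv = int("".join(received.split()), 16)
    syndrome = exp ^ rcv  # same value the log prints as "Syndrome"
    return [
        (bit, "0->1" if (rcv >> bit) & 1 else "1->0")
        for bit in range(128)
        if (syndrome >> bit) & 1
    ]


def tally(entries: list[tuple[str, str]]) -> Counter:
    """Count how often each SCData bit fails across all entries."""
    counts: Counter = Counter()
    for exp, rcv in entries:
        for bit, _direction in failing_bits(exp, rcv):
            counts[bit] += 1
    return counts


A, F, Z = "aaaaaaaaaaaaaaaa", "5555555555555555", "0000000000000000"

# (Expected, Received) data words from the entries in this excerpt, in order.
ENTRIES = [
    (f"{A} {Z}", f"aaaaaaaaeaaaaaaa {Z}"),
    (f"{A} {Z}", f"eaaaaaaaeaaaaaaa {Z}"),
    (f"{Z} {Z}", f"0000000040000000 {Z}"),
    (f"{F} {Z}", f"555555d555555555 {Z}"),
    (f"{F} {Z}", f"55555d7555555555 {Z}"),
    (f"{F} {Z}", f"5555557555555555 {Z}"),
    (f"{F} {Z}", f"55555d7555555555 {Z}"),
    (f"{F} {Z}", f"55555df555555555 {Z}"),
    (f"{F} {Z}", f"55555d7555555555 {Z}"),
    (f"{F} {Z}", f"5555557555555555 {Z}"),
    (f"{F} {Z}", f"55555d7555555555 {Z}"),
]

if __name__ == "__main__":
    for bit, count in tally(ENTRIES).most_common():
        print(f"SCData<{bit}>  x{count}  {PIN_MAP.get(bit, '?')}")
```

On this excerpt, SCData<101> and <107> dominate, and four of the five failing lines land on SRAM C8E6. Adding the rest of the log to `ENTRIES` would show whether that holds for the whole run.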
weblacky
I play an SGI Doctor, on daytime TV.

08-18-2022, 12:44 PM