(11-20-2021, 03:40 AM)weblacky Wrote: Not that I know anything, but a bad Fuel PIMM I am looking at has cache errors in spades when the CPU module is OVER VOLTAGE. I think there may be a relationship (then again, maybe not). If I were you, I'd take a hard look at any test points in the manual to test live voltages on each CPU card to verify operation. Possibly the SCACHE does not run due to lack of sufficient voltage or over voltage.
I wish I had something insightful to say about your case, really.
IIRC, for the Fuel (and MIPS/SGI in general), there's the CPU core speed, the SysAD speed, and the cache speed. The SysAD speed is more or less fixed, the core speed is a multiple of it and the cache speed is the core speed divided by 2.5 or 3 usually.
That's all well and good, but I have the nagging feeling that in this case the dividers are stored in NVRAM on the mainboard. There's got to be a good reason for that, because otherwise it's insane -- with SysAD more or less fixed, and that configuration stored on the CPU module instead, basically any CPU could run on any mainboard. We know that's not true :(
OTOH, if the dividers *are* stored on the mainboard and you put a 600MHz module on a mainboard configured for 700MHz, you're hot-clocking your cache by ~17% and cache doesn't like that. It won't tolerate more than a few percent, for sure.
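To put a number on that ~17%, here is a quick back-of-the-envelope sketch in Python. It assumes the cache clock is simply the core clock divided by the divider, as described above; the function names and the 2.5 divider are mine, picked for illustration, and are not read from any real Fuel NVRAM.

```python
def cache_clock_mhz(core_mhz: float, divider: float) -> float:
    """Cache clock derived from the core clock (assumed: core / divider)."""
    return core_mhz / divider


def overclock_percent(rated_core_mhz: float,
                      configured_core_mhz: float,
                      divider: float = 2.5) -> float:
    """How far the cache runs above its rated clock, in percent.

    rated_core_mhz:      what the CPU module was built for (e.g. 600)
    configured_core_mhz: what the mainboard is set up for (e.g. 700)
    divider:             hypothetical cache divider (2.5 or 3 are typical)
    """
    rated = cache_clock_mhz(rated_core_mhz, divider)
    actual = cache_clock_mhz(configured_core_mhz, divider)
    return (actual / rated - 1.0) * 100.0


if __name__ == "__main__":
    # 600MHz module on a mainboard configured for 700MHz
    print(f"cache overclock: {overclock_percent(600, 700):.1f}%")  # 16.7%
```

Note that the divider cancels out: as long as the same divider applies in both cases, the overclock is just 700/600, whether the cache runs at core/2.5 or core/3.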
The same logic is used on bigger iron but there it's usually less of a problem -- there's usually another node to be the boot master and from there you can salvage the mis-configured node, even if it takes half an hour to POST (this can happen on Onyx2, really). Origin 200 can be a real pain: if you screw up one node you need to find a second, NUMA-link them and use the working node to salvage the mis-configured one. And then they shrunk the Origin to a single-CPU workstation and that's where you're stuck.
I'm pretty sure there must have been L2 software inside SGI to deal with this -- otherwise why bother with that port at all? But so far nobody has figured it out and your best option may be to buy/borrow a 700MHz CPU, if only to downclock your mainboard.
(11-20-2021, 03:40 AM)weblacky Wrote: I assume the CPU card(s) have their own VRMs and take bus voltage and make what they need. I'd test both the input voltage to the CPU card (if the manual tells you where the test points are located) and the VRM output voltage made by the CPU cards. Since you have more than one CPU board installed (I assume from your plural grammar), I'd assume the issue may not be internal VRMs but bus-provided power. If you have multiple CPUs on a single board, then I'd worry about internal VRM voltage as well.
In the Challenge/Onyx, there's one (deskside) or multiple (rack) OLSs. Each OLS puts out 48VDC towards the backplane. Most boards (CPU, memory) have their own converters from 48VDC to whatever they need. The IO4 has converters too, but also depends on a 512 brick on the back of the backplane. Power for graphics is supplied by 505 or 303 bricks on the back of the backplane.
I wrote up my experience diagnosing this system HERE. This assumes you have already digested the chapter about the power system in the CHALLENGE™/Onyx™ Diagnostic Road Map.