Okay, hoping you were right, I tried booting without the V10 and without keyboard/mouse...and got the same result I documented earlier. The Fuel flat-out stopped me because an I2C element is missing (ERROR: I2C:not present), which by all accounts appears to be the V10's DS1780. As you can see in the log below, those elements are missing once the V10 is removed.
In my previous posts they come back and show up fine when the card is inserted, since I replaced the chip...and the chip is fed directly from the XIO connector (not from a VRM on the card). So my assertion is that the graphics card is the last component on the sensor bus and completes the whole thing.
The way I interpret what I’m seeing in my logs (all posted in this thread) is that the graphics card’s DS1780 is charged with monitoring all the XIO voltages (in env those all show <not present> when the card is removed). And when I say “lose communication with Bedrock” I specifically mean that row 5 of the last env temp table says:
5 BEDROCK Wait Pwr Not currently available
This row is fully filled out when my V10 (previously shorted, short now cleared, still non-starting, with the replaced DS1780 IC) is installed. “Not currently available” is replaced with temps, just like the lines above it.
This doesn’t produce an alert or a claim that env monitoring isn’t running; env claims it is running and configured. But it’s missing info that is present when the card is inserted, which I interpret as an error.
I'm well aware that I2C is a shared bus (not linear or point-to-point)...but since the V10 is "missed" and yet ENV monitoring claims it's up and running without a config error, I assume my I2C bus lines are fine (how else could I talk to every device except the missing one when the V10 is removed?). I assume you're benefiting from a bug/feature that SGI later closed to prevent people from doing exactly what you're suggesting. My guess is that without Bedrock temps the system could overheat and fail undetected, so SGI likely closed that loophole, and I have an actual error that stops me (yet doesn't stop env). While I believe what you're saying, I haven't seen anyone else claim to be able to boot a Fuel headless, and I can find at least 3 postings from people describing what I'm seeing...the Fuel will not let you start without the monitoring from the graphics card.
I urge you to please post your pre-power-up log with the env and power printouts so we can see what it shows. If it shows exactly what mine shows (missing XIO voltages and missing Bedrock temps)...then your old firmware is likely giving you a loophole. My board has the same firmware in both flash slots...so very likely it was produced with this firmware in place.
While I won't pressure you, I'd bet fair money that if you actually upgraded your firmware...you'd be sitting right here with me...unable to start without a graphics card.
Log shown below:
Code:
SGI SN1 L1 Controller
Firmware Image B: Rev. 1.28.3, Built 03/20/2004 00:01:57
001a01-L1>INFO: 001a01 will power up system in 5 seconds...
INFO: 001a01 powering up the system.
eERROR: 001a01 auto power up error.
ERROR: command not found.
001a01-L1>env
Environmental monitoring is enabled and running.
Description State Warning Limits Fault Limits Current
-------------- ---------- ----------------- ----------------- -------
12V Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.19
12V IO Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.19
5V Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 0.26
3.3V Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 0.89
2.5V Wait Pwr 10% 2.25/ 2.75 20% 2.00/ 3.00 0.00
1.5V Wait Pwr 10% 1.35/ 1.65 20% 1.20/ 1.80 0.00
5V aux Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 5.04
3.3V aux Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 3.30
PIMM0 12V bias Wait Pwr 10% 10.80/ 13.20 20% 9.60/ 14.40 0.19
Fuel SRAM Wait Pwr 10% 2.25/ 2.75 20% 2.00/ 3.00 0.29
Fuel CPU Wait Pwr 10% 1.13/ 1.38 20% 1.00/ 1.50 0.03
PIMM0 1.5V Wait Pwr 10% 1.35/ 1.65 20% 1.20/ 1.80 0.00
PIMM0 3.3V aux Wait Pwr 10% 2.97/ 3.63 20% 2.64/ 3.96 3.29
PIMM0 5V aux Wait Pwr 10% 4.50/ 5.50 20% 4.00/ 6.00 5.02
XIO 12V bias <not present>
XIO 5V <not present>
XIO 2.5V <not present>
XIO 3.3V aux <not present>
Description State Warning RPM Current RPM
-------------- ---------- ----------- -----------
FAN 0 EXHAUST Wait Pwr 920 0
FAN 1 HD Wait Pwr 1560 0
FAN 2 PCI Wait Pwr 1120 0
FAN 3 XIO 1 Wait Pwr 1600 0
FAN 4 XIO 2 Wait Pwr 1600 0
FAN 5 PS Wait Pwr 1349 0
Advisory Critical Fault Current
Description State Temp Temp Temp Temp
----------------- ---------- --------- --------- --------- ---------
0 NODE 0 Wait Pwr [Autofan Control] 75C/167F 11C/ 51F
1 NODE 1 Wait Pwr [Autofan Control] 75C/167F 11C/ 51F
2 NODE 2 Wait Pwr [Autofan Control] 75C/167F 11C/ 51F
3 PIMM Wait Pwr [Autofan Control] 75C/167F 11C/ 51F
4 ODYSSEY <not present>
5 BEDROCK Wait Pwr Not currently available
001a01-L1>reset
Code:
ERROR: power appears off.
Code:
ERROR: I2C:not present
001a01-L1>reboot_l1
SGI SN1 L1 Controller
Firmware Image B: Rev. 1.28.3, Built 03/20/2004 00:01:57
001a01-L1>* poINFO: 001a01 will power up system in 5 seconds...
INFO: 001a01 powering up the system.
wERROR: 001a01 auto power up error.
er
001a01:
Supply State Voltage Margin Value
-------------- ----- --------- ------- -----
12V off 0.188V N/A
12V IO NC 0.188V N/A
5V NC 0.260V N/A
3.3V NC 0.894V normal 0
2.5V off 0.000V normal 0
1.5V NC 0.000V normal 0
5V aux NC 5.044V N/A
3.3V aux NC 3.302V N/A
PIMM0 12V bias NC 0.188V N/A
Fuel SRAM NC 0.286V normal 0
Fuel CPU off 0.028V normal 119
PIMM0 1.5V NC 0.000V normal 0
PIMM0 3.3V aux NC 3.285V N/A
PIMM0 5V aux NC 5.018V N/A
XIO 12V bias <not present>
XIO 5V <not present>
XIO 2.5V <not present>
XIO 3.3V aux <not present>
001a01-L1>reb reboot_l1
SGI SN1 L1 Controller
Firmware Image B: Rev. 1.28.3, Built 03/20/2004 00:01:57
001a01-L1>INFO: 001a01 will power up system in 5 seconds...
INFO: 001a01 powering up the system.
ERROR: 001a01 auto power up error.
001a01-L1>env check
Environmental monitoring is enabled and running.
001a01-L1>flash status
Flash image B currently booted
Image Status Revision Built
----- ------------- ---------- -----
A valid 1.28.3 03/20/2004 00:01:57
B default 1.28.3 03/20/2004 00:01:57
001a01-L1>
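If anyone wants to compare their own env dump against mine without eyeballing it, here is a rough Python sketch that lists the rows with no reading. It is not an SGI tool: the function name `missing_sensors` is made up for illustration, the two marker strings are simply copied from my log above, and it assumes the system is in the same standby state ("Wait Pwr") as mine.

```python
import re

# Status strings the L1 prints instead of a reading (copied from the log above).
MISSING_MARKERS = ("<not present>", "Not currently available")

def missing_sensors(env_text):
    """Return (description, marker) pairs for env rows that have no reading.

    Hypothetical helper: assumes rows look like the ones in this post.
    """
    found = []
    for line in env_text.splitlines():
        for marker in MISSING_MARKERS:
            if marker in line:
                desc = line.split(marker)[0]
                # Assumption: the only state column text to strip is "Wait Pwr".
                desc = desc.replace("Wait Pwr", "").strip()
                # Drop the row index used in the temperature table ("5 BEDROCK").
                desc = re.sub(r"^\d+\s+", "", desc)
                found.append((desc, marker))
                break
    return found

sample = """\
XIO 12V bias <not present>
XIO 5V <not present>
3 PIMM Wait Pwr [Autofan Control] 75C/167F 11C/ 51F
4 ODYSSEY <not present>
5 BEDROCK Wait Pwr Not currently available
"""

for desc, marker in missing_sensors(sample):
    print(f"{desc}: {marker}")
```

Paste a full env capture in place of `sample`. If a headless-booting Fuel reports the same missing rows as mine, that would point at firmware behavior rather than at my hardware.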
My Mainboard is: 030-1707-005 Rev. A
Fingers Crossed on my replacement V10 card!
(11-07-2021, 06:46 PM)Raion Wrote: A lot of this stuff is over my head here guys but my question is if there's a potential software cause to why this doesn't work because I had three fuels and mostly working condition over my time working with them and all of them failed spectacularly in different ways. So I had a lot of time to test with and without graphics. Never was I able to get a system to boot headless.
If the answer is no then I suspect that there might be another point of failure on the board that might be common and potentially caused by the defective power supplies. Just a personal theory though?
There's a lot in this statement, but I've asserted for a while (and still believe) that bad power from the PSUs caused many of these failures (not all, of course; I do believe your heat argument). Since I found that the DS1780 on my V10 is directly connected to the mainboard XIO 5V line...yeah, that's pretty much the 5V AUX line...right into the PSU. There is current limiting but no conditioning/regulation, so it makes total sense that as the PSU goes bad (and for some reason they WENT BAD FAST compared to other SGIs) fragile low-voltage devices suffer.
I'm super interested to look inside my PSU (when I get there) to see if it blew something (like a protection diode that guards against voltage spikes) or if the caps are just WAY out of spec.
Since ENV doesn't seem to produce a config error when the V10 is missing, I assume the later checks were put in place in later firmwares to prevent going forward (without reprogramming ENV). I know they did things like close the "Carnage" keyword and change how "Security" works (once "ON", the behavior to turn it off was changed, right?). So I know they made changes all the time in response to customer scenarios.