sgi tezro L1 General Exception on node 0
#31
RE: sgi tezro L1 General Exception on node 0
(06-03-2022, 03:22 PM)weblacky Wrote:  Please post the output of the cmd: leds

Code:
01c01-L1>leds
CPU  A: 0x2a: PLED_MAKESTACK
CPU  B: 0x5d: PLED_IORESET
CPU  C: 0x5d: PLED_IORESET
CPU  D: 0x5d: PLED_IORESET
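When comparing `leds` dumps like this one across power cycles, a tiny parser can help spot a CPU that is stuck at a different stage from the rest. This is a minimal Python sketch. It assumes every CPU line has the `CPU  X: 0xNN: NAME` shape shown in this output. `parse_leds` and `odd_ones_out` are hypothetical helpers, not SGI tools, and the sketch makes no claim about what each PLED code means.

```python
import re
from collections import Counter

# Assumed line shape, copied from the L1 `leds` output above: "CPU  A: 0x2a: PLED_MAKESTACK"
LED_LINE = re.compile(r"^CPU\s+([A-Z]):\s+(0x[0-9a-fA-F]+):\s+(\S+)$")


def parse_leds(output: str) -> dict[str, tuple[int, str]]:
    """Map each CPU letter to its (LED code, LED name); other lines such as the prompt are ignored."""
    result = {}
    for line in output.splitlines():
        m = LED_LINE.match(line.strip())
        if m:
            cpu, code, name = m.groups()
            result[cpu] = (int(code, 16), name)
    return result


def odd_ones_out(leds: dict[str, tuple[int, str]]) -> list[str]:
    """Return the CPUs whose LED code differs from the most common code."""
    if not leds:
        return []
    common, _ = Counter(code for code, _ in leds.values()).most_common(1)[0]
    return sorted(cpu for cpu, (code, _) in leds.items() if code != common)


sample = """01c01-L1>leds
CPU  A: 0x2a: PLED_MAKESTACK
CPU  B: 0x5d: PLED_IORESET
CPU  C: 0x5d: PLED_IORESET
CPU  D: 0x5d: PLED_IORESET"""

print(odd_ones_out(parse_leds(sample)))
```

Fed the output above, this flags CPU A, which reports 0x2a (PLED_MAKESTACK) while B, C and D all report 0x5d (PLED_IORESET).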

As far as I understand, do I need a second null-modem cable connected to a second PC? ... Checking Serial Port 1 and Serial Port 2 by connecting the same null-modem cable to the open PuTTY console does not show anything after pushing the power button of the Tezro.
(This post was last modified: 06-03-2022, 06:29 PM by HarryT.)
HarryT
tezro

06-03-2022, 06:18 PM
#32
RE: sgi tezro L1 General Exception on node 0
Yeah, it uses the same cable, as it’s the exact same signal. Make sure you’re actually hooking up to the Serial 1 port, then open your terminal and go off-hook, and THEN turn the Tezro on.

The L1 terminal will recover mid-session if you unplug and replug an active terminal. The console output will not! You have to be fully hooked up and paying attention right at power-on; if you attempt to attach later, you’ll see nothing on the additional serial port (if the output has actually been moved).

Did you make sure to do this? Fully connect on the terminal side and then plug the Tezro into the wall. Do not attach the Serial 1 cable while the L1 is already running… too late. You have to be hooked up and signaling long before the L1 even starts.


At least I had to do this on my Fuel when the console was redirected to Serial 1, or I saw nothing. I had to have it all started and blinking at the terminal before power was applied to the system… so I assume the same is true for the Tezro.


Please make sure that’s what you’ve done.


Suppose you’ve done that and still get no output once the system tries to auto-start (of course, there’s no output if you’ve disabled auto-start).

I assume you hooked up your Tezro to Serial 1, started the terminal software, went off-hook on the terminal (connect), plugged in your Tezro’s power cable, and pressed the power button on the front panel to actually attempt power-up.


Then wait about 30 seconds; if the console was going to appear, it would have by then.


Also, I forgot to ask: does your Tezro’s front LCD say anything interesting during power-up attempts?
weblacky
I play an SGI Doctor, on daytime TV.

Location: Seattle, WA
06-03-2022, 07:01 PM
#33
RE: sgi tezro L1 General Exception on node 0
(06-03-2022, 07:01 PM)weblacky Wrote:  Yeah, it uses the same cable, as it’s the exact same signal. Make sure you’re actually hooking up to the Serial 1 port, then open your terminal and go off-hook, and THEN turn the Tezro on.

The L1 terminal signal will recover mid-talk with plugging and unplugging an active terminal signal.  The console output will not!  You have to be fully hooked and paying attention right at power on, if you attempt to attach later, you’ll see nothing on the additional serial port (if the output has been actually moved). 

Did you make sure to do this?  Fully connect on the terminal side and then plug the Tezro into the wall.  Do not put the serial 1 cable on while l1 is running already… too late.  You have to be hooked up and signaling long before L1 even starts. 


At least I had to do this on my fuel when it was redirected to serial 1 or I saw nothing. Had to have it all started and blinking at the terminal before power was applied to system…so I assume the same is true for Tezro. 


Please make sure that’s what you’ve done.


If you’ve done that and you’ve still not gotten output once your system had tried to auto-start (no output if you’ve disabled auto start of course). 

I assume you hooked up your Tezro to serial1, started the terminal software, gone off hook on the terminal (connect), plugged in your Tezro to power cable, and pressed the on button at front face to actually attempt power up. 


Then wait about 30 seconds and if the console was going to appear, it would.


Also I forgot to ask, does your Tezro front LCD say anything interesting during power up attempts?
   

OK, I unplugged the Tezro power cord, moved the null-modem cable plug from the Tezro’s console port to its Serial 1 port (or Serial 2 port), started PuTTY at 38400 baud (same settings as for the L1 console connection), pressed Enter in PuTTY, plugged in the Tezro power cord, and pushed the Tezro’s power button. I get
Code:
001c01
Powered up
on the Tezro’s LCD display (as always).

But nothing happens in PuTTY, sorry. Not on the Serial 1 port, nor on the Serial 2 port of the Tezro. There is no output.

I think this works only on the Fuel, not on the Tezro.
(This post was last modified: 06-03-2022, 09:11 PM by HarryT.)
HarryT
tezro

06-03-2022, 09:09 PM
#34
RE: sgi tezro L1 General Exception on node 0
Hi HarryT & Weblacky,

OK, I see you tried to boot into POD/CAC mode, but the crash happened before reaching that point (hence you did not see the prompt).

So my conclusion is that you are unlikely to fix the problem via SW / L1 tweaking.

I would assume you have some type of HW problem.

You need a second known-good Tezro (or even an O350 to start with) to make this easy, so if you cannot find a local SGI enthusiast nearby in Germany, see if you can find someone you can ship boards to for testing.

If you had a second Tezro/O350 available, I would start by:
1. Ensuring the second Tezro/O350 has the final L1 installed
2. Swapping your IP53 board into the second Tezro/O350 and seeing if it works there
3. If not, then problem solved (or rather, known...)
4. If it does work, then the failure could be in the PSU, interface board, or graphics, so you need to check each in turn.
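The swap test above is essentially a small decision tree. As a minimal Python sketch (the function and list names are hypothetical, chosen only for illustration), it could be written down like this:

```python
# Hypothetical helper encoding the four-step swap test; not an SGI diagnostic tool.
SUSPECTS_IF_NODE_BOARD_OK = ["PSU", "interface board", "graphics"]


def next_suspects(known_good_has_final_l1: bool, ip53_works_in_known_good: bool) -> list[str]:
    """Return what to check next after moving the IP53 node board into a known-good Tezro/O350."""
    if not known_good_has_final_l1:
        # Step 1: the result is only meaningful once the known-good system runs the final L1.
        raise ValueError("install the final L1 on the known-good system first")
    if not ip53_works_in_known_good:
        # Step 3: the node board also fails in a good chassis, so the fault travels with the board.
        return ["IP53 node board"]
    # Step 4: the node board is fine, so check the remaining parts of the original system one at a time.
    return list(SUSPECTS_IF_NODE_BOARD_OK)
```

For example, `next_suspects(True, True)` returns the PSU / interface board / graphics list, which you would then work through one part at a time.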

Given where you are in pinpointing the problem, I believe you need to move to a high-level HW diagnostic process.

Cheers from Oz,


jwhat/John
(This post was last modified: 06-04-2022, 04:11 AM by jwhat.)
jwhat
Octane/O350/Fuel User

Location: Australia
06-03-2022, 11:55 PM
#35
RE: sgi tezro L1 General Exception on node 0
As I mentioned before: check the Meg Array connectors on your node board! (Try reseating the node board!)

[Image: FQZW4JT.png]

That red light indicates a node board fault!
(This post was last modified: 06-04-2022, 08:47 AM by Irinikus.)
Irinikus
Hardware Connoisseur

Location: South Africa
06-04-2022, 08:37 AM
#36
RE: sgi tezro L1 General Exception on node 0
I hear you Irinikus,

Thanks for pointing that out.

Agreed; given that no second Tezro/O350 is available, a “reseat” is worth trying.

Cheers,

jwhat/John
jwhat
Octane/O350/Fuel User

Location: Australia
06-04-2022, 09:20 AM
#37
RE: sgi tezro L1 General Exception on node 0
Yeah, at this point, I’m not even going to recommend a firmware swap to see what happens. The oddity of this points to a hardware/power issue.

I will say this about SGI’s design: the voltage sensors do have a slight design flaw (from a board-repair standpoint, not an operational one), in that most of the low-voltage power rails are sprinkled with resistors that limit current so a massive short can’t do huge damage. If there were a shorted cap or the like on your node board, it could be dragging down the “local voltage” while being current-limited, so a branch component receives little to no voltage while upstream components on the same rail register fine.

The Fuel PIMM is very much designed this way, and so was the V10 I worked on. Since the Tezro is the same architecture, I’d assume it’s designed similarly?

This scheme allows a local short to hide on a voltage rail and not be easily detected. It shows up like gangbusters as the small resistors heating up on the thermal imager, but the components in question often just sit there, cool as a cucumber, because they haven’t been overloaded. This form of branch-circuit protection makes simple power-injection troubleshooting much more difficult, as you have to jump past each resistor you find and keep following the short until you’re finally behind the last current-limiting component.
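To put rough numbers on the current-limited branch idea, here is a minimal Python sketch that treats the series resistor and the branch load as a simple voltage divider fed by an ideal rail. The 1.5 V rail, 1 Ω series resistor, 20 Ω healthy load and 0.05 Ω short are made-up illustrative values, not measurements or schematic values from any SGI board.

```python
# Hypothetical values for illustration only; not taken from any SGI schematic.
V_RAIL = 1.5      # volts, assumed to be held steady by the upstream regulator
R_SERIES = 1.0    # ohms, current-limiting resistor feeding the branch
R_HEALTHY = 20.0  # ohms, equivalent resistance of a healthy branch load
R_SHORT = 0.05    # ohms, a shorted capacitor on the branch


def branch_voltage(v_rail: float, r_series: float, r_load: float) -> float:
    """Voltage left for the branch after the drop across the series resistor."""
    return v_rail * r_load / (r_series + r_load)


def power_split(v_rail: float, r_series: float, r_load: float) -> tuple[float, float]:
    """Watts dissipated in (series resistor, branch load); the same current flows through both."""
    current = v_rail / (r_series + r_load)
    return current ** 2 * r_series, current ** 2 * r_load


if __name__ == "__main__":
    print(f"healthy branch sees {branch_voltage(V_RAIL, R_SERIES, R_HEALTHY):.3f} V")
    print(f"shorted branch sees {branch_voltage(V_RAIL, R_SERIES, R_SHORT):.3f} V")
    p_res, p_short = power_split(V_RAIL, R_SERIES, R_SHORT)
    print(f"series resistor: {p_res:.2f} W, shorted part: {p_short:.2f} W")
```

With these numbers the rail upstream still reads 1.5 V, while the shorted branch sees only about 0.07 V. The resistor dissipates about 2 W against roughly 0.1 W in the short itself (the ratio is simply R_SERIES / R_SHORT), which matches the picture of the resistor glowing on the thermal imager while the failed part stays cool.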

I’m going to go out on a very long limb here, just a guess: since the system was described as working last Christmas, was never moved or upgraded during that time, and now experiences this problem, I believe it has to be some form of localized component failure.

Is the failed component a simple passive, or an actual major component like a complex semiconductor or a specially programmed IC? Unknown at this time.

And this will be a new one for our group, because I used to scour Nekochan pretty hard, and I don’t remember anyone describing anything like this.

While I hope this is not a sign of things to come, it’s obviously showing some new faults that will need to be investigated. The problem is getting sacrificial spares to investigate with.

Tezro mainboards and node boards are not cheap. I have found a very cheap node board that should work (I recently got a cheap dual 600 MHz node board from an Origin that I was hoping would work in a Tezro as a test board), but given what these systems are, there just isn’t a source of cheap enough parts to start slapping things together willy-nilly.

Jwhat’s comment probably rings the truest. If you had another system, you could take very delicate voltage measurements around the board to figure out whether there really is a localized power issue due to a failed capacitor or other passive component.

But I think I’ve thrown the software book at this thing, considering it can’t run IRIX, and we just keep coming back to what seems like a dead component.

I fail to understand how the software can say that it magically has four working CPUs and yet the console doesn’t respond.

I really just don’t know; my brain wants to say investigate local power failures first, because these boards are normally pretty darn robust. At least that’s been their history thus far.

I feel bad, but I’m tapped out of easy ideas here. I agree that getting a cheap node board with the lowest-end processor you can find, very gently swapping it in, and seeing if the system just magically powers up would localize the problem. Then again, it may show the exact same problem, which would point to the main board and not the node board. That said, the main board is a lot cheaper than any node board at retail prices.

Also, since this may be a working Flame setup, changing the motherboard would require being able to change the serial number (I don’t know if that’s as easy as it used to be), plus the whole relicensing issue. I thought that the Tezro and Fuel store their serial number on a chip on the mainboard. On the Fuel I know the chip is socketed; perhaps it is on the Tezro as well? I’ve never heard of anyone swapping a main board on a Tezro and following the proper procedure to preserve the serial number for software licensing.

This is just a bunch of guessing on my part.
weblacky
I play an SGI Doctor, on daytime TV.

Location: Seattle, WA
06-04-2022, 06:07 PM
#38
RE: sgi tezro L1 General Exception on node 0
If the power supply designers had an iota of goodwill toward their community of users, they'd release the circuit diagrams so people like us would have an easier time keeping these things going. They aren't making these anymore, and the technology (if such a thing can be said about mere power supplies) is coming up on 20 years old, and that's for the newest of the bunch, so it's not like they'd be losing any intellectual property...

Project: Temporarily lost at sea
Plan: World domination! Or something...
vishnu
Tezro, Octane2, 2 x Onyx4

Location: Minneapolis, Minnesota USA
06-04-2022, 07:37 PM
#39
RE: sgi tezro L1 General Exception on node 0
Hi Weblacky,

the disconnect on "how the software can say that it magically has four working CPUs and yet the console doesn’t respond" is that one set of information comes from the L1 and the other comes from the main machine.

On the serial number: for all of Fuel, Tezro & O350, it comes from the DALLAS DS1742 (on the interface board, not the CPU board), which is also where the L1 stores its date and time.

We also know that if you have an uninitialised DALLAS in a Fuel, it will auto-initialise with the Ethernet MAC, and the O350 gets its HW serial (which is overridden by the SW one) from a PROM on the interface board (I am sure there is a thread on this, but I can't put my hand on it right now).

Cheers from Oz,


jwhat/John
(This post was last modified: 06-05-2022, 01:05 AM by jwhat.)
jwhat
Octane/O350/Fuel User

Location: Australia
06-05-2022, 12:58 AM
#40
RE: sgi tezro L1 General Exception on node 0
(06-04-2022, 08:37 AM)Irinikus Wrote:  As I mentioned before: check the Meg Array connectors on your node board! (Try reseating the node board!)

OK, thanks for the tip. How can I reseat the IP53 System Node Board? Just unscrew all the Torx screws and try to pull it straight out toward me? Is there anything I should know before doing it?
(This post was last modified: 06-09-2022, 08:23 PM by HarryT.)
HarryT
tezro

06-09-2022, 08:23 PM

