sgi tezro L1 General Exception on node 0
#21
RE: sgi tezro L1 General Exception on node 0
(06-02-2022, 07:02 PM)weblacky Wrote:  Can you show me both your L1 flash blocks?

Very good idea, how can I do that and how can I change to the other flash block?
HarryT
tezro

06-02-2022, 08:46 PM
#22
RE: sgi tezro L1 General Exception on node 0
printed page #52: https://irix7.com/techpubs/007-3938-001.pdf


Cmd: flash status

When you do L1 cmds for reboot or default you can often say A or B at the end, which tells the L1 WHICH firmware you want to boot. Supposedly, in the event of an issue, the L1 will AUTOMATICALLY switch to the other firmware (which should always be a different version) should the L1 crash or fail to start...you're not supposed to be able to get "stuck" in an L1 firmware (assuming you have valid firmwares).

However, I've never heard of a way to upgrade the firmware WITHOUT running Irix. That said, when upgrading CPU boards, it's the same problem, but bigger. If you don't have two valid firmware images that support your CPU board...you're trapped until you downgrade the CPU board just to flash your L1 firmwares.

There is a series of files and steps to step-wise upgrade your L1. But version 1.30 was released with Irix 6.5.25. It SHOULD be okay to run with what you have, but check if perchance you have a NEWER version of the L1 in the opposite slot...that might explain things...

http://archive.irixnet.org/apocrypha/nek...376/1.html

#########################

I am running out of ideas here if you don't have an alternate/valid firmware L1 image for version 1.38.4 that SHOULD HAVE BEEN AUTO-INSTALLED when the Irix 6.5.28f installer was run on your system. If that firmware doesn't "magically" resolve this issue then I only know two more options. Everything else requires Irix (like PROM upgrades).

Option 1: Start taking out your RAM (POWERED OFF - UNPLUGGED FROM WALL). Leave only a SINGLE PAIR properly installed (in the correct locations, consult the manual) and see if the error address MOVES or disappears completely, which would tell us we're really looking at a memory error.

Option 2. Totally NUKE the hardware inventory list/status to reset the ENTIRE mainboard and re-detect all hardware....done like this (I think):

L1_CMD> pod

POD>go cac
POD>clearalllogs
POD>initalllogs
POD>flush
POD>reset

System will restart....

No warranties in this statement...but I've never seen this make things worse...

Good luck.
(This post was last modified: 06-02-2022, 09:19 PM by weblacky.)
weblacky
I play an SGI Doctor, on daytime TV.

06-02-2022, 09:07 PM
#23
RE: sgi tezro L1 General Exception on node 0
Thank you very much, but it wasn't the L1 version!
I have:

Code:
Image         Status        Revision    Built
--------   -------------   ----------   -----
  A        default         1.30.11      07/16/2004 07:53:59 
  B        valid           1.24.11      10/29/2003 00:05:40

Will check the other tips asap.
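
For anyone who wants to compare the two flash images by script rather than by eye, here is a minimal sketch that parses the `flash status` table above. It assumes the four-column layout shown in this post; the names `FlashImage` and `parse_flash_status` are made up for illustration, and other L1 revisions may print the table differently.

```python
import re
from typing import NamedTuple


class FlashImage(NamedTuple):
    slot: str        # "A" or "B"
    status: str      # e.g. "default" or "valid"
    revision: tuple  # e.g. (1, 30, 11), so versions compare numerically
    built: str       # build timestamp, kept as printed


# Assumed row layout: slot letter, status word, dotted revision, build date.
ROW = re.compile(r"^\s*([AB])\s+(\S+)\s+(\d+(?:\.\d+)+)\s+(.+?)\s*$")


def parse_flash_status(text):
    """Return one FlashImage per data row; header and rule lines are skipped."""
    images = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            slot, status, rev, built = m.groups()
            revision = tuple(int(part) for part in rev.split("."))
            images.append(FlashImage(slot, status, revision, built))
    return images


SAMPLE = """\
Image         Status        Revision    Built
--------   -------------   ----------   -----
  A        default         1.30.11      07/16/2004 07:53:59
  B        valid           1.24.11      10/29/2003 00:05:40
"""

images = parse_flash_status(SAMPLE)
newest = max(images, key=lambda image: image.revision)
print(newest.slot, ".".join(map(str, newest.revision)))  # A 1.30.11
```

Comparing revisions as integer tuples avoids the string-comparison trap where "1.4" would sort after "1.30".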
(This post was last modified: 06-02-2022, 09:29 PM by HarryT.)
HarryT
tezro

06-02-2022, 09:27 PM
#24
RE: sgi tezro L1 General Exception on node 0
(06-02-2022, 09:07 PM)weblacky Wrote:  firmware L1 image for version 1.38.4 that SHOULD HAVE BEEN AUTO-INSTALLED when the Irix 6.5.28f installer was run on your system.

Hmmm, I don't think so. Main system firmware (PROM) is automatically flashed with OS installations, but L1 always has to be done manually.
jan-jaap
SGI Collector

06-02-2022, 09:36 PM
#25
RE: sgi tezro L1 General Exception on node 0
Did I confuse PROM and L1 firmware? I guess I did...you're right.

Any input on this problem here?  I'm nearly tapped out. He hasn't done a clearall yet...but what else do we have for a general exception and a non-running CPU?
(This post was last modified: 06-02-2022, 09:43 PM by weblacky.)
weblacky
I play an SGI Doctor, on daytime TV.

06-02-2022, 09:39 PM
#26
RE: sgi tezro L1 General Exception on node 0
Hi Weblacky and HarryT,

I have not commented on this thread as I have never had "General Exception Error" just the "*** TLB Refill Exception on node 0" one.

Given that you have replaced both the Snaphat & DALLAS, before proceeding can you confirm that the L1 config looks ok:
- Get the L1 prompt and check: "date", "serial", "serial all" & "conf".
- Then on the L1 check log: "log".

To get into POD/CAC mode you have to be able to power up the machine.

Typically this is done by first setting L1 "debug" flags "debug 0x10d" and then doing "power up".

The machine will boot into POD mode, but if "power up" fails then that will not work.

To get machine to then boot normally you need to reset the debug flag (via L1) back to 0 (zero).

If playing with L1 does not help, then you need to revisit Power Supply (as per Weblacky) or borrow an IP53 board from someone and retry with this.

I have had cases where my working SGI machine failed to reboot for no apparent reason (i.e. no physical/software change on my part) because of some undiagnosed component failure that is just a result of these being old machines.

Btw I think it might be possible to update L1 via serial port if you are using another SGI to provide the serial connection and direct the L1 flash command to use the serial port (from the SGI machine you are connected with), but I have not tried this.
I do not think this is likely needed in any case as your L1 appears to be working ok.

Finally, can you power up via the front power button?

I hope you can get the machine going again.

Cheers from Oz,

jwhat/John.
(This post was last modified: 06-03-2022, 02:18 AM by jwhat.)
jwhat
Octane/O350/Fuel User

06-03-2022, 02:06 AM
#27
RE: sgi tezro L1 General Exception on node 0
Yo Jwhat,
In the thread HarryT said he saw his full serial when using "serial all" that matched what was on his case label.

I don't see a geographic location listed on HarryT's profile, but if he's in the USA there may be hope for getting a test PSU in the future.

The Ctrl+D console isn't running at all (which I assume is the PROM code?).
I thought POD worked below the PROM; on my Fuel I got to POD on second-stage power-on (fan running and working ENV) but before the PROM loads...I thought?

Outside the complete reset, yeah, I'm not too certain here...the error has no information whatsoever.  If I'm remembering correctly, his logs were unremarkable.

Your symptoms KIND OF match this old recovered thread: http://archive.irixnet.org/apocrypha/nek...awa/1.html

I'm nearly 100% sure you CAN get into debug mode with your system as it bypasses CPU and DIMM checks...I thought.

HarryT claimed he hasn't moved his Tezro, hasn't upgraded anything, used it last Christmas and then now it's acting this way.

I really was thinking...disabled CPU...but his CPU command CLAIMS no CPUs have been disabled.

Regardless, I agree with jwhat, if a "full reset" doesn't "magically" fix it...there must be a hardware reason that will require some parts swapping.

HarryT, if you're up for it...why not just try the FULL board reset.

Enter DEBUG mode on the L1 (before power-up...really fast typing) with: debug 0x10d

pwr up

<TRY Ctrl+D now> if you see A 000 001c01: POD SysCt Cac> 

go cac
clearalllogs
y
initalllogs
flush
debug 0

<Ctrl+T> back to L1 command prompt.
reset

This is the trick we know to fix a firmware-confused late-model SGI.  If it's not a firmware issue and it's a hardware issue...well then that's a more expansive problem since there is 0 documentation in that regard.  Your power ENV generally looked fine, temps looked realistic, fan speeds as well.  I don't have any rational explanation...see if the reset trick works for you.  If it doesn't, then I guess the next steps are either getting/borrowing a CHEAP CPU board from someone to test on your system, or a new mainboard (the L1 lives on the mainboard...I assume the PROM too?).
weblacky
I play an SGI Doctor, on daytime TV.

06-03-2022, 02:39 AM
#28
RE: sgi tezro L1 General Exception on node 0
Hi Weblacky,

Yes, you're right: with the debug flag set, the boot process changes and bypasses diagnostics.

Problem is that you still need to power up to get boot going.

With the debug flag 0x10d set, the machine:
- does not run diagnostics on the CPUs and bypasses any CPU disablements
- does not go into the PROM
- terminates the boot process early and lands you in POD mode
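
As a side note on the value itself: assuming, as the hex notation suggests, that 0x10d is a bit mask, it can be split into its individual bits as in the sketch below. Only the arithmetic is certain here. Which behaviour each bit controls is not documented anywhere in this thread, so the sketch deliberately does not label them, and `set_bits` is just an illustrative helper name.

```python
def set_bits(value):
    """Return the individual power-of-two flags that add up to value."""
    return [1 << n for n in range(value.bit_length()) if (value >> n) & 1]


flags = set_bits(0x10d)
print([hex(flag) for flag in flags])  # ['0x1', '0x4', '0x8', '0x100']
```

So 0x10d is 0x100 + 0x8 + 0x4 + 0x1, and "debug 0" clears all four at once.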

Not sure about Tezro, but as we know from experience, when you do a POD/CAC reset on a Fuel, you no longer get any output on the external serial port and have to plug into the on-system-board serial port or the L2 USB port. I guess that, as the Tezro is essentially an O350 with the interface board changed to allow for a desktop case, this would not be the case, as it does not have the equivalent of the Fuel's on-system-board serial port.

I have done POD/CAC resets on O350s without any issues (once I put back IP53 board into machine that failed when I put an IP59 board in).

If the problem is the PSU then we all know who to call ;-)

Cheers from Oz,

jwhat/John
(This post was last modified: 06-03-2022, 05:12 AM by jwhat.)
jwhat
Octane/O350/Fuel User

06-03-2022, 05:12 AM
#29
RE: sgi tezro L1 General Exception on node 0
I'm very sorry, but I do not get into "A 000 001c01: POD SysCt Cac"

How do I get into it?

(The system is located in Germany, btw.)
(This post was last modified: 06-03-2022, 02:41 PM by HarryT.)
HarryT
tezro

06-03-2022, 02:38 PM
#30
RE: sgi tezro L1 General Exception on node 0
Please post the output of the cmd: leds

This may give diagnostic boot info as to why the system stopped.

Also, is it worth having a look at the other serial ports? The initial “no response from cpu0” error before the general exception about the node 0 CPU is exactly what you see if the console output has been redirected to another serial port. So that error is consistent with the console port having been moved artificially, which in a Discreet Flame setup is a real possibility.

Just for yucks, please take the same serial cable you’re already using for your L1 and move it to serial port one (make sure your terminal emulator is connected and running first), then plug in the Tezro power cable and see if you get any output later in the boot process. Do the same again for serial port two.

Do not try to hot swap the serial ports, that won’t work!

You have to be latched and connected with your terminal to the port before power-on. If you try to switch serial ports while the system is running, you will likely not get any output, as I have personally seen on similar platforms. If you’re not connected at boot, it will not output anything on the other terminal.

This whole thing seems incredibly close to this posting but unfortunately it was never resolved: https://forums.irixnet.org/thread-602.html

If you have enough cables and laptops and/or other machines, you’re welcome to connect one null modem cable to the console port and one null modem cable to serial port one, start HyperTerminal or whatever emulator you’re using on each, and then start the machine.

I had to do this on a Fuel I recently repaired.
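
If you would rather watch several ports from one machine than juggle terminal windows, the idea can be sketched roughly like this. It is only a sketch under assumptions: `log_port` and `watch` are made-up names, each port is treated as a plain binary file-like object, and the sample text is a placeholder taken from the error quoted above. On real hardware you would open the ports BEFORE plugging in power, as described above, and the device names and line settings depend entirely on your adapter and OS.

```python
import io
import threading
import time


def log_port(name, stream, out, lock):
    """Copy lines from one stream into out, tagged with time and port name."""
    # readline() returns b"" at end of stream, which stops the loop.
    for raw in iter(stream.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        with lock:
            out.append(f"{time.strftime('%H:%M:%S')} [{name}] {text}")


def watch(streams):
    """streams maps a label to a binary file-like object; returns merged log."""
    out, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=log_port, args=(name, stream, out, lock))
        for name, stream in streams.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return out


# Demo with in-memory streams standing in for two serial ports.
demo = {
    "console": io.BytesIO(b"no response from cpu0\r\n"),
    "serial1": io.BytesIO(b""),
}
log = watch(demo)
for line in log:
    print(line)
```

One thread per port means a silent port never blocks the one that is talking. Note that a real serial device normally never reports end of stream, so against hardware this would run until you interrupt it.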
weblacky
I play an SGI Doctor, on daytime TV.

06-03-2022, 03:22 PM

