Fuel and TLB Refill Exception
#1
Fuel and TLB Refill Exception
Hello, I have a problem with my Fuel that when it boots after some time being left powered down it hangs on "TLB Refill Exception on node 0" as shown in the L2 output below. It happens during SCSI probe. If I reboot the machine then it boots fine and the problem starts again after being left powered down for some time. I got a tip that it could be a bad Dallas time keeping chip (the bulky black chip by the PCI slots) but I cannot see the connection as I'm not really sure about its function. Could this be the problem and if so could someone explain the connection between the chip and this behavior? Any other suggestions are welcome. Thanks.

Installing PROM Device drivers ............         
Base I/O Ethernet set to /dev/ethernet/ef0
Installing Graphics Console...
graphics install: searching for pipe 0

Walking SCSI Adapter 0, (pci id 1)
1timeout on adapter 0 target 1
  tm0=0xffffc4d2b3f2a559, tm1=0xfffed7de6decac02, timeout=0xb
- 2+ Device Vendor Product:
3+ Device Vendor Product:
4+ Device Vendor Product:
5+ Device Vendor Product:
timeout on adapter 0 target 5
  tm0=0xfffed7de6decac07, tm1=0xfffed7de6deca320, timeout=0xb
6+ Device Vendor Product:

A 000: *** TLB Refill Exception on node 0
A 000: *** EPC: 0xc00000001fc47e58 (0xc00000001fc47e58)
A 000: *** Press ENTER to continue.
A 000: POD IOC3 Dex>
cz7asm
O2

Trade Count: (0)
Posts: 25
Threads: 6
Joined: Feb 2021
Location: Czech Republic
03-12-2021, 12:19 PM
#2
RE: Fuel and TLB Refill Exception
The Fuel has a number of design flaws. One of them is the ENV monitoring chip, which is the Dallas chip most people refer to. It can crash or go bad. There's also a snaphat battery on the board that has most likely died if it hasn't been replaced. The error you mentioned is a CPU error, but I rarely see CPUs die. I've only seen it once in 8 years, and it was an early serial number 600MHz.

Replacing those and using the ATX PSU adapter to replace the PSU (which tends to fry components as it ages) is the best way to do this.

I'm the system admin of this site. Private security technician, licensed locksmith, hack of a c developer and vintage computer enthusiast. 

https://contrib.irixnet.org/raion/ -- contributions and pieces that I'm working on currently. 

https://codeberg.org/SolusRaion -- Code repos I control

Technical problems should be sent my way.
Raion
Chief IRIX Officer

Trade Count: (9)
Posts: 4,239
Threads: 533
Joined: Nov 2017
Location: Eastern Virginia
03-12-2021, 01:49 PM
#3
RE: Fuel and TLB Refill Exception
Well I’d start with the dumb questions first:

What happens if you unplug the power to the hard drive and leave it at the PROM maintenance screen for awhile?

Raion is right that it IS a CPU exception. But you don’t mention any other warnings. I’d agree that stuff like this can be a symptom of poor quality power coming out of the power supply (it’s failing to filter enough ripple current).

Since it’s bashing the SCSI controller as well, I’d also suspect the internal SCSI cable, as those have also been known to fail. But before you hunt one down, I’d try installing and running Irix on an external SCSI drive to see if the same thing happens. If so, it’s not the internal cable.

I’d be the first to admit (along with Raion) that any test results will be heavily skewed if your power supply is acting up. Fuel power supplies have a reputation for being very quick to fail due to age (more so than any other SGI). They were obviously not made/tested with aging parts in mind.

So ensuring you have good power is job number one before any other testing. Having a Fuel power supply that’s going out could damage other old components via ripple getting through.

Unless you’ve the experience to rebuild your power supply yourself, it’s best to set it aside for the future and spend the money in the ATX conversion (fully reversible) to get good power.
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
03-12-2021, 07:08 PM
#4
RE: Fuel and TLB Refill Exception
Hi cz7asm,

On Chimera (O350), which the Fuel is based on, it is easy to generate a TLB Exception by putting in a CPU board that the L1 SW version does not support (i.e. a newer CPU module with an older L1 SW version).

You do not mention what CPU module or IRIX version you are on.

As a caution I would ensure you have new L1 (from 6.5.30) and PROM SW.

Also you can easily see if you have a battery problem with either the DALLAS or SNAPHAT by:
  • For DALLAS - check the date by going into L1 and running the "date" command to see if it is correct. If not, set it, then pull the power and see if the setting has held on restart. (This is not easy to fix and will require either replacing the entire DALLAS chip or cutting into it to fit a new battery on top of it.)
  • For SNAPHAT - if you see a "preposterous date/time" message as part of boot then you likely have a dead snaphat battery. Reset the date via the OS and see whether it sticks across power removal.
I found that when the SNAPHAT battery stopped working, the "preposterous" date would sometimes prevent me from logging in via the UI. As a workaround, I would first boot into single user mode, reset the date via the OS and then reboot into full UI mode.
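
If you want to record the result of the "set it, pull power, read it back" test, the comparison can be sketched in a few lines of Python. This is only an illustration of the logic: the cutoff date, the tolerance and both function names are made up for the example, and it does not talk to L1 or IRIX in any way.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: anything earlier is treated as a clock that was reset.
EARLIEST_PLAUSIBLE = datetime(2021, 1, 1)


def looks_preposterous(reported, earliest=EARLIEST_PLAUSIBLE):
    """Return True if the reported clock value is implausibly old."""
    return reported < earliest


def clock_held(set_time, read_time, powered_off, tolerance=timedelta(minutes=5)):
    """Return True if the clock kept running while the machine was unplugged.

    set_time    -- value written before pulling power
    read_time   -- value read back after power was restored
    powered_off -- roughly how long the machine was without power
    tolerance   -- allowed error (hypothetical default of five minutes)
    """
    expected = set_time + powered_off
    return abs(read_time - expected) <= tolerance


if __name__ == "__main__":
    set_time = datetime(2021, 3, 13, 10, 0)
    read_time = datetime(2021, 3, 13, 12, 1)
    print(looks_preposterous(read_time))
    print(clock_held(set_time, read_time, timedelta(hours=2)))
```

A clock that comes back at the value you set, or at some date decades in the past, fails the second check and points at the battery.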

I hope this info is helpful.

Cheers from Oz,


jwhat/John.
jwhat
Octane/O350/Fuel User

Trade Count: (0)
Posts: 513
Threads: 29
Joined: Jul 2018
Location: Australia
03-13-2021, 03:04 AM
#5
RE: Fuel and TLB Refill Exception
Thank you everyone for all your suggestions.

I already encountered problems with faulty monitoring Dallas chips. Luckily I have good soldering equipment at work and they were not difficult to replace.

From what I read here I think the hottest tips are either a bad PSU not delivering stable voltage or an outdated L1 firmware.

I already did some research on how to do the ATX PSU conversion so that should be quite easy to test out. I know there's the conversion adapter available but I don't like its price given the fact that it's just a rewiring plus a timer to simulate a FAN tacho. Yet I understand it's a very small home made production. However I must find out how to do the L1 and PROM firmware update.
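
For a rough idea of what such a timer has to produce, here is a minimal Python sketch of the tach arithmetic. It assumes the common PC fan convention of two tach pulses per revolution and a hypothetical target of 3000 RPM. Neither figure comes from the Fuel itself, so treat both as placeholders until they are checked against the real fan.

```python
def tach_frequency_hz(rpm, pulses_per_rev=2):
    """Pulse frequency a fan tach line would show at the given speed.

    pulses_per_rev=2 is the usual PC fan convention (an assumption here).
    """
    return rpm / 60.0 * pulses_per_rev


def tach_period_ms(rpm, pulses_per_rev=2):
    """Time between successive tach pulses, in milliseconds."""
    return 1000.0 / tach_frequency_hz(rpm, pulses_per_rev)


if __name__ == "__main__":
    rpm = 3000  # hypothetical speed to simulate
    print(tach_frequency_hz(rpm), "Hz")
    print(tach_period_ms(rpm), "ms per pulse")
```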

I have a few Fuel machines that were still up until recently doing their job in a local hospital. I already fixed three machines just by replacing the monitoring chips and now they can all boot but there are still some smaller problems that I'm trying to solve. Like this TLB exception that I noticed on two machines. Another one has some occasional problems reading from HDD but I didn't have time to look into it more. There's a fourth machine that seems to have all its monitoring chips ok but when I try to power it up it turns on the PSU and after a second it switches itself off - here I suppose it's something more serious. Well, this is now my spare time activity when I'm in a mood to tinker with this old stuff.

BTW thank you jwhat for elaborating on the time keeping chips. I'm aware of them but I wasn't sure what their actual role is or what the possible symptoms are when they go bad. Very helpful.
cz7asm
O2

03-13-2021, 12:12 PM
#6
RE: Fuel and TLB Refill Exception
Also be aware that while L1 has a dual firmware system to allow recovery in the case of bad flashing or corrupt firmware, I’m not sure the PROM does.

So if you have a “freak out” due to bad power while flashing...possibly game over. We never found a way to flash either the L1 or the PROM outside these machines for the last platform of SGIs. It looks like it’s not kept in independent flash chips, but likely integrated in some small processor-like chips that aren’t just externally accessible for flashing. It’s unknown if there is a JTAG interface for them, but they are not just SPI flash chips or EPROMs like the older machines.

Since you have so many machines (lucky) just buy one ATX adapter and move a single high-end PSU among all machines for testing purposes.

When installing Irix the PROM and L1 are usually automatically updated to the versions present on the Irix distribution you’re installing (if version already same or higher, no flash update should occur).
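
The "same or higher means no flash" rule can be sketched as a simple version comparison. This is an illustration only, not how the IRIX installer actually decides: the dotted version strings and both function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.44.0' into (1, 44, 0)."""
    return tuple(int(part) for part in text.split("."))


def should_flash(installed: str, offered: str) -> bool:
    """Flash only when the offered firmware is strictly newer."""
    return parse_version(offered) > parse_version(installed)


if __name__ == "__main__":
    # Hypothetical version numbers, for illustration only.
    print(should_flash("1.2.3", "1.10.0"))
    print(should_flash("1.44.0", "1.44.0"))
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would rank "1.10.0" below "1.2.3".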

There are certainly discrete instructions for upgrading both systems. But most people never do them, they just update Irix and it naturally happens as part of OS install.

Sometimes PROMs & L1 versions were put out as later patches for a distribution, if there were problems. But I think that’s only happened a few times.

Considering you have Fuels, it makes the most sense to simply install the last edition of Irix and use those firmware versions. Yes, there may be one higher version of the L1 for Fuel, but you may not need it.
weblacky
I play an SGI Doctor, on daytime TV.

03-13-2021, 06:52 PM
#7
RE: Fuel and TLB Refill Exception
Thanks weblacky for the additional information. Since I don't have Irix CD-ROMs I did a network installation on them. Is it possible that the L1/PROM update doesn't happen during a network installation? I'm now stuck in a covid quarantine and the machines are in my office so I cannot check the versions right now.
cz7asm
O2

03-13-2021, 09:27 PM
#8
RE: Fuel and TLB Refill Exception
From what I know you don’t need the CDROMs to have the flash upgrades take place. A normal OS install does this.

To check your PROM version, boot to PROM, stop OS start, enter command monitor, and type: version
That will display your PROM version.

For the L1, go into the L1 command mode and type: flash status

That will display a small table with your A & B firmware blocks and the “revision”. These blocks contain the L1 firmware and previous L1 firmware (backup/downgrade). The table will tell you which Image is default (currently running).

You can post the output of both commands here and we can determine if you’re up to date or not. Likely you’re very close or completely up to date.

But that’s how you check it.

I’ve only heard of one PROM upgrade attempt failure during OS install (I think that was an O2) and the owner simply found a higher Irix version to jump the PROM past the version that wouldn’t apply. Not Fuel related.
weblacky
I play an SGI Doctor, on daytime TV.

03-13-2021, 09:55 PM
#9
RE: Fuel and TLB Refill Exception
hi cz7asm,

The relevant commands from IRIX are:
  • for L1 commands use: l1cmd
  • for PROM flashing use: flash
  • for L1 software update use: flashsc
L1 and PROM updates are not always automatically applied on OS install / update.

I am not sure why this is but people seem to have different experiences with this.

See here for great archive of IRIX versions: https://forums.irixnet.org/thread-2667.html

On the DALLAS environmental chip replacement / soldering, did you take any pictures?

If so it would be great to post some of them to help others with system board chip replacements.

Cheers from Oz,

jwhat/John
(This post was last modified: 03-14-2021, 04:21 AM by jwhat.)
jwhat
Octane/O350/Fuel User

03-14-2021, 04:20 AM
#10
RE: Fuel and TLB Refill Exception
Hi jwhat,

Regarding the monitoring chip replacement, I wrote a short blog post to summarize everything I found in case I need it again in the future.
cz7asm
O2

03-16-2021, 02:48 PM

