Should we be worried about flash/ROM programming retention in SGIs for PROMs and L1s?
#1
Yo,
Sometimes for fun I watch '80s computer rebuilds on YouTube. About a third of the time it turns into an IC swap-fest. These old chips "go bad", and most of the failures are (E)EPROM or ROM related. While that's not directly what we have in most SGIs, I started to wonder about the modern-ish flash used in our newer PROMs and L1s (and, of course, the PROMs in our older models as well). I think SGI's error in logic here was assuming they would keep releasing upgrades: you'd get a new firmware version every year or so, which would overwrite your existing firmware, which would refresh your flash and reset the lifespan clock. Figure 15 years after the last flash and you're in trouble land.

So now what? Do you think it's worth finding the commands to forcibly re-flash the last versions of our PROMs and L1s (on systems that support it), so we can do it manually every 8-10 years or so and force "freshness" by reprogramming? It seems that most of the older PROMs are in ICs that could be removed (or desoldered) and programmed (I need more info on this personally). But we don't know about the newer systems like the Octane, Fuel and Tezro (at least I don't). I thought the O2 was somewhat understood and its PROM was on the processor card, right? But I don't know if it's field upgradable or not.

Anyway, thoughts? I cringe that I didn't think of this before. But, just as we are finding out with Tezro RTC recovery, SGI didn't provide any real recovery options or affordances for outright loss (for bad L1 firmware, YES, via the A and B slots), but otherwise... no.

Can anyone get the ball rolling with system info and command line examples for flashing the PROM and/or L1 on different systems, plus anything else you can think of? Also, is there a force mode where the flash tool will rewrite the same version (instead of saying it's not needed)?

I know we are lucky in that most firmware is on the IRIX media or in the OS install, so the files are everywhere. But I don't know how to do much of this safely. Can anyone please start providing working info on this topic?

I fear that, with equipment failing the way it is, PSUs may not be our biggest issue. In another 10 years, will the Indigo/Indy/Indigo2 PROMs start to fail? Thoughts?

Please post any info you have about PROM images, burning, flashing, whatever! I'd like to see a central source for this info so we can prepare sooner rather than later. I don't want us scrambling when users start flooding in with PROM or firmware related issues. I'd like to have not only the answers but a refresh regimen in place, maybe even with backup chips ready to go in the short term. In the long term, I'd like an archive we can reference to salvage systems that are already dead due to loss of firmware code.

***LIST***
IRIS INDIGO

    -> PROM -> The Indigo R4000 PROM is a 40-pin M27C4002 4Mbit EPROM located in the center of the board, under the GIO32 expansion bracket.

INDY                 

    -> PROM -> The Indy/Challenge S PROM is, like the Indigo R4K's, a 40-pin M27C4002 4Mbit EPROM, located under the graphics board or, in the case of the Challenge S, under the IO4Plus.

INDIGO2

    -> PROM -> The Indigo2/Challenge M PROM is a 40-pin M27C4002 4Mbit EPROM, just like the Indy and Indigo R4K, and is located in the GIO64/EISA interface compartment close to the fan. For the Power Indigo2 (IP26) and the Indigo2 R10000 (IP28) the PROMs are different, but their location is the same.


O2

    -> PROM -> Remove the socketed Dallas RTC; underneath it is an Atmel AT29C040A 4Mbit (512 KB) flash chip. Anyone have dumps already? The datasheet suggests it's very easy to program.

Code:
sbin/flashinst -T -y /usr/cpu/firmware/ip32prom.image

Octane(2)
Code:
su
/usr/sbin/flash -P /usr/cpu/firmware

You can compare the before and after results using the following command. Reboot afterward.

Code:
/usr/sbin/flash -V

Fuel

Tezro
Code:
-- The final Tezro PROM revision, 6.211, is in IRIX patch Patch_SGI0007149:
flash -v -a usr/cpu/firmware/ip35prom.img

-- Show the current L1 firmware version:
l1cmd --scdev hw/module/001c01/L1/controller flash status

-- Update the L1 firmware (reportedly requires version hopping from 1.32.6 to 1.38.4 to the final 1.48.1, via SGI Patch_SGI0007149):
flashsc --sc usr/cpu/firmware/sysco/l1.bin 1.1

Origin



Can we get more on other systems, please? Also, in regards to this, can anyone discuss retention lifespans for machines left on a shelf?
(This post was last modified: 10-22-2020, 02:03 AM by weblacky. Edit Reason: formatting keeps corrupting. )
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
10-18-2020, 09:16 AM
#2
RE: Should we be worried about flash/ROM programming retention in SGIs for PROMs and L1s?
(10-18-2020, 09:16 AM)weblacky Wrote:  I think the error in logic here from SGI was they would keep making upgrades, so you'd get a new version of the firmware every year or so, which would upgrade your existing firmware, which would refresh your flash and reset the lifespan clock.  Figure 15 years after the last flash and you're in trouble land.
I suspect the logic was just that the specified retention of the flash was well above the useful life of the machine, and that if a couple of PROMs failed, they could be replaced under an SGI support contract (like any other component).

(10-18-2020, 09:16 AM)weblacky Wrote:  So now what?  Do you think it's worth finding the commands to forcibly RE-FLASH last versions of our PROMS and L1s (in systems that support it) so we can do it manually (every 8-10 years or so) to force "freshness" via reprogramming the systems?
I'd caution that the data retention period depends on how worn the flash is, or more specifically on the number of times it's been erased and written.

Datasheets aren't clear on this, but I think it means the retention should be good for the quoted number of years (10) at any point within the quoted number of writes (probably 10,000). In practice, you'll probably get significantly more; the Octane is likely sitting at 20 years, the Personal Iris probably at 30.

As with various '80s computers becoming a chip-swap fest, I suspect the most helpful thing is knowing how to rewrite the PROM without a working machine.

Are the flash images on the hard disk raw images we can write with an external programmer? Is the chip also used to hold other data (MAC addresses, serial numbers, etc.)? If so, how can we dump everything to a file? Can we recreate any per-system data?
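
One way to approach the per-system-data question, once dumps from two machines of the same model exist, is to diff them: if both run the same PROM revision, regions that differ are candidates for MAC addresses, serial numbers and similar data. A minimal Python sketch, assuming raw binary dumps of equal size; the file names in the usage line are hypothetical:

```python
from pathlib import Path


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset ranges where two equal-length dumps differ."""
    if len(a) != len(b):
        raise ValueError(f"dump sizes differ: {len(a)} vs {len(b)} bytes")
    ranges: list[tuple[int, int]] = []
    start = None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = offset  # a run of differing bytes begins here
        elif x == y and start is not None:
            ranges.append((start, offset))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(a)))  # the run continued to the end of the dump
    return ranges


def compare_dump_files(path_a: str, path_b: str) -> None:
    """Print every differing region between two dump files, in hex offsets."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for start, end in differing_ranges(a, b):
        print(f"0x{start:06x}-0x{end - 1:06x} ({end - start} bytes differ)")
```

Usage would be something like `compare_dump_files("o2_machine1.bin", "o2_machine2.bin")`. A short differing run is a hint, not proof, that the region holds per-system data.
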
Donjon
O2

Trade Count: (0)
Posts: 8
Threads: 1
Joined: Oct 2020
Location: Europe
10-22-2020, 12:21 AM
#3
RE: Should we be worried about flash/ROM programming retention in SGIs for PROMs and L1s?
Yo,
This is why I phrased the question this way. You know the "expected" 10-year retention period of modern NAND, and you know it needs to be rewritten (NOT READ) for the flash to be refreshed. Yes, much of the SGI firmware is on the drive... but that does us little good, as you'll see.

For older machines... it's in known PROM ICs (Indigo, Indigo2, O2, Indy)... maybe Octane... anyone know? Didn't someone have a PROM dump collection?

For newer machines... I thought it was buried inside a co-processor, like the L1 processor itself? We've never heard from anyone who proved where the firmware lives. That's why my question is phrased around reflashing. The only way I know to refresh aging flash memory is to reflash it with fresh data and fresh charge. So you go through the normal flash routine while the system works, to keep it working.

Data retention is data retention... period. While the number of reads may make the stored charge disappear quicker, left alone... on a shelf... with data... the NAND charge will get lower and lower (leak) until it's too low to read reliably (how long that takes, I don't know). Also, we have no proof these chips are smart enough to manage or refresh their own data (move it around or rewrite it) while powered. So let's assume the worst-case scenario: the data rots from the inside out (behind a locked door, inside an Atmel chip or something).

So I asked on the assumption that the charge starts degrading after programming... and we currently have no idea what to physically access with a programmer, or how, to restore the PROMs and L1s on newer (non-working) SGIs. I hope someone can prove me wrong, but I've not heard anyone point to an EPROM on a Tezro and say... the firmware is there.

10,000 writes is just that: writes. I don't expect the system to survive long enough for me to run out of writes if I'm flashing once every 7-8 years or so. Also, we know for a fact SGI was smart enough to put the L1 logs in the Dallas RTC's NVRAM, so it's not writing logs to flash.
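
The arithmetic behind "I won't run out of writes" is easy to check. Assuming the 10,000-cycle endurance E mentioned earlier in the thread, a reflash interval Δt of 7 years, and a hypothetical 50-year horizon:

```latex
\begin{align*}
n_{50} &= \left\lceil \frac{50\ \text{years}}{7\ \text{years/write}} \right\rceil = \lceil 7.14 \rceil = 8\ \text{writes} \\
\frac{n_{50}}{E} &= \frac{8}{10\,000} = 0.08\,\%\ \text{of the rated endurance} \\
E \cdot \Delta t &= 10\,000\ \text{writes} \times 7\ \text{years/write} = 70\,000\ \text{years}
\end{align*}
```

On those assumptions endurance is not the limiting factor; retention and the risk of an interrupted write are.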

So there's the question again: on old systems we just have PROMs (I've only heard of the Indigo2, O2, and Indy PROM locations). I don't know about Octanes, and we DEFINITELY have nothing for the Fuel, Tezro, or Origin 3X0. The only NICE thing about the new architecture is that the L1 has two firmware banks. You could write a script that, if it's been longer than 7-8 years since the last flash, forcibly flashes the alternate firmware slot (over the decades). Then, if the primary got into trouble, you'd have a fresh backup, and you could reflash the primary once you're back in the OS... so that part COULD be automated, as long as you boot your collection every 3-4 years or so.
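
The decision part of such a script can be sketched without touching any flashing tool. This is a hypothetical Python sketch: the `Slot` record, the 7-year threshold, and tracking flash dates in your own notes are all assumptions, and the actual write would still be done by hand with whatever SGI tool applies (e.g. `flashsc` on the Tezro).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: treat a bank as stale after ~7 years (the low end of the 7-8 year figure).
REFRESH_AFTER_DAYS = 7 * 365


@dataclass
class Slot:
    name: str           # hypothetical label for one of the two L1 firmware banks, e.g. "A"
    last_flashed: date  # date this bank was last written, taken from your own records
    active: bool        # True if the L1 currently boots from this bank


def slot_to_refresh(slots: List[Slot], today: date) -> Optional[Slot]:
    """Pick the stale *inactive* bank to rewrite, or None if nothing is due.

    The running bank is never chosen, so an interrupted write still leaves
    a bootable image behind.
    """
    stale = [
        s for s in slots
        if not s.active and (today - s.last_flashed).days > REFRESH_AFTER_DAYS
    ]
    # If several inactive banks are stale, start with the oldest.
    return min(stale, key=lambda s: s.last_flashed, default=None)
```

For example, with both banks last written on 2005-03-01 (made-up dates) and bank A active, `slot_to_refresh(slots, date(2020, 10, 22))` returns bank B.
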
weblacky
10-22-2020, 01:47 AM
#4
RE: Should we be worried about flash/ROM programming retention in SGIs for PROMs and L1s?
(10-22-2020, 01:47 AM)weblacky Wrote:  You knew the "expected" 10 year retention period of modern NAND and you knew it needs to be rewritten (NOT READ) for flash to be refreshed.
I feel like the point I was making has been missed: as flash wears from being written to, the amount of charge a cell can hold decreases. The cell has worn out when it can't hold enough charge to reliably signal a 0 or 1.

Therefore, a cell that's been rewritten only a handful of times can be expected to have a much longer retention period than the quoted one.

It's likely that the flash used in SGI machines is NOR flash. The datasheets I've found didn't say, but NOR is more in line with the use case, the pinout (for the ones we know) and so on. Happily, data retention is typically longer for NOR flash: for their recent NOR flash products, Atmel (now part of Microchip) cites 'greater than 100 years'.

(10-22-2020, 01:47 AM)weblacky Wrote:  For older machines...it's in known IC PROMS (Indigo, Indigo2, O2, Indy)...maybe Octane...anyone know?  Didn't someone have a PROM dump collection?
There are more programmable devices in SGI systems than that; I know of three in the O2 (not including the RTC), and I've only checked the motherboard module. As well as the flash chip for the PROM, there's an XC1718 device on the processor module (which I assume is used to configure (V)ICE) and a DS2502 on the PCI riser.

You can't reprogram the latter two of these: the XC1718 is OTP, and the DS2502 is an 'add only' EPROM.
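
The one-way nature of these parts can be made concrete. In EPROM-style cells, where the erased state reads as 1, programming can only clear bits (1 to 0); getting a 1 back needs an erase, which an OTP part or the add-only DS2502 can't do at all, and a UV EPROM only does under UV light, out of circuit. A minimal Python sketch (a generic model, not any vendor's tool) checking whether a target image could be written over a chip's current contents without an erase:

```python
def can_program_without_erase(current: bytes, target: bytes) -> bool:
    """True if every bit that is 0 in `current` is also 0 in `target`.

    Programming can only clear bits (1 -> 0); setting a bit back to 1 needs
    an erase, which OTP and add-only parts cannot perform.
    """
    if len(current) != len(target):
        raise ValueError("images must be the same length")
    # (c & t) == t means target's 1-bits are a subset of current's 1-bits.
    return all((c & t) == t for c, t in zip(current, target))
```

An erased part (all 0xFF) accepts any image; a partly programmed one only accepts images that clear further bits.
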

Which leads to another concern about trying to refresh the PROM: it may not be possible at all. It's certainly not possible for any of the machines with a UV EPROM (so the Indigo, Indigo2 and Indy are out). For the O2, page 3 of the AT29C040 datasheet mentions a 'Boot Block Programming Lockout', which allows regions of the flash chip to be permanently locked against reprogramming.

The expected use of this is that the machine starts by running code in the locked section. This code checks the integrity of the remaining flash (probably with a checksum) and, if it's OK, jumps to it. If it's not OK, it takes some corrective action, perhaps blinking an LED to signal an error. This is particularly attractive on systems with dual flash images: if the newer image is damaged in some way, the boot process can automatically try the older one.
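
The boot-block pattern described above can be sketched as follows. It's a minimal Python model, not SGI's code: CRC-32 (via `zlib.crc32`) stands in for whatever integrity check a real boot block would use, and the image names are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple


def image_ok(image: bytes, stored_crc: int) -> bool:
    # Assumption: CRC-32 stands in for the real boot block's integrity check.
    return zlib.crc32(image) == stored_crc


def choose_boot_image(images: List[Tuple[str, bytes, int]]) -> Optional[str]:
    """Return the name of the first image that passes its check.

    `images` is a list of (name, image_bytes, stored_crc) in preference
    order, newest first. None means nothing is usable: real boot code would
    then signal an error (e.g. blink an LED) instead of jumping anywhere.
    """
    for name, image, stored_crc in images:
        if image_ok(image, stored_crc):
            return name
    return None
```

If real boot code works like this, rewriting the main image never touches the protected boot block itself, which is why reflashing may not refresh every area of the chip.
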

(10-22-2020, 01:47 AM)weblacky Wrote:  For newer...I thought it was buried inside a co-processor on like the L1 processor itself?  We've never heard from anyone that proved where the firmware lives.
All of the SGIs I've looked at mostly use off-the-shelf components, with custom silicon reserved for where it really matters, e.g. the processors, graphics and memory interconnect. Outside of those things, rolling their own would just increase cost. Even in products made in large volumes with security requirements, the flash chips still aren't usually integrated into the processors.

On the Tezro, for instance, looking at photos of (I think) the node board in this thread, there's what looks to be a ColdFire MCF5206 (m68k-ish) microprocessor next to two chips that look like memory. I would posit that this is the L1, and that one of those chips is RAM and the other flash ROM. They look like standard chips (one looks like it's made by Cypress and the other maybe TI), but unfortunately I can't read the part numbers.

(10-22-2020, 01:47 AM)weblacky Wrote:  Also we have no proof these chips are smart enough to manage or refresh their own data (move it around or refresh) while powered.  So let's assume worst case scenario...they data rot from the inside out (behind a locked door, inside a Atmel chip or something).
The worst-case scenario is surely failure of the entire chip? Even for partial failures, conventional wisdom is that the chip has degraded and so should be replaced. (Although, as hobbyists, we might try rewriting it and hope it keeps working.)

For that replacement, of course, you need a dump, a compatible replacement IC and a device programmer. Whilst we're considering worst cases, it's not even known whether reflashing will refresh all of the critical areas of the flash chips, and it's very common to design things intentionally so that it won't. In addition, getting dumps of all the programmable devices is actually required, because there are known to be devices in SGI machines that cannot be reprogrammed from the command line.
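
Since a dump is the starting point for any replacement, it's worth sanity-checking one before trusting it. A minimal Python sketch, assuming you've read the same chip at least twice with an external programmer: it checks that the reads agree, that the size matches the part (512 KiB for a 4 Mbit chip like the O2's AT29C040A), and that the dump isn't blank (all 0xFF, the usual erased value), then returns a SHA-256 you could publish alongside the file. The size default and error messages are assumptions.

```python
import hashlib
from pathlib import Path
from typing import List

# Assumption: a 4 Mbit part such as the O2's AT29C040A holds 512 KiB.
EXPECTED_SIZE = 4 * 1024 * 1024 // 8  # 524,288 bytes


def check_dumps(reads: List[bytes], expected_size: int = EXPECTED_SIZE) -> str:
    """Sanity-check several reads of the same chip; return their SHA-256 if they agree."""
    if len(reads) < 2:
        raise ValueError("read the chip at least twice so the reads can be compared")
    digests = {hashlib.sha256(r).hexdigest() for r in reads}
    if len(digests) != 1:
        raise ValueError("reads disagree: check socket contact; cells may be marginal")
    data = reads[0]
    if len(data) != expected_size:
        raise ValueError(f"got {len(data)} bytes, expected {expected_size}")
    if data.count(0xFF) == len(data):
        raise ValueError("dump is all 0xFF: chip blank or not being read")
    return digests.pop()


def check_dump_files(paths: List[str], expected_size: int = EXPECTED_SIZE) -> str:
    """Load each dump file and run check_dumps on the contents."""
    return check_dumps([Path(p).read_bytes() for p in paths], expected_size)
```
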

Whilst the number of write cycles isn't a concern, losing power whilst flashing the PROM is. If there's a recovery mode, it has to be stored in an area of flash that doesn't get rewritten, and it will be the first code the processor runs on startup. If there's no recovery mode, losing power during a PROM update means a dead computer.

In summary, if there's a protected boot block (pretty much a given for machines that can hold more than one image, and not unlikely on single-image machines), reflashing as a refresh is a waste of time. If there's not, there's an elevated risk of bricking the machine every time you reflash, which is a pretty major issue if you don't even know where the flash IC is.
Donjon
10-23-2020, 01:38 AM
#5
RE: Should we be worried about flash/ROM programming retention in SGIs for PROMs and L1s?
I'd like to eventually see FOSS replacements for the ARCS firmware that make it easier to do processor upgrades and build custom upgrade cards around CPUs that are still manufactured (or in good supply), and that could allow more modern protocols (e.g. PXE).

In any case, good thread.

I'm the system admin of this site. Private security technician, licensed locksmith, hack of a C developer and vintage computer enthusiast. 

https://contrib.irixnet.org/raion/ -- contributions and pieces that I'm working on currently. 

https://codeberg.org/SolusRaion -- Code repos I control

Technical problems should be sent my way.
Raion
Chief IRIX Officer

Trade Count: (9)
Posts: 4,240
Threads: 533
Joined: Nov 2017
Location: Eastern Virginia
10-23-2020, 03:08 AM