NVRAM Checksum Issue (Solved) -
CB_HK - 10-15-2018
UPDATE
I have solved my issue with the Dallas RTC inside my Crimson by disconnecting the internal lithium batteries and attaching a new coin cell battery holder with leads. In total I spent around one hour working on this project including setup and tear down. I am happy to report that it worked perfectly and I have now solved the NVRAM error that was popping up during POST.
Below is a quick walk-through of how I modified the Dallas chip to function with an external battery. As a disclaimer, I did not create this process. Many people have already dealt with this issue, so I spent a few days reading up on the best way to solve it. In the end I referenced a YouTube video by a gentleman who was dealing with the exact same issue (Link).
The Dallas RTC inside the Crimson is a bit different from the more common models. This one is a DS1216 and is designed to hold the SRAM chip with environment parameters on top. A variant of this chip used in other computers is for ROM instead of SRAM.
Here you can see the SRAM removed from the RTC, with the DS1216 IC in view. The pins on the bottom of the RTC package are fragile and easily bent.
The first modification to make involves using a dremel with a fine tip engraving bit to bore small holes near pins 8 and 14. The goal is to sever the thin connection bars that link the lithium batteries that are potted into the chip so that when you wire up the coin cell battery you don't run the risk of damage.
Once the batteries are disconnected, flip the unit over and use the same engraving bit to cut away a small amount of the plastic frame covering pins 4 and 8 (left to right). Pin 4 will be the positive hookup and pin 8 will be the ground. I recommend using the smallest point on your soldering iron in order to then tin the pads.
Next I connected the two leads to pins 4 and 8.
While there are other, smaller battery holders, the IO3B has a nice empty spot near the front where this Adafruit battery holder can sit.
Once the connections are complete for the IC and the battery holder, carefully reinstall the RTC and SRAM. I used Kapton tape in a few areas to ensure wires were held in place or that any exposed solder joints were insulated.
Below you can see where I mounted the battery holder and how I tacked down the wire. There's not a lot of room between the IO3B and the IP17 boards when mounted in the chassis so I mounted the battery holder flush to the PCB.
Everything reinstalled and voila! No more NVRAM error AFTER I reset the environment parameters. The first time I booted it gave me a parameter error (which was expected). Next up: solve my SCSI controller issue!
Original Post
I decided to set up a serial console with my Crimson to poke around a bit and see how things were going. The real goal was to make sure my console setup was working correctly before I attempted to hook up to a VGXT board set and make sure it's operating correctly. While the VGXT passed all of its power-on tests, I noticed that the Crimson was outputting an NVRAM error to the serial console but not to the main display.
Any thoughts on what the issue might be? I attempted to fix the issue with the Command Monitor but no luck yet.
Code:
Non-volatile RAM checksum is incorrect:
Initializing the non-volatile RAM parameters.
From the Command Monitor, check with printenv and fix with 'setenv ENV_VAR'
where ENV-VAR is a non-volatile environment variable such as bootmode.
RE: NVRAM Checksum Issue -
jan-jaap - 10-15-2018
(10-15-2018, 03:56 AM)CB_HK Wrote: I decided to set up a serial console with my Crimson to poke around a bit and see how things were going. The real goal was to make sure my console setup was working correctly before I attempted to hook up to a VGXT board set and make sure it's operating correctly. While the VGXT passed all of its power-on tests, I noticed that the Crimson was outputting an NVRAM error to the serial console but not to the main display.
Any thoughts on what the issue might be? I attempted to fix the issue with the Command Monitor but no luck yet.
Code:
Non-volatile RAM checksum is incorrect:
Initializing the non-volatile RAM parameters.
From the Command Monitor, check with printenv and fix with 'setenv ENV_VAR'
where ENV-VAR is a non-volatile environment variable such as bootmode.
The battery of the NVRAM is empty. On most SGI systems the NVRAM and the RTC are in a single package, a Dallas timekeeper. But on these systems, the configuration is stored in an SRAM with battery backup. It's a sandwich: the battery backup is underneath the SRAM chip. It's on the IO3B board.
I checked many years ago and even then I think the price of a replacement was ~$50. My PowerSeries 4D/440 started having the same problem last year, so I guess eventually I have to do something myself. I think I'll try to dremel the battery pack to attach leads for a new battery.
Unlike most other systems, the PowerSeries / Crimson PROM doesn't know the 'resetenv' command. If you do a 'printenv' you'll probably see garbled values for all variables; this is what it's complaining about. If you post this output I can try to tell you the real values.
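To illustrate why garbled values trigger the "checksum is incorrect" message, here is a minimal Python sketch. The real PROM checksum algorithm and NVRAM layout are not given in this thread, so the 8-bit sum, the trailing checksum byte, and the names `checksum8`, `make_image`, and `nvram_is_valid` are all assumptions made purely for illustration.

```python
# Toy model only: the actual Crimson PROM checksum and NVRAM layout are
# unknown here. An 8-bit sum stored in the last byte is an assumption.

def checksum8(data: bytes) -> int:
    """Return the sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF

def make_image(payload: bytes) -> bytes:
    """Build a hypothetical NVRAM image: payload followed by its checksum."""
    return payload + bytes([checksum8(payload)])

def nvram_is_valid(image: bytes) -> bool:
    """Recompute the checksum over the payload and compare to the stored byte."""
    payload, stored = image[:-1], image[-1]
    return checksum8(payload) == stored

# An image written while the backup battery was still healthy.
good = make_image(b"console=d\0bootmode=c\0")

# Simulate SRAM contents decaying after the battery died: one byte changes.
corrupted = bytearray(good)
corrupted[3] ^= 0x5A

print(nvram_is_valid(good))              # intact image passes
print(nvram_is_valid(bytes(corrupted)))  # decayed image fails
```

When a check like this fails, the PROM reports the error and reinitializes the parameters, which is why the variables have to be set again by hand once the battery is replaced.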
RE: NVRAM Checksum Issue -
CB_HK - 10-15-2018
If I’m able to replace the battery, either with a new one or with the dremel technique, will the system automatically refresh the values it needs on boot or will they need to be entered manually another way? (I’m guessing manually since you mentioned deciphering the output of printenv)
RE: NVRAM Checksum Issue -
CB_HK - 10-19-2018
Here’s the output I get from ‘printenv’:
(I pulled the graphics since the system didn’t want to talk via serial initially. I resolved that issue later. Totally user error there.)
RE: NVRAM Checksum Issue -
jan-jaap - 10-20-2018
(10-19-2018, 05:03 PM)CB_HK Wrote: Here’s the output I get from ‘printenv’:
(I pulled the graphics since the system didn’t want to talk via serial initially. I resolved that issue later. Totally user error there.)
![[Image: 6572f938-fa4f-4b3b-acda-eff91ace06c2-ori...fit=bounds]](https://beta-static.photobucket.com/images/s152/SomeCorellianGuy/0/6572f938-fa4f-4b3b-acda-eff91ace06c2-original.jpg?width=1920&height=1080&fit=bounds)
Looks more or less OK. The combination of 'console=g' and no valid graphics board results in 'gfx=dead'. If you set 'console=d' it will start on the serial console port. It will wait for a boot command because 'bootmode=m'. If you set 'bootmode=c' it will autoboot after completing POST.
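The interaction of these two variables can be summarized in a small Python sketch. This is only a model of the behaviour described in this post, not the PROM's actual logic. The function name `describe_startup` is hypothetical, and the fallback to the serial port when 'console=g' finds no working graphics is an assumption based on the error showing up on the serial console earlier in the thread.

```python
# Toy model of the console/bootmode behaviour described above.
# Not real PROM code; names and the serial fallback are assumptions.

def describe_startup(console: str, bootmode: str, graphics_ok: bool) -> dict:
    """Return where the console appears and what happens after POST."""
    # 'console=g' without a valid graphics board yields 'gfx=dead'.
    gfx_dead = console == "g" and not graphics_ok

    if console == "d":
        device = "serial"      # 'console=d' starts on the serial console port
    elif graphics_ok:
        device = "graphics"    # 'console=g' with a working graphics board
    else:
        device = "serial"      # assumption: falls back to serial

    # 'bootmode=m' waits for a boot command, 'bootmode=c' autoboots after POST.
    after_post = {"m": "wait for boot command", "c": "autoboot"}[bootmode]

    return {"console": device, "gfx_dead": gfx_dead, "after_post": after_post}

# The situation in the screenshot: graphics pulled, console=g, bootmode=m.
print(describe_startup("g", "m", graphics_ok=False))

# After setting console=d and bootmode=c from the Command Monitor.
print(describe_startup("d", "c", graphics_ok=False))
```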
RE: NVRAM Checksum Issue (Solved) -
CB_HK - 11-01-2018
After researching and modifying the Dallas RTC to accept a coin cell battery, my NVRAM issues are resolved. Thanks for the help!
RE: NVRAM Checksum Issue (Solved) -
jan-jaap - 11-01-2018
Awesome, and thanks for the link and the detailed writeup! I'm afraid I'm going to have to repeat this procedure myself a couple of times...
RE: NVRAM Checksum Issue (Solved) -
mgtremaine - 11-01-2018
Very clean work! Thanks for the post.
-Mike
RE: NVRAM Checksum Issue (Solved) -
jan-jaap - 11-01-2018
(10-15-2018, 03:56 AM)CB_HK Wrote: Next up: solve my SCSI controller issue!
First of all, it should be possible to work around this issue: both SCSI channels are wired to the drive bays. It is possible to have all bays wired to either of the two channels, or split between them. Of course if the failing channel is the primary, then you will have to set PROM variables to boot from SCSI channel 2, ID 1.
If you're determined to fix this, you'll probably have to replace the controller chip. It's a WD93 chip. There's of course the risk that you don't succeed in removing the chip (PCBs are thick and full of ground planes), you destroy the board, etc etc.
There are two WD93 controllers on the IO3B board:
I'm not 100% sure which one is which, but the primary controller is wired via the PowerPath backplane to the drive bays, the secondary via the blue SCSI connector on the front edge of the PCB. You do the guessing ...
NB: the 'ide' diagnostics test (option 3 from the PROM menu) might give more information about what's broken. To boot ide from the menu you may need to install IRIX first, otherwise it's possible to load it manually from the CDROM much like you boot sash. In that case you're looking for the file 'ide.IP17' in a directory /stand.
RE: NVRAM Checksum Issue (Solved) -
CB_HK - 11-01-2018
(11-01-2018, 03:21 PM)jan-jaap Wrote: (10-15-2018, 03:56 AM)CB_HK Wrote: Next up: solve my SCSI controller issue!
First of all, it should be possible to work around this issue: both SCSI channels are wired to the drive bays. It is possible to have all bays wired to either of the two channels, or split between them. Of course if the failing channel is the primary, then you will have to set PROM variables to boot from SCSI channel 2, ID 1.
If you're determined to fix this, you'll probably have to replace the controller chip. It's a WD93 chip. There's of course the risk that you don't succeed in removing the chip (PCBs are thick and full of ground planes), you destroy the board, etc etc.
There are two WD93 controllers on the IO3B board:
![[Image: IO3B_scsi.jpg]](https://www.vdheijden-messerli.net/sgistuff/nekochan/IO3B_scsi.jpg)
I'm not 100% sure which one is which, but the primary controller is wired via the PowerPath backplane to the drive bays, the secondary via the blue SCSI connector on the front edge of the PCB. You do the guessing ...
NB: the 'ide' diagnostics test (option 3 from the PROM menu) might give more information about what's broken. To boot ide from the menu you may need to install IRIX first, otherwise it's possible to load it manually from the CDROM much like you boot sash. In that case you're looking for the file 'ide.IP17' in a directory /stand.
Right now Channel 0 is the primary which, as far as I can tell from the Crimson installation guide, is the backplane. That also makes sense since the current configuration is for Channel 0 to run all the SCSI drives (using a loop back connector instead of terminators in the bottom corner). The IO3B connector is attached to a bulkhead port only. When I reverted this layout and split the drive bays, 2 on Channel 0 and 2 on Channel 1, nothing would show on 1.
I agree that it’s likely the SCSI IC, unfortunately. Also, my board is using AMD controllers, interestingly enough. I was able to find the same model number but it doesn’t have the same last few digits. I’m going to keep looking to see if that is going to be an issue.
If I do replace the chip, do I need to worry about anything else? Or should it be, in theory, a one-for-one swap?