Indigo 2 Extreme Graphics SHRAM error
#1
Hi,
I'm new here and somewhat new to SGI machines, so some misunderstandings are possible. I've been into computers since my childhood (starting with a 286), and I'm a Linux user, but SGI hardware is a bit different from a "normal" PC. Over the last few days I've successfully set up three Octanes (with IRIX 6.5.30), but my Indigo 2 has some problems.
I'm sorry if my English is bad; I'm from Germany and still struggle with it a bit.


So, today I've run into a problem with my Indigo 2. It's a green one with an R4400 and Extreme graphics.
Yesterday I took it out of storage and deep-cleaned it; it had been untouched for years. I think the case is trash: sadly, some water got in somewhere and rusted the outside of the case. Additionally, the hard drive is dead. (Should I open it to see what is failing? It sounds like the head is stuck somewhere; the platter itself spins up fine.)
At the moment I'm running it without the case. I put an extra fan on the CPU and attached another drive so I can run the hardware tests.

After deep cleaning, the hardware itself was still in very good condition (no corrosion). It booted without any problems into the PROM menu (is that the correct term?), where I can choose to run hardware tests, go into the console, start the system, and so on. I was extremely happy about that. Until today...

When I started it today, the startup sound played, but the screen background showed four big stripes, alternating black and white. A few scattered pixels are pure red, green, or blue (no mixed colors). The PROM menu doesn't appear.
As it happens, there is some version of IRIX on my test hard drive, and after a while it started to boot. In that case the PROM window appears partially (only where text is drawn) and shows boot progress such as networking and so on.

I searched a bit but found nothing about this issue, so I tried accessing the machine over the serial port instead. That works fine, BUT: if the graphics card is plugged in, there are many error messages about "SHRAM". For example:

ERROR in loading constants**** Shram Location -> 1ff   expected -> 200      actual -> 7f0200

It reports every location from 1d to 1ff as faulty. The PROM menu doesn't appear on the serial connection either.

If the card is removed from the system, there are two messages about a missing GPU, then the PROM menu appears and everything works fine. By the time I wrote this message I had run a hardware test, and it ran through. Is it normal that no success message is printed? It only shows the ide>> prompt; there have been no other messages since it listed the hardware information...

But back to the topic: what died? Where is this SHRAM located? Is it a region in the RAM modules (so I might have a faulty RAM module), is it on the mainboard (SHared RAM as a transfer area between CPU and GPU), or is it on the graphics board? And if so, is it a chip I can get at (there are many RAM chips in this PCB sandwich), or is it inside a controller?
And why would the card die when it ran fine yesterday?

Many questions. I hope someone can help.
CommLan
O2

Trade Count: (0)
Posts: 3
Threads: 1
Joined: Jun 2024
Location: Germany
06-13-2024, 02:08 PM
#2
RE: Indigo 2 Extreme Graphics SHRAM error
(06-13-2024, 02:08 PM) CommLan wrote: Where is this SHRAM located?

U31, U32, U518, U519 on the GU1 board (bottom card of the Extreme boardset).

Code:
ERROR in loading constants**** Shram Location -> 1ff   expected -> 200      actual -> 7f0200
means that when the data was read back from this location (and all of the others), it differed in 7 bits (bits <22..16>), which were stuck on, as would be expected if a whole 32K x 8 SRAM chip had failed, or if the power supply or enables to that chip were broken. A component-level repair looks warranted. The faulty chip may be located by probing the data pins of these 4 ICs with a scope.
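The bit arithmetic in that explanation can be checked mechanically. Below is a minimal Python sketch (the function name `decode_shram_error` is hypothetical, not an SGI tool) that XORs the expected and read-back values, lists the differing bits, and groups them by 8-bit byte lane. It assumes the 32-bit SHRAM word is built from four 8-bit SRAMs with lane 0 holding bits 7..0; which of U31, U32, U518 and U519 drives which lane is not established in this thread, so a lane number is only a hint for where to probe.

```python
def decode_shram_error(expected: int, actual: int) -> dict:
    """Compare an expected 32-bit SHRAM word with the value read back."""
    diff = (expected ^ actual) & 0xFFFFFFFF  # bits that differ
    flipped_to_1 = diff & actual             # read as 1 where 0 was expected
    flipped_to_0 = diff & expected           # read as 0 where 1 was expected
    bits = [b for b in range(32) if (diff >> b) & 1]
    # Byte lane 0 = bits 7..0, lane 1 = bits 15..8, and so on (assumed layout).
    lanes = sorted({b // 8 for b in bits})
    return {"diff": diff, "bits": bits, "flipped_to_1": flipped_to_1,
            "flipped_to_0": flipped_to_0, "lanes": lanes}


if __name__ == "__main__":
    # Values taken from the error line quoted in the first post.
    r = decode_shram_error(0x200, 0x7F0200)
    print(hex(r["diff"]), r["bits"], r["lanes"])
    # 0x7f0000 [16, 17, 18, 19, 20, 21, 22] [2]
```

For the reported line it finds bits 16 through 22, all read as 1 where 0 was expected, and all within one byte lane.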

Quote: Somewhere, sadly, was some water and rusted the outside of the case.
It rusted the plastic case?

Quote: Additionally the harddrive is dead (I'm unsure if I should open it to look what is failing ? It sounds like the head is stuck in some place. The plate itself spins up fine.)
Unless you have a Class 100 clean room, it's not advisable to open the HDA. Some drives are known for sticky bumpers that prevent the head from loading. Is yours made by Quantum? A light tap on the right side of the drive with a screwdriver handle can sometimes allow it to load.

Personaliris O2 Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT Indigo2 Indy   (past: 4D70GT)
robespierre
refector peritus

Trade Count: (0)
Posts: 640
Threads: 3
Joined: Nov 2020
Location: Massholium
06-13-2024, 03:27 PM
#3
RE: Indigo 2 Extreme Graphics SHRAM error
Quote: U31, U32, U518, U519 on the GU1 board (bottom card of the Extreme boardset).
Okay, that fits; I found them. Is it a coincidence that these are the only Motorola RAM chips on the whole board? They're rated at 10 ns, the Toshiba ones at 15 ns. Could I use 15 ns Toshiba spares instead, or do the Motorola ones need to be faster by design? U33/U34 and U520 (U521 is not populated) next to them are 15 ns Toshiba parts; is that another buffer that can be slower?
The PCB and surrounding components seem to be okay.
I've gone through the error messages, and the whole bus seems to be stuck at this value. There are transceivers (74FCT245) nearby; could one of them have failed?
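Whether "the whole bus is stuck" can be checked against the complete serial log. A minimal sketch, assuming every error line has the same `Shram Location -> ... expected -> ... actual -> ...` shape as the one quoted in the first post (`common_error_bits` is a hypothetical helper): it ANDs and ORs the per-location difference masks. If the two masks come out identical, every failing location is wrong in exactly the same bits, which fits a stuck data line (a chip, a transceiver, or a connection) better than scattered cell failures.

```python
import re

# Matches the format of the serial-console line quoted in the first post:
# "... Shram Location -> 1ff   expected -> 200      actual -> 7f0200"
LINE_RE = re.compile(
    r"Shram Location\s*->\s*([0-9a-fA-F]+)\s+"
    r"expected\s*->\s*([0-9a-fA-F]+)\s+"
    r"actual\s*->\s*([0-9a-fA-F]+)"
)


def common_error_bits(log: str) -> tuple[int, int, int]:
    """Return (count, always_wrong, ever_wrong) over all error lines in `log`.

    always_wrong: bits that differ at every reported location.
    ever_wrong:   bits that differ at one or more reported locations.
    """
    count, always, ever = 0, 0xFFFFFFFF, 0
    for match in LINE_RE.finditer(log):
        _location, expected, actual = (int(g, 16) for g in match.groups())
        diff = (expected ^ actual) & 0xFFFFFFFF
        always &= diff
        ever |= diff
        count += 1
    if count == 0:
        return 0, 0, 0  # nothing parsed: report empty masks
    return count, always, ever
```

Pasting the complete capture from the serial console into `common_error_bits` returns the number of parsed lines along with both masks.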
Another problem is how to solder these chips. I have no experience with hot air, which seems to be the only way to replace them because the pins are bent inward under the package.

Quote: It rusted the plastic case?
Well, kind of? The metal case underneath rusted so badly that rust particles got all over the outside of the plastic. It looks very ugly and is hard to clean. But the plastic is also broken in some places, so...

Quote: Unless you have a Class 100 clean room, it's not advised to open the HDA. [...]
Yes, it's a Quantum 4.3 GB drive. I'll try that, great tip.

Edit: I tried it, and it helped a bit. But now the drive seems to read the same spot over and over again, and the SCSI bus hangs.
(This post was last modified: 06-13-2024, 05:08 PM by CommLan.)
CommLan
06-13-2024, 04:48 PM
#4
RE: Indigo 2 Extreme Graphics SHRAM error
The brand name of the chips isn't important; their organization is. The MCM6706AJ10 is a 32K x 8 SRAM, meaning it accepts a word address from 0 to 32767 and can read or write an 8-bit word. The chips next to them, TC5588J-15, are 8K x 8 SRAMs used for another purpose (storing the HQ2 microcode). This much is evident from the "Indy GR4 Schematic", which was leaked by an SGI employee. We don't have schematics for the Extreme, but it uses the same basic architecture.
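To see why a single failed 32K x 8 part produces exactly the reported pattern, here is a toy Python model (hypothetical classes `SramChip` and `ShramBank`, not a description of the real Extreme circuitry) of four such chips side by side forming a 32-bit word, with one chip given stuck-at-1 data bits. Which chip holds which byte lane, and the stuck-high fault itself, are assumptions; the model also ignores which side of the board (host or geometry engines) performs the write.

```python
WORDS_PER_CHIP = 32 * 1024  # 32K x 8: addresses 0..32767


class SramChip:
    """Toy 32K x 8 SRAM; `stuck_high` marks data bits that always read as 1."""

    def __init__(self, stuck_high: int = 0) -> None:
        self.cells = bytearray(WORDS_PER_CHIP)
        self.stuck_high = stuck_high & 0xFF

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        # A data line stuck high forces that bit to 1 on every read.
        return self.cells[addr] | self.stuck_high


class ShramBank:
    """Four 32K x 8 chips side by side, forming a 32K x 32 memory."""

    def __init__(self, chips: list[SramChip]) -> None:
        if len(chips) != 4:
            raise ValueError("expected four byte-lane chips")
        self.chips = chips  # chips[i] holds byte lane i (bits 8*i+7..8*i)

    def write(self, addr: int, word: int) -> None:
        for lane, chip in enumerate(self.chips):
            chip.write(addr, word >> (8 * lane))

    def read(self, addr: int) -> int:
        return sum(chip.read(addr) << (8 * lane)
                   for lane, chip in enumerate(self.chips))


if __name__ == "__main__":
    # Hypothetical fault: the chip on lane 2 has data bits 6..0 stuck high.
    bank = ShramBank([SramChip(), SramChip(), SramChip(stuck_high=0x7F), SramChip()])
    bank.write(0x1FF, 0x200)
    print(hex(bank.read(0x1FF)))  # 0x7f0200
```

The same output would also result from a broken enable or a shorted line outside the chip, which is why the model alone cannot tell a bad SRAM from a board fault.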

Rework of these parts is no picnic, especially on a very thick multilayer board. The best approaches require preheating the board and using dedicated SOIC tunnel tips. But it is more important to identify the failed component before reworking it. As I said, a failure to read back data from a RAM may be caused by the IC not getting the required signals; in other words, a circuit board problem. Cold solder joints are a frequent cause, and so are the power bypass capacitors.

robespierre
06-13-2024, 06:46 PM
#5
RE: Indigo 2 Extreme Graphics SHRAM error
This is really good, actionable information; thank you, robespierre. CommLan, what you're being told is that this is a multifaceted issue. Yes, you're experiencing a circuit problem, but it could be as simple as a cracked solder joint, as temperamental as poor power, or as bad as a failed SRAM chip.

What's important is to work out, step by step, how to investigate this without causing any damage, but also without going overboard and doing the most complex work first. Robespierre is right about using an SOIC tunnel tip (or sometimes large paddle-style desoldering tweezers) for removal in this sort of application; I use an SOIC tunnel tip myself when doing environmental monitoring disease repair on Fuel boards. However, to accept these tips you need a fairly high-end, expensive soldering station. You can't just put one of these tips on a simple 40 W soldering iron from the hardware store. For someone with no interest in doing more soldering in the future, that's a significant investment: very likely the station and tip would cost more than replacing the entire graphics board you're having trouble with.

What equipment do you have, and what are you comfortable doing at this point? Replacement Indigo2 Extreme graphics are normally not hard to find, so the worst possible outcome is that you'd re-purchase that specific board from your Extreme graphics stack, or the entire modular graphics set. I can't quickly find good information on these chips, but many of them appear to be in SOJ packages. The good news is that with this style of package, even a novice should be able to touch up each leg with a soldering iron with relative ease. I would personally start by touching up the legs of the chips, on the assumption that there's a cracked solder joint, since that's the easiest of the three possibilities.

If touching up the solder on each leg does not fix your problem, then it's likely that one of the chips has failed. In that case the problem becomes that it's not clear which chip you need to change. Honestly, while there are a few tests you can do with a multimeter to rule out some possibilities, I know of no way to use a multimeter alone to test exhaustively and find out which chip to focus on. You could definitely use a multimeter to check for a capacitor issue and to check the external ESD protection diodes on the legs of each memory chip. That might turn something up, and if it does, it would point you in the right direction. However, it might turn up absolutely nothing and leave you back at square one.

In this particular repair I would weigh your time and ability above cost. If you have a good enough multimeter, set it to diode mode, connect the COM lead to the ground plane, and test each leg of the SRAM chips, looking for any readings very close to 0 V that aren't consistent with the readings at the same leg on neighboring chips. In other words, all the chips in a bank should measure similarly as a group, leg position by leg position. It may also be prudent to isolate each chip by putting the COM probe on its chip select/enable leg (or something similar), so that the chip is isolated within the bus while you probe every other leg with the positive probe.
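The leg-by-leg comparison described above can be written down as a small checklist script. A minimal sketch, assuming you record each chip's diode-mode readings in volts (ground plane on COM), one list per chip in the same leg order, with an open-circuit "OL" entered as `float("inf")`. The function name and both thresholds (`near_zero`, `tolerance`) are illustrative assumptions, not values from SGI or from this thread; it simply flags readings near 0 V or far from the median of the same leg on the other chips in the bank.

```python
from statistics import median


def flag_outliers(readings: dict[str, list[float]],
                  near_zero: float = 0.05,
                  tolerance: float = 0.10) -> list[tuple[str, int, float]]:
    """Compare diode-mode readings leg by leg across the chips of one bank.

    readings maps a chip designator to its per-leg readings in volts
    (index 0 is leg 1). A reading is flagged when it is at or below
    `near_zero` volts, or when it differs by more than `tolerance` volts
    from the median of the same leg on the other chips. Both thresholds
    are illustrative assumptions.
    """
    chips = list(readings)
    if len(chips) < 2:
        raise ValueError("need at least two chips to compare")
    n_legs = len(readings[chips[0]])
    if any(len(v) != n_legs for v in readings.values()):
        raise ValueError("every chip needs a reading for every leg")
    flagged = []
    for leg in range(n_legs):
        for chip in chips:
            value = readings[chip][leg]
            others = [readings[c][leg] for c in chips if c != chip]
            if value <= near_zero or abs(value - median(others)) > tolerance:
                flagged.append((chip, leg + 1, value))  # legs numbered from 1
    return flagged
```

The result is only a list of places to look at more closely; a clean result does not rule out an internal memory failure, which this kind of test cannot detect.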

Since I've just described two tests, I would do the common-ground test first. As you move on to another bank of memory elsewhere on the card, if everything suddenly changes to higher values, that's a clue that the previous bank may be the one with the problem. Then go back, put the COM probe on the chip select/enable leg of each chip in that bank, and compare the results to isolate which chip may be damaged. That assumes external damage rather than a pure internal memory failure, which again isn't detectable by this method.

This will only tell you whether external damage came in through one of the legs or whether you have a cracked solder joint, and only if you touch the solder pad on the PCB rather than the chip's leg itself when probing. Any anomalous reading from one chip in a family of chips is a great target to focus your attention on.

But the above is not an all-encompassing test. It will not exhaustively test each chip the way it would need to be tested if you really have failed memory. Memory that has genuinely failed will not show damage externally; that's actually one of the hardest problems we have to deal with in repair. However, in my personal experience, genuine memory failure after all these years is incredibly rare. A disconnect due to a cracked solder joint is more likely than one of the chips simply having failed. That doesn't mean it can't happen, but the odds favor a disconnect over a chip failure. It's also much more common for a few memory cells to fail than for every cell on the chip to fail. So if a chip acts as though it won't store anything at all, that's more likely a connection problem than every memory cell in the chip failing at once. And when a chip has failed, it's more often due to an external influence, which you can detect on one of the legs, than to a random internal failure.

If you already own a good soldering station and have the experience and confidence to at least touch up the legs, I would go ahead and do that and see what you get.

Unlike the multimeter test above, which is nondestructive, touching up each leg runs the risk of creating a solder bridge between legs, which could be devastatingly bad. But if you're careful, don't rush, and recheck each touched-up point visually as well as with a multimeter for a change in its reading, this may be fixable without fully removing a chip or replacing all of the SRAM chips on the board.

The good news is that this fault is at a level the card itself can recognize, so it points you toward what kind of component to look at. It's not just a general failure or an outright dead card. So as a research/learning project, I'd say it is worth gaining the experience of checking with a multimeter for anything obvious, and possibly of gently adding a little solder and reflowing each leg on these chips. Beyond that, if you don't have the equipment or the inclination either to do very specialized testing with specialized electronics troubleshooting equipment or to go whole hog replacing large sections of SRAM and retesting over and over until the error disappears, it's probably not cost-effective for you.

So try not to go too far down the rabbit hole, but I think the surface-level, easy checks are definitely worth your time and interest. After that you may just want to hold on to the card for trade or for somebody else to fix in the future, and get yourself another Extreme graphics set; they really shouldn't cost much these days from people who are parting out Indigo2 systems.

Definitely keep us informed on what you find and best of luck.
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
06-14-2024, 01:34 AM
#6
RE: Indigo 2 Extreme Graphics SHRAM error
Studying the schematic points to an interesting wrinkle:
The SHRAM is "shared" between the HQ2 (host queue interface) and all of the GEs (geometry engines). The address to read or write comes from HQ2. But even though the SHRAM is 32 bits wide (4x 8-bit SRAMs), only 16 data bits are connected to HQ2. This means that when the host interface needs to write data into this memory (pixels, vertex coordinates, matrices, etc.), it can only write into the low half of each word, leaving the upper half undefined. This must make testing the memory extra "interesting".
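That split can be made explicit by masking the differing bits with the two halves of the word. A minimal sketch, assuming "low half" means data bits 15..0 (the part wired to HQ2) and that bits 31..16 are reachable only from the GE side; `classify_error` is a hypothetical helper.

```python
HOST_MASK = 0x0000FFFF     # bits 15..0: connected to HQ2 (assumed from the description above)
GE_ONLY_MASK = 0xFFFF0000  # bits 31..16: only reachable from the GE side (assumed)


def classify_error(expected: int, actual: int) -> dict[str, int]:
    """Split the differing bits of a 32-bit SHRAM word by which side can write them."""
    diff = (expected ^ actual) & 0xFFFFFFFF
    return {"host_side": diff & HOST_MASK, "ge_side": diff & GE_ONLY_MASK}


if __name__ == "__main__":
    # Values from the error line quoted in the first post.
    r = classify_error(0x200, 0x7F0200)
    print(hex(r["host_side"]), hex(r["ge_side"]))  # 0x0 0x7f0000
```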

I point this out because the error shown above appears in the high half of the data, the part that can only come circuitously from the GEs' side of the house. So what I initially thought pointed to a problem with the SRAMs may actually point to a problem with the GEs or their sequencer. It is thornier to diagnose than it appears.

(This post was last modified: 06-14-2024, 07:15 AM by robespierre.)
robespierre
06-14-2024, 02:30 AM
#7
RE: Indigo 2 Extreme Graphics SHRAM error
Hi everyone,
thanks for all the replies to this topic. So, I've made a decision, and I think it's the most reasonable one.
I don't want to destroy this piece of history, so I'm selling it in the hope that someone with the right expertise will get it working.

I have specialized Weller soldering stations and a good oscilloscope, and I think desoldering these chips wouldn't be a problem, but getting them back in is. I see difficulties with the contacts placed underneath the chips. Above all, I don't know whether replacing them would solve the problem; I don't think so. I also didn't find spare parts in my stock, and I don't have the time to investigate much.

So I'm passing this on to someone who can actually find the fault and has the ability to repair it.

It's currently listed on my eBay account alongside many other SGI parts, so if anyone is interested, don't hesitate. If the new owner of this assembly is interested in repairing it, I hope they'll keep this thread up to date; it's referenced in the listing description. I listed the whole board sandwich.

So thanks to everyone here for these tips.
CommLan
06-18-2024, 09:45 AM

