Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
#1
Hi All,
Long time no post...sorry, real life getting in the way...I'll try to prevent that.  I'm posting today because of what is essentially a donation, given to me for the cost of shipping by Geoman.  We live in different countries, but shipping proved to be very reasonable and Geoman was nice enough to contact me and offer this board.  I of course accepted because I bought a dual 400MHz Octane2 system from Raion last year-ish, and this would totally rock in that.  Geoman offered it to me because the board doesn't work. He'd tried a few things himself, but in the end they didn't cure the symptoms, so it sat on a shelf for a few years until he decided to let it go and see if someone could fix it and give the board its own 15 minutes of fame.

Challenge accepted!

The board is 030–1156–003 rev B with mounted daughtercard 030–1216–002 rev D

He shipped it to me and I received it about a week ago (as of this posting). Even though I don't really have the Octane PSU rebuilt or set up yet...I wanted to start the basic troubleshooting now, to be fair to Geoman's offer and to give me something to post.  Now understand that even if I solve the issue...we may not know for some time whether the board fully works, because I have to drag my Octane2 out (hope it doesn't explode from the non-rebuilt PSU), install the card, and see if the immediate symptom subsides.  To check that it fully works (so I can do video capture), I'm going to be difficult and demand you guys wait until I've rebuilt one Octane PSU, so I feel safe reinstalling Irix and testing the system.  That won't happen immediately...but I CAN test the symptom in PROM.  So I'm willing to go that far without a parachute (rebuilt PSU).  Please understand...but I WILL eventually get there and we'll have the full answer.


Anyway Geoman described the symptoms and what he did so far.  I hope I'm getting this right. Geoman - If I leave anything out please feel free to correct me and I'll correct this post.

The described behavior was as follows:  He got the card years ago (I assume this problem was there from the moment he came into possession of it, as he didn't indicate he ever had it working) and after mounting it to the XIO carrier and physically inserting the carrier into his Octane system...the system would fail to boot, power on, run, or show any life at all. Dead.  When he removes the board...system starts fine, springs to life on power-on...no issues.  Put the board back in...Octane dead as a doorstop.

Geoman said he knew something had shorted, so he changed caps "C69, C47, C48, C188, ceramic/smd capacitors, 1:1 type, original Kemet", but that had no effect on the symptoms.  He also noted a (possibly purposeful, factory) mod: the daughter card (named EVO Piggy) has pin 134 cut off on the flex cable.  It looked clean, so it may be a factory/design thing.

I'd love it if someone with the same board could please CONFIRM that the trace for pin 134 is supposed to be cut (or that it is on theirs).  I'll get more info on that when I can...but we'll hope for now it's factory, on purpose.

Also could someone please take good high-res pics of their board for my visual comparison...I'll post pics I have but I'd love to visually compare a few things to make sure.  So that would be extremely helpful.

   

I didn't bother much with the back of the PCB for now as I have enough on the front to give me my next steps.

Okay so this sounds like a classic main power rail shorted to ground...and luckily the Octane system is smart enough to figure this out without exploding!!!  So that's good too.

Going into DIODE test mode on my multimeter, I randomly checked any caps I could tell had been replaced and whatever else I found.  I found caps C188, C105, C65, C196, C187, & C194 reading shorted (not an exhaustive test), so I stopped there.  Now, what I mean is an in-place test...so all that means is that something connecting main power and ground is shorted, and those caps sit on the same path...it doesn't mean the caps listed above are themselves shorted...just yet.  There are other ways of testing which (if any) caps are shorted (later).
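
To illustrate why an in-place reading can't single out the bad part, here is a rough sketch (not a measurement from this board). It treats everything between the rail and ground as resistors in parallel; the 1 MΩ "healthy cap" leakage and the 0.25 Ω fault are made-up illustrative values, and a real diode test reports a voltage drop rather than ohms.

```python
def parallel_resistance(resistances_ohms):
    """Equivalent DC resistance of parts wired in parallel across one rail."""
    return 1.0 / sum(1.0 / r for r in resistances_ohms)

# Assumed values for illustration only: six healthy caps with very high
# DC leakage resistance, plus one unknown fault somewhere on the same rail.
healthy_caps = [1_000_000.0] * 6
fault = 0.25

# Probing across ANY of the caps sees the whole parallel network,
# so every one of them reads as (nearly) the fault value.
reading = parallel_resistance(healthy_caps + [fault])
print(f"{reading:.4f} ohm across every cap on the rail")
```

So the single lowest-resistance path dominates the reading no matter which cap you probe, which is why the parts have to be lifted or the current followed to find the real culprit.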

With such an overwhelming number of shorted areas spotted it was clear that the focus should be more towards the main power edge connector.

   

Main power has many shorted readings around it so maybe it's more local to the connector...but I've not done other tests so I'm not 100% on that of course...just the density of bad readings increased.  Also most neighboring pins on the edge connector for board power are shorted together with a small voltage drop as they are arranged in alternating power/ground groups.

Okay, so what have we learned here?  We know 100% that one or more of the MAIN power rails that come directly into the card are shorted. I didn't see any explosive residue or evidence of a catastrophic failure, so more than likely...it's a small component acting as a board heater...sucking power right off the main line.  We also know that the issue wasn't among the caps changed during the initial repair...but again, that's a drop in the pond.

Okay this is the issue I had hoped for when I agreed to take the unit because it's an obvious issue, in your face and not subtle or intermittent.  So hopefully the problem is also that obvious. Now anything that's programmable and damaged will be disastrous...but my gut says...it's not going to be a special processor or programmable elements, it's either a cap or voltage regulator (LDO) on that rail that went.

There's a slow way and a fast way to go about this....I'm more about fast ways.  The slow way would be to use a leak tester I have, along with a mV meter, and try to follow the current flow to the component sucking the most current under a current-controlled power injection.  The fast way is to inject some power and use my thermal imager to locate the likely culprit when it glows with thermal radiation....guess which way I'm going...yeah...the awesome way...like that had to be said.

So here we leave it for tonight, to be picked up tomorrow (hopefully).  We have a great video option board that's actually shorted; we'll find the short and see if it can be eliminated.  After that I'll stick it in my Octane2 and see if it starts and PROM claims there is a video option board installed!  Also, I'm unsure I have all the mounting parts...so mounting MIGHT need some help.

So there you go everyone! Hopefully a quicky but the start of another repair adventure.
(This post was last modified: 04-06-2022, 09:50 AM by weblacky. Edit Reason: grammar & spelling )
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
04-06-2022, 04:07 AM
#2
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Okay...This isn't over but man was this a fighter.  I have some info...not a confirmed solution yet.  I need to reinstall some parts and remove some parts to VERIFY but I think I have (at least the first) shorted component.


I spent an hour on this...sadly it fought me because the power and ground planes are so big that they absorb energy like you wouldn't believe...and the component WAS HIDDEN UNDER THE DAUGHTERCARD!!!!

I took the daughtercard off and the metal heatsink off a chip to verify there wasn't something hiding.  My big issue is I can't use a large voltage. I really didn't wanna go to 3 V, so I pushed it to 2 V.  However, I kept just getting localized heating at my alligator clips. I could not find anything that was heating up, originally.  After I removed the daughtercard and heatsink I still didn't initially see anything, so I ended up disconnecting a couple of the caps that were on the shorted rail.  I ended up pushing the settings to 2 V at 8 A.  Though, just like my fiasco with my V10 graphics card, I didn't wanna burn out the short, so I never put it on for longer than about 10 seconds.

What's really amazing about this is that if you're staring at it with the imager, it lights up pretty fast, but when you are juggling with one hand and applying power, then disconnect and pick up the imager to scan the board, the cap cools down so quickly that every time it was already cool enough to blend in with the board by the time I got the imager in my hand!!!  It could literally only be seen with the thermal imager right in front of it, as it wouldn't even last two seconds before cooling down. Almost like a slow blinking LED (similar rate).


I was really getting toward my wit's end because this should've been working and yet I wasn't getting results, so I'm glad this finally popped up. I did not expect it to be one of these tiny, tiny capacitors.   I couldn't believe one of these things could sink that kind of power. I mean, we're talking 16 Watts here!
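
For anyone checking the math: the 16 W figure is just the bench supply settings multiplied out. A quick sketch, assuming the supply really holds 2 V while delivering the full 8 A (a current-limited supply may sag below its voltage setpoint, so treat this as an upper bound):

```python
def injected_power_watts(volts, amps):
    """Power delivered into the shorted rail, P = V * I."""
    return volts * amps

def path_resistance_ohms(volts, amps):
    """Total resistance of the loop (leads, clips, planes and the short), R = V / I."""
    return volts / amps

print(injected_power_watts(2.0, 8.0))   # 16.0
print(path_resistance_ohms(2.0, 8.0))   # 0.25
```

Note the resistance figure covers the whole loop, including the alligator clips, which fits with the clips themselves warming up.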

Because I ended up desoldering and then reattaching the LDO regulator that creates low voltage right at the connector, and I desoldered two other capacitors, I need to reattach those, figure out the value for C52, remove it, and then see if the short goes away.   Since C52 has a companion capacitor, I assume I can simply measure the value off the companion.

I don't think I ever bought an assorted collection of ceramic caps. So I don't think I have anything to put in place right now. I will have to order something I think.  I might order a new regulator while I'm at it though, given the lifespan.

Anyway I'll be able to sleep now, so there's that...

C52...FLIR overlay:

   

iPhone Macro Pic of region:
   



More to come...
(This post was last modified: 04-09-2022, 06:14 AM by weblacky.)
weblacky
04-07-2022, 06:26 AM
#3
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
I actually saw something like this at the VCFed Swap Meet..

Guy had a “defective power supply”, but it turned out to be a short on his EVO that was installed.

Pulled the adapter and the machine fired right up.

Someone else scurried off with the defective board, not sure what happened.

Might be worth keeping proper notes, could help someone in the future.

Indigo2 IMPACT  : R10K-195MHz, 1GB RAM, 146GB 15K, CD-ROM, AudioDAT, MaxImpact w/ TRAM.  IRIX 6.5.22

O2 : R12K-400MHz, 1GB RAM, 300GB 15K, DVD-ROM, CRM Graphics, AV1/2 Media Boards & O2 Cam, DV-Link, FPA & SW1600.  IRIX 6.5.30

 : 2 x R14K-600MHz, 6GB RAM, V12 Graphics, PCI Shoebox.  IRIX 6.5.30

IBM  : 7012-39H, 7043-140

chulofiasco
Hardware Junkie

Trade Count: (0)
Posts: 328
Threads: 51
Joined: May 2019
Location: New York, NY
04-07-2022, 12:42 PM
#4
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Interesting that there’s precedent. I will say one thing is that this board has a small smattering of varied chemistries when it comes to caps. No outright standard tubular, wet, electrolytics but lots of MLCC, a few tants, and a couple solid aluminum SMD caps (I think). They do seem to be clustered around power regulators or conversion, where MLCC seems to be dominant (which is good).

Since this is a known stopping state for a peripheral-shorted Octane, I am interested in maybe coming up with a quick way of pulling an entire XIO carrier and doing a shorted-board (main rail only) test with a multimeter. If the Xbow were more accessible, it'd be nice to test everything at once for a short. When I'm actually fixing my Octanes I'll see if the PSU rail goes right to the power connectors for the XIO modules or if there's something in between. If direct, perhaps just pulling the PSU and going right to certain pins can do an “everything not shorted” test. Then, if shorted, pull the carrier and check again, then check the boards on the carrier.
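
As a rough sketch of that narrowing-down order (whole system, then system without the carrier, then each board), assuming the rail really is directly reachable at the PSU connector, which I haven't verified. The `measure_short` callable and the target names are hypothetical stand-ins for a manual multimeter check, not a real tool:

```python
def locate_short(measure_short, boards):
    """Narrow down a main-rail short: system -> carrier -> individual board.

    measure_short(target) stands in for a manual multimeter check and
    returns True if power-to-ground reads shorted for that target.
    """
    # Step 1: PSU pulled, probe the rail pins with everything installed.
    if not measure_short("system"):
        return "no short found"
    # Step 2: pull the XIO carrier and probe the same pins again.
    if measure_short("system without carrier"):
        return "chassis"
    # Step 3: the short left with the carrier, so check each board on it.
    for board in boards:
        if measure_short(board):
            return board
    return "carrier"

# Example with made-up readings noted down at the bench.
readings = {
    "system": True,
    "system without carrier": False,
    "graphics board": False,
    "EVO board": True,
}
print(locate_short(readings.get, ["graphics board", "EVO board"]))  # EVO board
```

The point of the ordering is that each step needs only one measurement and halves the search area before anything gets unscrewed from the carrier.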
weblacky
04-07-2022, 05:33 PM
#5
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Well, as per my luck, things have taken a turn. The capacitor measured good, and the short was still there after I removed it. It turns out the current sense resistor next to it was what was heating up. So I jumped that current sense resistor, and another current sense resistor on the other side of the board, in a similar location, started heating up. I've since jumped that current sense resistor too, and I've yet to find anything.

My bag of tricks is having a real hard time with this. I've seen stuff like this before, where the resistors act as current limiters and heat up, but not enough current can pass through them to heat up downstream components. If you can find them and put your power injection point past them, then the next thing in the line will finally register. So far, I seem to have lost the trail.

So I guess I need to change my approach. So far I've been looking at the main power rail directly and trying to follow it downstream. However, if multiple things are shorted, it could be like finding leaks all along a system of dams. Not easy.

Since this doesn't appear to be just a single item on the main power rail, I'm thinking that maybe I should work backwards using my curve tracer and actually start looking at the larger ICs on the board. While they certainly might be on a shorted power bus, their other legs wouldn't necessarily be shorted if they're intact. If I happen to find a chip with multiple legs shorted to ground that the data sheet says really shouldn't be, then perhaps I can attack that. Testing the larger chips first would show whether one of them is shorted and therefore dragging down the whole rail.
weblacky
04-08-2022, 01:34 AM
#6
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Well okay, not looking better but...interesting.  I removed the two 000 (zero-ohm) resistors that lit up when I injected power near the XIO compression connector. Oddly, removing one 000 did nothing to change the readings; however, removing the 000 at R29 has cut off the short from the back half of the ground plane of the board.  By that I mean the test for a short from the metal port interface to the main rails on the other end of the card no longer shows a short.  However...LOCAL ground on about 60% of the board (towards the compression connector) is still shorted to main voltage.  Also, if I check using the pads of the R29 footprint, the local grounds after the 000 connection are NOT shorted to the main rail either.  So I've at least cut off a chunk of the circuit and confirmed in isolation that it isn't shorted...so that's something.

But that still leaves me with the majority of the PCB to investigate.  I was considering looking for more 000 links to disconnect...but haven't really found any other than the two I disconnected.  The other 000 disconnection made no difference to its local area's ground-to-mains short.  Given the stability of this short...I'm wondering if it's not a PCB layer interaction or something inside the board...though I don't see any separation, charring, or the like.

I did look up several of the large ICs; none of their voltage rails were shorted (not an exhaustive test), but at least 40% of the big ICs are not shorted, nor on a shorted voltage rail.


I'm going to take a new approach and take off all the metal shielding and screws...in case a screw has somehow been making this connection!  The connection between ground and this voltage rail (likely NOT the 5V rail, as the 5V chips seem okay?) is so good and solid, and takes 3V @ 8A with ease without warming up, that I'm beginning to think it's a problem with the metal assembly or vias or something. I cannot reasonably go higher than 3V without risking additional damage.  I can go up to 10A, and that's my bench DC PSU limit.  Otherwise I'd have to use an ATX PSU 3.3V rail to produce more power (uncontrolled, which I really don't want).

After I remove ALL the metal and cover the reflective labels on 3 ICs with black tape, I'll also stop getting a bunch of emission false-positives from my thermal imager, when it hits shiny surfaces.

So at least it will make thermal imaging easier for me.  If stripping the unit of its metal assembly bears fruit...I'll follow that.
(This post was last modified: 04-08-2022, 09:13 AM by weblacky. Edit Reason: spelling & grammar. )
weblacky
04-08-2022, 07:42 AM
#7
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Okay, solved the puzzle...it's beyond repair. This board cannot be repaired due to MAIN IC LSI SGI 1997 L2A0710 XIO Bridge Rev D chip 099-0125-007 being shorted!!!!  This was the large, gray, IC under the heatsink.  So...it's dead due to this custom chip (and maybe other issues) being shorted.

   

Okay, let me tell you the LONG adventure of getting there.  I have video but...I don't have a place to upload the video that proves it.

After all the tracking down, I found another clue: there was solder on the underside of the black heatsink.  It didn't correspond to a specific component, because the solder blob was semi-mobile and you could push it around with your finger...but it was stuck to the underside of the heatsink.  But that wasn't there when I first removed the heatsink!  Also, the short WAS still there when I last removed the heatsink.

   

A small cap (C142) was partially desoldered, riding up at a 35 degree angle from the pad (like it had desoldered itself and fell against the underside of the heat sink, then solidified).  I surmised that this meant the energy I'd been pouring into the PCB was to blame...maybe my culprit.  I checked the board and the short was gone.  WOW, very excited but decided to call it a night.

Next day (today) I assumed I was right and started putting back the parts I had removed.  All was going well; I got everything soldered back on and the short was still gone!

Proceeded to attach the black heatsink, test it again with multimeter..short is BACK!!!  WHAT!!!  Okay, I played with the heatsink, covered the underside in polyimide tape in case it was hitting stuff (It was...but whatever...didn't help).

   

Finally, after the short appeared and then disappeared with the heatsink install & removal, it stuck on shorted...it was still there after heatsink removal...well, beans.  So I went back to square one and tried again...no luck with the thermal imager, no luck with current tracers...it's like all the progress of the past 2 days was erased from history.

Feeling defeated, I decided to try PCB bending again, because that's the only thing the heatsink could be doing...bending.  I eventually got things into a state where, with my multimeter leads clamped to the correct rails, I could press the center of the Bridge chip with my thumb and the multimeter would suddenly drop to a short; lift up...short was gone.

So flexing of the PCB has cracked a solder joint...but when I press it down (or clamp the heatsink to it) it reconnects...and becomes/reveals shorted.

Ergo the main XIO bridge chip under the heatsink was the damaged part all along, it could take and take power without heating up because of its thermal mass and the fact that I was limited to 20W...which means nothing to it and the grounding plane.

Short was so big that other connected parts heated up before it did.

So, that's the answer.  I'll keep the board in case one day I get a damaged Octane peripheral with a working XIO bridge and maybe try to reball and solder it on...but considering how cheap Octane EVO boards are...it's a project that isn't financially worth it.

So there wasn't a fix for this guys...I'm a little disappointed too...but I guess now we know how the Octane knew to shut down.  Bridge IC for the XIO connection was shorted to ground.
(This post was last modified: 04-09-2022, 06:51 AM by weblacky.)
weblacky
04-09-2022, 06:51 AM
#8
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
I'm sorry you did not succeed.

But your research and documentation were very insightful nonetheless.

SGI - the legend will never die!!

Indy Indigo Crimson Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT O2 O2 Octane Octane2 Octane2 Tezro
Geoman
Crimson to Tezro

Trade Count: (0)
Posts: 162
Threads: 13
Joined: May 2018
Location: Germany
04-10-2022, 11:07 AM
#9
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
Yeah, but that’s okay. I knew the risks of it all so no hard feelings, after all there was an equal chance it was fixable.

I agree that the shorted Octane symptom was in itself valuable, because when that's the symptom it may mean that the bridge chip is shorted (which in turn shorts the entire Octane bus).

Now I’m still wondering how that even happens. I mean I guess if someone has a failing power supply in their Octane it can do just about anything but from a design concern I’m wondering if this is something we need to be aware of and actually advise against. By that I mean do not start running high end peripherals on an Octane that doesn’t have a rebuilt power supply because this could happen. One could argue that loss of some of these peripherals is worse than loss of the Octane itself. Getting high end video cards and video processing equipment is actually harder than getting a complete working basic Octane.

I'll definitely keep the board in case I ever get good BGA rework equipment, equipment that can re-ball and transfer a chip this large successfully, and run across a damaged Octane peripheral where this chip still works just fine.
weblacky
04-10-2022, 10:55 PM
#10
RE: Trying to repair an Octane EVO Personal Video Option Board - 030–1 156–003 Rev B
(04-09-2022, 06:51 AM)weblacky Wrote:  Ergo the main XIO bridge chip under the heatsink was the damaged part all along

Not the first one. I've seen one of these go with a bang -- literally. I think it left a black mark on the PCB where the flame was. It was an XIO option but I forgot which one -- video, FC, ...
jan-jaap
SGI Collector

Trade Count: (0)
Posts: 1,048
Threads: 37
Joined: Jun 2018
Location: Netherlands
04-11-2022, 09:23 AM