Are the SGIs that have survived this long the most reliable ones?
#1
Yo All,
Odd question for our old-timers but I have to ask, as this is very much news to me.

I'm starting to re-read old USENET postings, hoping to "relearn" forgotten SGI info, and I came across this post (which isn't the ONLY one mentioning these kinds of issues!):

https://groups.google.com/g/comp.sys.sgi...nFFNOxL5YJ

This took me aback. I'd heard rumors of, like, SGI Fuels going bang when first introduced...but never something like this. Is it true that (back in the day) SGI shipments went through bouts of "manufacturing issues" and bad revisions of well-known workstations?

Does this mean that those that failed back then were poorly made and what's left today is all the "good ones" weeded out by time?

I don't have more links but the archive for comp.sys.sgi.* also tells stories of phantom Indy reboots during idle, O2 mainboards just "going bad", etc...

Can anyone who "was there" shed light on these events?  Are these postings just outliers...or was something actually going on back then?  I thought (with a few exceptions) that SGIs were supposed to be of higher quality than an average PC, back in the day. These stories sort of show the company as...well...something else.
weblacky
I play an SGI Doctor, on daytime TV.

02-15-2022, 08:17 AM
#2
(02-15-2022, 08:17 AM)weblacky Wrote:  stories of phantom Indy reboots during idle

That doesn't have to be a hardware problem. Indy originally shipped with IRIX 5.1 which suffered from many bugs and memory leaks. The Indy initially also shipped with as little as 16MB RAM. They had to be rebooted regularly or they would run out of memory just sitting there.

In my personal opinion, the Zytec (?) PSU in the teal Indigo2 was more reliable than the IMPACT Indigo2 PSU, but I probably didn't see enough of them to make this a statistically valid statement.
jan-jaap
SGI Collector

02-15-2022, 08:52 AM
#3
(02-15-2022, 08:52 AM)jan-jaap Wrote:  
(02-15-2022, 08:17 AM)weblacky Wrote:  stories of phantom Indy reboots during idle

That doesn't have to be a hardware problem. Indy originally shipped with IRIX 5.1 which suffered from many bugs and memory leaks. The Indy initially also shipped with as little as 16MB RAM. They had to be rebooted regularly or they would run out of memory just sitting there.

In my personal opinion, the Zytec (?) PSU in the teal Indigo2 was more reliable than the IMPACT Indigo2 PSU, but I probably didn't see enough of them to make this a statistically valid statement.

Under load, especially when GFX-intensive programs (Mekton!) are running, my IMPACT Indigo2 PSU makes a buzzing noise. Is that normal, or might this be a sign of imminent failure?

SGI - the legend will never die!!

Indy Indigo Crimson Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT O2 O2 Octane Octane2 Octane2 Tezro
Geoman
Crimson to Tezro

02-15-2022, 10:59 AM
#4
That could just be coil whine.
Irinikus
Hardware Connoisseur

02-15-2022, 01:25 PM
#5
(02-15-2022, 01:25 PM)Irinikus Wrote:  That could just be coil whine.

I very much hope so :-)

Geoman
Crimson to Tezro

02-15-2022, 10:53 PM
#6
Beware the survivorship bias. Given enough time the number of working SGIs will probably decrease to zero.

Octane2  R14k 600MHz, V10, 2GB RAM, 73GB disk, IRIX 6.5.22
shrek
It's not done until it's ogre.

02-16-2022, 01:14 AM
#7
(02-16-2022, 01:14 AM)shrek Wrote:  Beware the survivorship bias. Given enough time the number of working SGIs will probably decrease to zero.

Oh, no doubt...age is creeping up. However, I've been reading quite a few things in these old USENET posts from around 2001-2005 about failures really soon (like 1-4 years) after receiving brand-new systems! Some describe university-sized installations needing four power supplies a month for just 70 machines (I think I2). I also ran across at least two posts claiming that, during warranty claims, SGI kept shipping them used or refurbished systems with incorrect specs that were DOA (IMPACT I2 versus XZ).
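
To put that PSU anecdote in perspective, here is a rough back-of-the-envelope calculation. It takes the recalled figures (four PSUs a month, 70 machines) at face value, which is an assumption: they come from memory of old posts, not from any measured data.

```python
# Figures as recalled from the old posts; treat them as rough, not measured.
psu_failures_per_month = 4
machines = 70

# Scale the monthly count to a year and compare it to the fleet size.
annual_failures = psu_failures_per_month * 12
annual_failure_rate = annual_failures / machines

# Average amount of machine run time (in machine-months) per PSU failure.
machine_months_per_failure = machines / psu_failures_per_month

print(f"{annual_failures} PSU swaps/year across {machines} machines")
print(f"~{annual_failure_rate:.0%} of the fleet's PSU count replaced per year")
print(f"one PSU failure per {machine_months_per_failure:.1f} machine-months")
```

If the numbers are anywhere near right, that is a replacement rate far above what you'd expect from a healthy power supply design, which is why the post stood out to me.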

Also I read a few that claimed I2 mainboard "failures" out of the blue for young systems.  Granted, these were still under warranty so swap-outs were done and nothing was said about causes.  But PSU and MB failure seemed to rule the roost in these kinds of posts (almost never a graphics failure).

It just got me thinking that maybe there was much more to SGI's day-to-day and even downfall than I understood. I've always been in the used market...I've only ONCE seen an SGI (in its natural environment) at an aerospace office back in 1996. Otherwise, my exposure was purely used equipment. So I have no experience on the admin or maintenance "frontlines".

Given that I'm now interested in repair...this information is actually important to know...it's been lost so I was hoping someone could shed more light on these older complaints.
weblacky
I play an SGI Doctor, on daytime TV.

02-16-2022, 04:23 AM
#8
Semiconductor engineers model device life as a "bathtub curve". At the beginning of life there is a higher than average failure rate, sometimes called "infant mortality", from devices that were marginal to begin with. Then the failure rate drops, once the marginal devices have been weeded out. At the other end, the end of life, failure rates rise again. These are components that wore out, or aged in a way that exposed or activated underlying fabrication defects. Before a system makes it to the customer, it gets "burned in" by being run in a test loop for some time, to catch those marginal components. Burn-in is not simply a test: it is intended to exercise every device in order to trigger weak ones to self-destruct, before they get shipped out the door and cause more expensive warranty costs. But not every marginal, short-lived device is found, because no company can afford to burn in products for months. So there are units that fail relatively quickly, from infant mortality, while customers are using them.
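
As a rough illustration of the bathtub shape and of what burn-in buys you, here is a minimal sketch. It assumes the hazard is the sum of a decreasing Weibull term (infant mortality), a constant term (random failures), and an increasing Weibull term (wear-out). That decomposition is a common textbook simplification, and every parameter value below is hypothetical, not SGI data.

```python
import math

# All parameters are hypothetical, chosen only to produce a bathtub shape.
INFANT = (0.5, 20.0)    # (shape, scale in years); shape < 1 -> falling hazard
WEAROUT = (5.0, 25.0)   # shape > 1 -> rising hazard
RANDOM_RATE = 0.02      # constant failures per unit-year

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k/s) * (t/s)**(k-1); requires t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t/s)**k."""
    return (t / scale) ** shape

def bathtub_hazard(t):
    """Instantaneous failure rate at age t (years, t > 0)."""
    return (weibull_hazard(t, *INFANT) + RANDOM_RATE
            + weibull_hazard(t, *WEAROUT))

def cum_hazard(t):
    return (weibull_cum_hazard(t, *INFANT) + RANDOM_RATE * t
            + weibull_cum_hazard(t, *WEAROUT))

def first_year_failure_prob(burn_in_years):
    """P(fail in the first year at the customer | survived burn-in)."""
    b = burn_in_years
    return 1.0 - math.exp(-(cum_hazard(b + 1.0) - cum_hazard(b)))

for age in (0.1, 1, 5, 10, 20, 30):
    print(f"age {age:>4} y: hazard {bathtub_hazard(age):.3f} per unit-year")

no_burn_in = first_year_failure_prob(0.0)
two_weeks = first_year_failure_prob(14 / 365)
print(f"first-year failure probability without burn-in: {no_burn_in:.3f}")
print(f"first-year failure probability after 2 weeks:   {two_weeks:.3f}")
```

With these made-up numbers the hazard is high at the start, bottoms out in mid-life, and climbs again late. A two-week burn-in lowers the first-year failure probability seen by the customer but does not eliminate it, which matches the point that short burn-in cannot catch every marginal device.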

This has two impacts for the manufacturer: losses from warranty service, and reputational losses if product failures are publicized. By performing failure analysis, they try to find improvements that reduce failure rates. In some cases, all of the early revision products in the field might be exchanged for later revisions with fixes in place (whether called a "recall" or not), if the savings from better reliability outweigh that expense. As I recall, Sun did this a number of times with the 386i and other workstations.

For the collector, it means that "new in box, unused" pieces may also be defective. An untested machine is still a machine not known to be operational. Is knowing details about early failures relevant to repairing vintage systems? To an extent: the methods used to localize faults, including schematics and test vectors, would still be useful, even if the failures are not the same. It's unlikely that a flowchart for early failures would still be applicable.

To the topic question, I don't think so. It has a hidden assumption that the primary reason machines got trashed is for not working, but I don't think that's the case. From what I've read, the most common reason for junking a unix workstation is policy, unrelated to reliability. With the big iron, it frequently happens that the scrap value of the metals exceeds its market value as a computer. That is also unrelated to reliability.
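
A small sketch of that argument, under stated assumptions: the fleet is split into "solid" and "marginal" units, hardware failure depends on which group a unit is in, and policy scrapping hits both groups equally. The function name and all numbers are hypothetical.

```python
def solid_share_of_survivors(solid_share, fail_solid, fail_marginal,
                             policy_scrap):
    """Fraction of surviving units that are 'solid'.

    solid_share   : fraction of the original fleet that was solid
    fail_solid    : fraction of solid units lost to hardware failure
    fail_marginal : fraction of marginal units lost to hardware failure
    policy_scrap  : fraction scrapped by policy, regardless of condition
    """
    kept = 1.0 - policy_scrap
    solid = solid_share * kept * (1.0 - fail_solid)
    marginal = (1.0 - solid_share) * kept * (1.0 - fail_marginal)
    return solid / (solid + marginal)

# Hypothetical fleet: 80% solid units, 20% marginal ones.
for scrap in (0.0, 0.5, 0.95):
    share = solid_share_of_survivors(0.8, 0.05, 0.30, scrap)
    print(f"policy scrap {scrap:.0%}: solid share of survivors = {share:.3f}")
```

The policy term cancels out of the ratio, so scrapping by policy shrinks the surviving population without changing its mix. Any enrichment in "good ones" comes only from the units that actually died of hardware failure, and if that was a small share of all disposals, the survivors look much like the original fleet.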

Personaliris O2 Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT Indigo2 Indy   (past: 4D70GT)
(This post was last modified: 02-16-2022, 05:29 AM by robespierre.)
robespierre
refector peritus

02-16-2022, 05:29 AM
#9
To be clear, I was not assuming that the reason SGIs were thrown away is that they failed (the posts I'm talking about are from back when people still had contracts). I'm simply asking about these experiences of relatively new systems failing in this manner (mainboards dead after shutdown and bad PSUs) during early life or production. Some of these posts from around 2000-2002 talk about headaches with constant issues...others say they don't know what you're talking about.

I'd not heard of the bathtub curve, so I appreciate the insight that there is an expectation on both the front and back end of the production lifecycle.  

I am giving this weight because (outside of a design change, which we have zero info on...other than that there were design updates, given the part numbers) all revisions of mainboards and PSUs are still around. A "poorly designed" revision was not "culled" from the market if it never failed or the system was retired before then (I doubt SGI "recalled" anything).

I'm thinking about several different ends of these revelations:

1. Survivors are survivors because they are NOT flawed production units, or because they have (by now) been run long enough to "flush out" whether they were faulty from the factory.

2. Machines that weren't run much and were then put away in storage (with no evidence of significant usage/runtime) are therefore ripe for failure unless they get some kind of refresh or checks informed by these known events.

3. Neither 1 nor 2 can be excluded (they are more complementary ideas). So do we view ALL earlier mainboard revisions and parts (given the part numbers) as needing shotgun part changes before first power-on, with passives such as resistors and caps checked and ALL non-MLCC caps replaced, simply because it's an earlier revision and for no other reason?


I did run across posts about the Indigo2 Audio board cap failures as well (as some of you said earlier in that other post) so there was clear evidence it was happening not long after units were put into service. Again this info creates an interesting idea.  I've talked about artificial refresh before (when I first joined IrixNet).  We have zero data on failure specifics of early events (just swap the board under SGI contract and go).  No info on WHAT failed (simple cap or real important IC)?

Basically, I guess I'm conflicted about where to take further repair research: were these more of a "design issue", or were they faulty parts that didn't meet spec by their supplier (like a cap plague sort of idea)?
weblacky
I play an SGI Doctor, on daytime TV.

02-16-2022, 10:22 PM
#10
(02-16-2022, 10:22 PM)weblacky Wrote:  faulty parts that didn't meet spec by their supplier (like a cap plague sort of idea)?

This happened around the turn of the millennium. It was a story of stolen chemical formulas for electrolytic capacitors ("elcos") and counterfeit Chinese manufacturing. Certain brands of PC motherboards were failing all over the place. Then again, leaky caps happened well before that too.
jan-jaap
SGI Collector

02-17-2022, 08:50 AM

