O350 Voltage Fault:ATTN: 1.8V low fault limit reached @ 1.396V.
#1
Hi SGI'ers,

I de-racked my O350 machines and swapped out all the Snaphat Oscillator/Battery packs over the weekend.

Of the 5 Snaphats, 3 were dead and 2 had very low voltage, so they have all expired in sync.

Having gone to the trouble of pulling out the IO9 boards to swap the batteries (see my blog for pics), I also shifted the PCI boards around so that all the PCI-X boards (Neterion 10GbE, LSI SAS3442X-R SAS/SATA & LSI 4Gbit/sec Fibre Channel) sit only in the top two PCI slots, to get the PCI-X benefits.

As a result I pulled an Adaptec FireWire 4300 (DM10) out of one machine and put it into another with free slots.

On power up, the O350 with the additional DM10 board is now reporting a low voltage failure:

>> MXXXXXXX6-001-L2>power up
>> 001c04
>> 001c04 ATTN: 1.8V low fault limit reached @  1.396V.
>> 001c04
>> 001c04 ATTN: brick auto power down in 30 seconds
>> 001c04
>> 001c04 ATTN: brick auto power down in 25 seconds
>> MXXXXXXX6-001-L2>env001c04
>> 001c04 ATTN: brick auto power down in 20 seconds


A check of the environment (via L2) shows the problem on the 1.8v line:

>> MXXXXXXX6-001-L2>env
>> 001c04 ATTN: brick auto power down in 20 seconds
>>
>> 001c04:
>> Environmental monitoring is enabled and running.
>>
>> Description    State      Warning Limits    Fault Limits      Current
>> -------------- ----------  -----------------  -----------------  -------
>>          1.8V      Fault  10%  1.62/  1.98  20%  1.44/  2.16    1.382.      <<===== This one
>>            12V    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.125
>>        12V #2    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.063
>>          3.3V    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.337
>>        12V IO    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.125
>>        5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.070
>>      3.3V AUX    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.302
>>    PCI 5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.096
>>      PCI 3.3V    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.337
>>      PCI 2.5V    Enabled  10%  2.25/  2.75  20%  2.00/  3.00    2.509
>>        PCI 5V    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    4.966
>>  XIO 12V BIAS <not present>
>>        XIO 5V <not present>
>>      XIO 2.5V <not present>
>>  XIO 3.3V AUX <not present>
>>  IP59 3.3V AUX    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.302
>>    IP59 5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.070
>>      IP59 12V    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.063
>>      IP59 VCPU    Enabled  10%  1.14/  1.40  20%  1.02/  1.52    1.297
>>      IP59 SRAM    Enabled  10%  2.25/  2.75  20%  2.00/  3.00    2.483
>>      IP59 1.5V    Enabled  10%  1.35/  1.65  20%  1.20/  1.80    1.495
>>

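For anyone who wants to check a pasted env dump against its own limits, here is a minimal Python sketch. It assumes the row layout shown above (name, state, warning limits, fault limits, current reading). The names parse_env_row, classify and deviation_pct are made up for illustration, and the nominal voltage is inferred as the midpoint of the fault limits, which is an assumption rather than something the L1 reports.

```python
import re

# One voltage row of the "env" output, with or without the ">>" quote prefix.
ROW = re.compile(
    r"^(?:>+\s*)?(?P<name>.+?)\s+(?P<state>\S+)\s+"
    r"\d+%\s+(?P<warn_lo>[\d.]+)/\s*(?P<warn_hi>[\d.]+)\s+"
    r"\d+%\s+(?P<fault_lo>[\d.]+)/\s*(?P<fault_hi>[\d.]+)\s+"
    r"(?P<current>\d+\.\d+)"
)

def parse_env_row(line):
    """Return a dict for a voltage row, or None for headers / <not present> rows."""
    m = ROW.match(line)
    if m is None:
        return None
    row = {"name": m.group("name").strip(), "state": m.group("state")}
    for key in ("warn_lo", "warn_hi", "fault_lo", "fault_hi", "current"):
        row[key] = float(m.group(key))
    return row

def classify(row):
    """Compare the current reading against the limits printed on the same row."""
    v = row["current"]
    if not row["fault_lo"] <= v <= row["fault_hi"]:
        return "FAULT"
    if not row["warn_lo"] <= v <= row["warn_hi"]:
        return "WARNING"
    return "OK"

def deviation_pct(row):
    """Percent deviation from nominal (ASSUMED to be the midpoint of the fault limits)."""
    nominal = (row["fault_lo"] + row["fault_hi"]) / 2
    return 100.0 * (row["current"] - nominal) / nominal

SAMPLE = [
    ">>          1.8V      Fault  10%  1.62/  1.98  20%  1.44/  2.16    1.382.      <<===== This one",
    ">>            12V    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.125",
    ">>  XIO 12V BIAS <not present>",
]

for line in SAMPLE:
    row = parse_env_row(line)
    if row is not None:
        print(f"{row['name']:>8}  {classify(row):7}  {deviation_pct(row):+.1f}%")
```

On the rows above, the 1.8V reading of 1.382 sits below the 1.44 fault limit, roughly 23% under the inferred 1.8V nominal, while the 12V rail is well inside its window.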
As this machine has dual power supplies, I am wondering if the problem is with the voltage regulator module (VRM) on the IP59_4CPU processor board.

Has anyone had any experience with a faulty VRM?

Does this require replacement of the VRM, or can these be fixed by replacing capacitors or such?

Thank you for any tips.

Cheers from Oz,


John.
(This post was last modified: 11-16-2020, 05:27 AM by jwhat.)
jwhat
Octane/O350/Fuel User

11-16-2020, 05:18 AM
#2
Yo and welcome to the club!

Firstly, I also had the same FireWire adapter in my Tezro. It caused a slowly dropping 5v rail panic on my machine (the one reported on Nekochan...that was me, as no one had seen this before), which took 10-15 minutes to happen, until I removed the card from the Tezro. I still believe that was PSU related...unsure, but it never came back after the removal of that waste-of-time Adaptec card. But the same machine, 2-3 months later in 2011, started doing exactly what you describe. It's been on the shelf ever since, waiting for me to train myself to fix it.

Not only do I have this exact issue on one of my Tezros (detailed over on the "other SGI forum website"), but so do other users (so does Elf's Tezro in his PSU post!). My Tezro that has this issue is just a dual 800MHz...not a huge power hog.

It's a VRM on the MAINBOARD, not the nodeboard. I finally have some equipment to look into this but it will be a while. The 1.8v VRM FALLING is like clockwork for me, over like 45 seconds (regular). Some people claim theirs is ON/OFF...or intermittent. Sometimes it doesn't happen. I believe the variance in the symptom is HEATING related...but they will eventually become like mine, happening all the time.

The LDO voltage regulator on this section is exactly the same chip used for VRM(s) power on the entire Origin architecture series!!! Which is why my Tezro and your O350 share this.

So here's the rub: I have a theory, which isn't the same as Elf's theory, but...they are all still theories, as no one has all the equipment + time to tackle this yet. I fear I'll need to get a top-notch SMD desoldering tweezer system to tackle this.

My theory is that, even though these are very advanced LDO voltage regulators (they do contain something like zero sum resistance error correction in their feedback), they are actually experiencing what is referred to as feedback oscillation, directly due to a change in the resistance of the components along the feedback path. You see, no matter how you design these things, they are designed for a "window" of resistance (high/low - min/max) for the feedback circuit. These regulators have the documented feature of optimizing and adjusting themselves to accept a wider range of feedback resistance than most regulators of that time. The documentation calls this the Opti-loop (I think).

So the feedback circuit does several things at once: it tries to regulate the actual DC/DC buck process AND optimize itself for changes in resistance along the way...this ALSO includes ESR (it's specifically called out in the datasheet). So as the SVP 470 SMD cap (which I think is its C_OUT cap, but I've not meter-hunted to make sure) changes in ESR (these were VERY LOW ESR back in the day, and they are still made actually), the regulator tries to optimize...but it can only do so much. With the ESR likely changing as heating/cooling happens in the circuit (in use, on an aged cap), the regulator is now optimizing AND regulating at once, and has to do both very fast.

It's my belief that these caps have changed so much that BOTH regulations are in constant change...which, instead of minimizing oscillations...increases them. Regulator oscillation is a known issue (a big one). It causes the regulator to undershoot or overshoot the voltage coming out (because the entire feedback circuit is now out of design tolerance!). I think if ESR increases it UNDERSHOOTS voltage (I think, no proof yet).

That is my theory. Elf has claimed to find the exact controller region (see the TEZRO PSU Thread). He claims he wants to replace the entire semiconductor region in a shotgun approach. (I think that's dangerous...if you mess up, and likely unneeded).

So since this exact layout is used in other boards, I've recently dropped some coin on a similar (not the same) Origin 3000-type mainboard that claimed to be running and has this exact VRM layout (last week actually).

Because the layout appears identical, I was going to use my new Huntron 3200S tracker (still setting that up) to perform a comparison of the IV signatures at that exact point between my 2-3 boards. I'm betting my money and time that I'll find increased resistance and lower capacitance in the graph on my board, but otherwise (overall) identical graphs between both boards...if that is the case...then I'm right. A large change (due to age) in resistance has caused the voltage regulators to be unable to properly keep the output voltage within the 5% they state, because their feedback circuits are no longer at the DESIGNED resistance (that's what's changed), and from doing additional reading...I'm told that's super important. It's not a wire, it's a bunch of resistance calculations at various stages...if those are off...the whole thing swings. Also, the usual increase in C_OUT cap ESR will cause UNDERSHOOT of voltage on a normal buck converter anyway. Both signs point to lower (not higher) voltage as cap ESR increases and cap capacitance decreases, while ripple output increases and the feedback circuit is trying to calm it all.

Elf believes both the caps (he shows a thermal image on the thread) and a MOSFET are to blame. He believes the MOSFET is "sticking", but when I questioned what he meant by that, he couldn't really explain his theory...he claimed he meant the controller is sticking the MOSFET. I think it's operating exactly as the controller is telling it, the controller just has the wrong feedback info now.

The good news: those high-end low-ESR Panasonic caps are still made and can be gotten (pricey though...like ~$2 a cap). I briefly looked into using a new MLCC 740uF cap...but if you factor in the DC bias (didn't know about this before...real bummer with MLCC) and the needed 470uF tolerance of the original cap...it's a NO GO. We need to use an electrolytic...ceramic won't work unless we can place a lot of them in parallel to counteract the 50% loss of capacitance at the needed ~2V (still sucks).
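To put a number on the "lot of them in parallel" point, here is a rough Python sketch of the DC-bias arithmetic. The 470uF target and the 50% loss figure come from the post above; the 100uF part is a made-up example value, the function name mlcc_parallel_count is hypothetical, and tolerance and temperature derating are ignored.

```python
import math

def mlcc_parallel_count(target_uf, rated_uf, bias_loss_fraction):
    """How many identical MLCCs in parallel are needed to reach target_uf
    once DC bias has removed bias_loss_fraction of each part's rated value."""
    effective_uf = rated_uf * (1.0 - bias_loss_fraction)
    return math.ceil(target_uf / effective_uf)

# Hypothetical 100uF MLCCs losing 50% under bias: each is worth ~50uF.
print(mlcc_parallel_count(470, 100, 0.5))

# The 740uF part mentioned above would be worth ~370uF, so one is not enough.
print(mlcc_parallel_count(470, 740, 0.5))
```

This only counts capacitance. It says nothing about whether the regulator loop would be stable with the much lower ESR of a ceramic bank.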

The bad news, outside of the ludicrous push and twist method of SMD cap removal...these boards have a lot of layers and I WILL NEVER use hot air removal (board warmer...yes). Hot air to remove these caps will delaminate and cook the board surface (blacken/damage it) and get you nowhere. So to remove and properly install these same caps..I need good SMD tweezers...I don't have $800 lying around for one, right now.


So while I now have the equipment to troubleshoot the issue, I don't have the best equipment to recap the board's SMDs and fix the issue (for me).

I'll hopefully know in like 2-3 months on the test front (just comparison), I doubt the signatures will be the same, there will be a difference and my equipment can directly tell if that change is capacitance, resistance, inductance, or semi-conductance (all at once).

So I'm on this, however...I doubt I'll try to run ANY of my Tezros without a PSU recap (because they did start the last time I ran them, many years ago). I won't risk multiple failures. So first I need to analyze and theorize (nearly done...I hope), then rebuild the PSU, then measure (record) and recap the VRM sections of the MAINBOARD, then I can try it. I will also need to recap the Nodeboard (same caps), but I think those are used as decoupling caps and not on a regulation output. So they likely will take a backseat.

While I understand from the scientific method that I'd be changing too many variables at once to be conclusive (PSU or VRM), I won't start taking any system out for a run with an old PSU right now. So both need to happen at the same time.

Short answer to your question is, it's known...considered deadly right now...don't push your system, turn it off. Wait, likely it's a recapping issue...but it's the local regulator being unable to regulate anymore. The reason for it is conjecture, perhaps I'm wrong and Elf is right, an entire replacement of the area needs to happen. With my tools, as long as I have one that hasn't failed yet, the comparison will show if that's true or not. The physical fix...those are different tools.

FYI, if you look at the mainboard and node board...there aren't a huge amount of these, so that's good. But given the labor involved, a recapped Tezro should be like $5,000-6,000 just from labor and assembly alone. So as I mentioned before, there WILL come a time when the price divergence will be huge between vintage-found systems and working rebuilt systems.

All this takes time...welcome aboard my life boat, I hope you brought food and supplies.
weblacky
I play an SGI Doctor, on daytime TV.

11-16-2020, 08:05 AM
#3
Hi Weblacky,

thank you for the comprehensive feedback.

I will first take the Adaptec 4300 (DM10) out of the machine and see where the voltage levels sit after that.

If the underlying problem is with electrolytic capacitors, then there is a world of experience in the Vintage Macintosh community to draw on.

I have an Apple Macintosh SE/30 which has a recapped system board, and the old Macs also have issues with leaky capacitors in the power supply and, for the old portables, in the LCD displays.

With the old Macs there are visual signs of leakage around the capacitor.

When I look inside my SGI machines and see lots of surface mount electrolytic capacitors I start to worry.

With the old Macs the "metal fatigue" removal technique ;-) has proved to be the best and cheapest way to take off the old capacitors.

So while not elegant, it does work.

I will keep you posted on any additional findings or thoughts I have on this, but the details for power supply electronics are beyond me.

My first thought is that we need to create a map (schematic) or take a picture of the O350/Tezro interface board so it is clear which parts are doing what (i.e. voltage regulation, disk interface, L1, etc.).

Cheers from Oz,


John
(This post was last modified: 11-16-2020, 10:35 PM by jwhat.)
jwhat
Octane/O350/Fuel User

11-16-2020, 10:31 PM
#4
Hi John,
The Panasonic SVP SMD caps in these boards are a special LOW-ESR formulation (still made, that's what I plan to use), so the ESR will rise LONG before they leak (if they ever even do). According to the IC data sheet for the LDO regulator, this low ESR was required to get the DC conversion so accurate that it would only wiggle like 1-2 tenths off the expected voltage (1.8v). Also, these cap values are pretty large, so large they cannot reasonably be taken up by MLCC caps, nor can tantalum be fully expected to work here (they might, but I don't have enough training to weigh that decision). I'm curious about poly-tants...but cannot find enough about them to prove they will work (yet).

As you know, SGI made a great decision to go heavy on tantalum back in the day - great choice, compared to all electrolytic! SGI boards have TONs of caps...but not that many electrolytic caps...because they knew...you can't trust them, but in some cases you have to use them. Power conversion cases still HAVE to use them for most scenarios due to the wide ripple they can absorb and the fact that their capacitance doesn't really vary with voltage, unlike ceramics.

Also, the mainboard has very few caps, the Node Board has a lot more, we're talking about the mainboard here, for this region.

I will never use the twist method because the risk of pad damage is very real, no one cares on a $50 Apple board, they do care on an $800 SGI board. If I saw pad repair (that I could readily identify) on an SGI board...I'd offer you $10 for it. I couldn't trust it, it either looks factory or it's been messed up by previous owner. I shouldn't be able to tell (readily) if I've done my job right, other than maybe shiny solder regions.

I can tell you that we will never, never, never have trusted schematics. Even if some genius (who's not me) comes along and can produce a drawing by eyesight alone...I'd never trust it. And for power systems, it's totally unneeded: DC/DC power conversion is HIGHLY localized (unlike logic) and very discrete, and I can easily point to the regulators on any mainboard with only 1 minute of viewing the board. They are simple, easily findable, and not plentiful. Even on these boards with like 15 voltages or whatever, not a big deal. A logic issue...different ball game. So don't stress over power stuff, power stuff is easy compared to everything else on that board. The real basics of power conversion haven't changed in decades (there are some new switching/transformerless techniques...but you won't see them in an SGI).


Right now this gets thrown under the PSU pile. We need clean power to have clean conversion (simple, no?)

Also, not to be flippant (it's going to sound that way), but considering the size, scale, speed, frequencies, components, and timeline of SGIs...they don't have much in common with vintage 80's or 90's consumer computing. And old Apple products have stolen schematics now, so...not the same ball game. SGI=Ferrari (fast, expensive, large, no parts, no docs, no company help). SGIs are a vintage classification all their own. Really obscure, huge stuff in a computer museum (Cray, DEC VAX, IBM Mainframe) has its own scale of challenges. SGIs sit between those and computer workstations. Replacing 20 caps on a Macintosh board (worth ~$50 a board) isn't the same as on a ~$800+ SGI board: mess one of those up...now find another one. This is why I've repeatedly said that I don't want to encourage Radio Shack soldering iron hobbyists to attempt repair on these (it will be worse than a shade-tree mechanic). I don't think SGI sold MILLIONS of each station like Apple; did they even break 100,000 units on each station? So the supercar reference is more apt: limited supply, condition is everything, building your own parts may become necessary, and don't wreck one.

The problems of age are nearly the same, but the scale of the operation, tolerance of the equipment, and limited supply make SGI repair a new and exciting thing...that should require a lot of prep and not be done outside of that prep and testing. But I do predict that we'll see a bunch of messed-up SGI boards in the future due to poor repairs, and that will create a 3rd class of equipment pricing.
weblacky
I play an SGI Doctor, on daytime TV.

11-17-2020, 03:52 AM
#5
Hi Weblacky,

nice to hear I have a Ferrari, but the problem is I have a Fiat budget ;-)

Jokes aside I pulled the Adaptec 4300 (DM10) out and once more env reports ok:

>> M200XXXX-001-L2>l1 env
>>
>>> ...
>>  
>> 001c04:
>> Environmental monitoring is enabled and running.
>> 
>> Description    State      Warning Limits    Fault Limits      Current
>> -------------- ----------  -----------------  -----------------  -------
>>           1.8V    Enabled  10%  1.62/  1.98  20%  1.44/  2.16    1.777.  <<===== This one
>>           12V    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.125
>>         12V #2    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.063
>>           3.3V    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.337
>>         12V IO    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.063
>>         5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.070
>>       3.3V AUX    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.302
>>     PCI 5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.096
>>       PCI 3.3V    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.320
>>       PCI 2.5V    Enabled  10%  2.25/  2.75  20%  2.00/  3.00    2.509
>>         PCI 5V    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    4.966
>>   XIO 12V BIAS <not present>
>>         XIO 5V <not present>
>>       XIO 2.5V <not present>
>>   XIO 3.3V AUX <not present>
>> IP59 3.3V AUX    Enabled  10%  2.97/  3.63  20%  2.64/  3.96    3.302
>>   IP59 5V AUX    Enabled  10%  4.50/  5.50  20%  4.00/  6.00    5.070
>>       IP59 12V    Enabled  10%  10.80/ 13.20  20%  9.60/ 14.40  12.000
>>     IP59 VCPU    Enabled  10%  1.14/  1.40  20%  1.02/  1.52    1.283
>>     IP59 SRAM    Enabled  10%  2.25/  2.75  20%  2.00/  3.00    2.470
>>     IP59 1.5V    Enabled  10%  1.35/  1.65  20%  1.20/  1.80    1.495
>> 
>> Description    State      Warning RPM  Current RPM
>> --------------- ----------  -----------  -----------
>> FAN  0  EXHST 1    Enabled        1980        2700
>> FAN  1  EXHST 2    Enabled        1980        2721
>> FAN  2      PS    Enabled        3200        4821
>> FAN  3    PCI 1    Enabled        1980        2616
>> FAN  4    PCI 2    Enabled        1980        2934
>> FAN  5  N0 LEFT    Enabled        1980        4000
>> FAN  6  N0 CNTR    Enabled        1980        3797
>> FAN  7 N0 RIGHT    Enabled        1980        4054
>> 
>>                               Advisory  Critical  Fault      Current     
>> Description      State      Temp      Temp      Temp      Temp     
>> ----------------- ----------  ---------  ---------  ---------  --------- 
>> 0 INTERFACE 0      Enabled  31C/ 87F  48C/118F  55C/131F  21C/ 69F
>> 1 INTERFACE 1      Enabled  31C/ 87F  48C/118F  55C/131F  20C/ 68F
>> 2 INTERFACE 2      Enabled  31C/ 87F  48C/118F  55C/131F  21C/ 69F
>> 3 PCI RISER        Enabled  31C/ 87F  48C/118F  55C/131F  23C/ 73F
>> 4 ODYSSEY        <not present>
>> 5 NODE              Enabled  31C/ 87F  48C/118F  55C/131F  22C/ 71F
>> 6 BEDROCK          Enabled  31C/ 87F  48C/118F  55C/131F  22C/ 71F

So it is working within limits for the moment and the only PCI board in that chassis is the IO9 board.
I might put the Adaptec board into an alternate chassis and check its power draw, as I think it might have a higher draw than the other board, which is why it appears to be problematic.

All my O350 machines are reporting the 1.8V rail in the 1.7V range: 1.777, 1.791, 1.777, 1.777, 1.777
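As a quick sanity check on those numbers, here is a small Python sketch of the headroom each reading has before the 1.8V warning and fault limits shown in the env table (1.62V and 1.44V). The helper name headroom_mv is made up for illustration.

```python
WARN_LOW_V = 1.62   # 10% low warning limit for the 1.8V rail, from the env table
FAULT_LOW_V = 1.44  # 20% low fault limit for the 1.8V rail, from the env table

def headroom_mv(reading_v, limit_v):
    """Margin in millivolts between a reading and a low limit (negative = below it)."""
    return round((reading_v - limit_v) * 1000)

fleet = [1.777, 1.791, 1.777, 1.777, 1.777]  # healthy readings listed above
for v in fleet:
    print(v, headroom_mv(v, WARN_LOW_V), headroom_mv(v, FAULT_LOW_V))

# The faulted reading from the first post, for comparison.
print(1.382, headroom_mv(1.382, WARN_LOW_V), headroom_mv(1.382, FAULT_LOW_V))
```

So the healthy machines sit about 157-171mV above the warning limit, while the faulted reading was already 58mV under the fault limit.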

What voltages are (or were) your Tezros reporting?

Cheers from Oz,


John.
(This post was last modified: 11-21-2020, 05:40 AM by jwhat.)
jwhat
Octane/O350/Fuel User

11-21-2020, 05:39 AM
#6
Hi John,
Good to hear you're following in my footsteps. I'm humbled.

Well, when I bought that Tezro system in 2010, I did what everyone does: LOAD IT UP! So I went from 4GB of RAM to 8GB (all new sticks), and had Quad 2Gb Fibre Channel, 1Gb Ethernet, Adaptec FireWire, SCSI, 7.1 Sound, etc.

Within a month or two of finishing all those acquisitions (early 2011), I started having ATTN alerts in every open console term window about my 5v line going low and the system demanding to shut down (sorry, I don't have the 5v metrics of the final shutdown). However, this always happened within 11-15 minutes of system power-on, but NEVER in less than 10 minutes...which was...strange?

I posted on Nekochan and everyone was just as surprised. The obvious suggestion was to remove recently added things...so I started to...slowly. Eventually I removed the Adaptec board and I was able to run for 1 hour (before I got edgy and just shut down the system myself) without any panics. I tried this consistently over a week's time. It held, and I reported that on Nekochan and went on my way.

Now, I recently found that my mind had blocked this out totally: I uncovered emails that show I had this 1.8v problem...but it was so traumatic I have no memory of it (not kidding). I found an old email to Ian at SGI depot asking if he'd ever seen a 1.8v panic and what I should do about it (he couldn't really offer me anything more than sympathy). I had this log in the email and that's all I remember. I think I was just so disheartened that it shook me hard and I stopped SGI collecting for many years, only restarting when I found recent deals. I didn't have anywhere near the level of knowledge I have now, and even what I know now doesn't totally put me on the side of right with my collection.

I'm driven to do something about it. But you realize the timeframe I'm talking about: Tezros were released in 2003, and I bought mine used in 2010 from a guy who bought it used from one of the "big dealers". So it should have only been like a max of 7 years old and it was having this problem?!?!?! Now the previous owner might have run it 24/7 and used up the PSU life (I hope), so my focus on rebuilds went that way. But I'm worried all the same.

About 4-5 months after I "solved" my 5v line panic...I started getting a 1.8v line panic that looked like this (please note I only ran the system a handful of times and likely not more than 10 hours total before this started):

************************************************ START OF LOG
INFO: Cannot enable VRM: 9

INFO: Cannot enable VRM: 10

INFO: Cannot enable VRM: 11

SGI SN1 L1 Controller

Firmware Image B: Rev. 1.32.6, Built 09/27/2004 14:40:25

001c01-L1>

001c01 ATTN: Cooling system stabilized

001c01 ATTN: 1.8V low fault limit reached @ 1.199V.

001c01 ATTN: brick auto power down in 30 seconds

001c01 ATTN: brick auto power down in 25 seconds

001c01 ATTN: brick auto power down in 20 seconds

001c01 ATTN: brick auto power down in 15 seconds

001c01 ATTN: brick auto power down in 10 seconds

001c01 ATTN: brick auto power down in 5 seconds

001c01 ATTN: brick is powering down now!
************************************************ END OF LOG
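If you keep console captures like this around, a small Python sketch along these lines can pull the fault events out for comparison between machines. It assumes the "ATTN: <rail> low fault limit reached @ <volts>V." wording seen in the two logs in this thread; the "high" variant and the function name find_voltage_faults are assumptions, not something confirmed from L1 documentation.

```python
import re

# Matches e.g. "001c01 ATTN: 1.8V low fault limit reached @ 1.199V."
FAULT = re.compile(
    r"(?P<brick>\d{3}[a-z]\d{2}) ATTN: (?P<rail>.+?) (?P<kind>low|high) "
    r"fault limit reached @\s*(?P<volts>\d+(?:\.\d+)?)V"
)

def find_voltage_faults(log_text):
    """Return (brick, rail, kind, volts) tuples for every fault line in a capture."""
    return [
        (m.group("brick"), m.group("rail"), m.group("kind"), float(m.group("volts")))
        for m in FAULT.finditer(log_text)
    ]

capture = """\
001c01 ATTN: Cooling system stabilized
001c01 ATTN: 1.8V low fault limit reached @ 1.199V.
001c01 ATTN: brick auto power down in 30 seconds
"""

for brick, rail, kind, volts in find_voltage_faults(capture):
    print(f"{brick}: {rail} {kind} fault at {volts:.3f}V")
```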


So 1.199v before shutdown. It would drop slowly over about ~1 minute, which is why I think it's related to ESR/heating for now. However, since the 5v panic had occurred shortly before, I think it's a mix of some ESR change along with an increase in MAIN PSU ripple output due to the main PSU failing. The combination being bad. At this stage I can 100% say those special low-ESR caps aren't low ESR anymore. But are they bad? I don't really think so, but I do think the output from my PSUs is now so poorly regulated that it's a huge contributing factor (guessing).

I'm sort of stumped, but there are things I can do about it, one is to rebuild the PSU (which WAS WORKING the last time I shut it down 9 years ago), the other is to run measurement comparisons with my new Huntron 3200S Curve tracer to see the differences between my working Tezro and this Tezro. I hope between those two things, I can solve this mystery.

Fingers crossed.
weblacky
I play an SGI Doctor, on daytime TV.

11-21-2020, 06:43 AM
#7
So upon further review, it turns out Tezro boards (likely all of this "last" SGI vintage) use pretty advanced Panasonic Solid Aluminium Polymer Caps (hot stuff for 2003!). It seems the version of the caps they use on the mainboard VRMs are just the "standard" ones, they have lower ESR, but you can go lower with newer revisions of this product. But this also means...they don't leak. Solid Aluminium Polymers claim not to leak (no fluid).

While I've not yet measured anything, I'm curious about that. There is a new (and hard to get) upgrade from this series of caps that uses a slightly smaller footprint (but I think may fit) and has even lower ESR while offering MUCH higher life @ 105C. Most of these caps are rated for 2,000 hrs @ 105C; these new caps can do 20,000 hrs @ 105C. And if you go from the original 470uF onboard to the new 560uF (470uF is not offered in the new series), you go from 25 mOhms to 14 mOhms of average ESR (that's big!).
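To get a feel for what 25 mOhms versus 14 mOhms means, here is a back-of-envelope Python sketch of the ESR part of buck output ripple (ripple current times ESR). The 2 A peak-to-peak inductor ripple current is a made-up example value, not a measurement from these boards, and the capacitive and ESL contributions are ignored.

```python
def esr_ripple_mv(inductor_ripple_a, esr_mohm):
    """ESR-only output ripple in mV: amps * milliohms = millivolts."""
    return inductor_ripple_a * esr_mohm

RIPPLE_CURRENT_A = 2.0  # ASSUMED peak-to-peak inductor ripple current

old_mv = esr_ripple_mv(RIPPLE_CURRENT_A, 25)  # original 470uF part, 25 mOhm
new_mv = esr_ripple_mv(RIPPLE_CURRENT_A, 14)  # newer 560uF part, 14 mOhm

print(f"old: {old_mv:.0f} mV, new: {new_mv:.0f} mV")
print(f"ESR ripple reduction: {round(100 * (1 - 14 / 25))}%")
```

Whatever the real ripple current is, the ESR term scales with the ESR ratio, so the newer part would cut that term by about 44%, provided the regulator loop is still happy with the lower ESR.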

Lots of little things to consider.
weblacky
I play an SGI Doctor, on daytime TV.

11-21-2020, 09:50 AM
#8
what happened with this guy ?

Indigo2 IMPACT  : R10K-195MHz, 1GB RAM, 146GB 15K, CD-ROM, AudioDAT, MaxImpact w/ TRAM.  IRIX 6.5.22

O2 : R12K-400MHz, 1GB RAM, 300GB 15K, DVD-ROM, CRM Graphics, AV1/2 Media Boards & O2 Cam, DV-Link, FPA & SW1600.  IRIX 6.5.30

 : 2 x R14K-600MHz, 6GB RAM, V12 Graphics, PCI Shoebox.  IRIX 6.5.30

IBM  : 7012-39H, 7043-140

chulofiasco
Hardware Junkie

07-11-2021, 12:28 AM
#9
(07-11-2021, 12:28 AM)chulofiasco Wrote:  
what happened with this guy ?

I'm still around...have other SGI things ahead of this.

I did buy a professional set of PACE soldering tweezers and different tips, in order to "pluck" the caps right off the board! So I need to finish my Indy PSUs, then switch to rebuilding a Tezro PSU (none of my Tezros are safe to boot due to PSU voltage concerns) in order to safely boot a Tezro without power concerns.

It's unknown if this 1.8v issue is BECAUSE of PSU ripple and such, or if it's a separate issue with the ESR of the caps after the LDO regulator. I expected to replace a few of the poly caps anyway because they are being used for DC buck regulation.
weblacky
I play an SGI Doctor, on daytime TV.

07-11-2021, 02:34 AM
#10
Hi Weblacky,

from the "o350 env chip location" thread: https://forums.irixnet.org/thread-3424-page-2.html

I like it when you say, "I have a THEORY"....

On the Tezro and the O350 the VRMs are on the CPU board and not the interface board.

Have you by any chance looked at your Tezro VRM module/s to see if they have the offending diode/s?

Cheers from Oz,

jwhat/John.
jwhat
Octane/O350/Fuel User

03-10-2022, 11:06 AM

