IRIX Network Forums
What is it, Octane? - Printable Version


What is it, Octane? - Podboy - 06-16-2025

Hi,
I'm posting to ask for a little help understanding what my Octane is trying to tell me.

When cold-booted, there is no boot tune, but fans are on, I get solid white from the lightbar, the hard disk doesn't spin up, and there is no graphics display. This seems to indicate an issue with the system board.
After 1 or 2 power resets it boots fine, and I can run the IDE, where the 'memory' test fails, indicating a failure on memory DIMMs.

My hope (and my wallet's) is that replacing a single component would solve the problems, like just the DIMMs or just the ip30 board. However I'm really wondering if the recent gfx board failure and now the memory, hd, and ip30 board are really trying to tell me I’ve got a failing PSU. 

I'm a little unsure about what component to replace.

Thanks in advance for any helpful insight on where these problems might be stemming from!


Background:
Months ago I dug my Octane out of the corner to get it ready to sell. It's big, heavy, and in the way. Unfortunately as soon as I booted it up, that impulse disappeared, feelings were rekindled, and it felt once again like a keeper.

After about a month of use, the machine would panic: kernel fault, indicating ESSI problems. Indeed, onboard diags directed me to replace the ESSI (MOT20) board. Did that. No more panic and graphics were back to normal. 

However, immediately after swapping gfx boards, the hard disk was not spinning up reliably, especially on cold boots. I have 2 identical 9gb drives (SGI IBM DDRS-39130W S95D), but only 1 installed at the moment. I swapped the drives to check if one was failing. No difference.

Further runs of the IDE would sometimes hang indefinitely and sometimes pass without problem. When the more in-depth 'memory' test is run, it fails every time, reporting “Failure detected on the MEMORY DIMMS.” When the DIMM positions are swapped, the same test sometimes calls out a specific DIMM in S1 or S2. It's a 2x256 MB configuration. The ip30, pm, and frontplane tests never had a problem as long as they were run before memory.

I reseated all boards, DIMMs, and the CPU, disassembled the machine to inspect and clean the SCSI backplane, and disassembled the PSU to clean and inspect it. Nothing noticeably awry. No change.

Just last week I popped in a new lightbar (thanks weblacky!) for another point of feedback. So now on cold boots: white light, no boot chime, fans on, no graphics, no hd spin up. With those symptoms, I think the flowchart for troubleshooting using the LEDs (Octane Owner's Guide) says the IP30 board needs to be replaced.

Here’s some hardware info. Please let me know if more info is needed.

Code:
Location: /hw/node
      PM10250MHZ Board: barcode HRP687     part 030-1426-001 rev  C
        Group ff Capability ffffffff Variety ff Laser 00000031cea2
Location: /hw/node/xtalk/15
            IP30 Board: barcode FLJ012     part 030-0887-005 rev  A
        Group ff Capability ffffffff Variety ff Laser 000000176496
Location: /hw/node/xtalk/15/pci/2
             FP1 Board: barcode 81781C     part 030-0891-003 rev  E
        Group ff Capability ffffffff Variety ff Laser 0000002f4f1c
    PWR.SPPLY.S2 Board: barcode AAC8040259 part 060-0038-001 rev  D
        Group ff Capability ffffffff Variety ff Laser 0000001b6cd5
Location: /hw/node/xtalk/12
           MOT20 Board: barcode HSD755     part 030-1240-003 rev  E
        Group ff Capability ffffffff Variety ff Laser 0000002cc31d
1 250 MHZ IP30 Processor
Heart ASIC: Revision D
CPU: MIPS R10000 Processor Chip Revision: 3.4
FPU: MIPS R10010 Floating Point Chip Revision: 0.0
Main memory size: 512 Mbytes
Xbow ASIC: Revision 1.3
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 2 Mbytes
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
  Disk drive: unit 1 on SCSI controller 0 (unit 1)
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3 parallel port: plp1
Graphics board: ESSI
Integral Fast Ethernet: ef0, version 1, pci 2
Iris Audio Processor: version RAD revision 12.0, number 1
  PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 2
  PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
  PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
  PCI Adapter ID (vendor 0x10a9, device 0x0005) PCI slot 3


Graphics board 0 is "IMPACTSR" graphics.
        Managed (":0.0") 1600x1200
        Product ID 0x3, 2 GEs, 2 REs, 0 TRAMs
        MGRAS revision 4, RA revision 0
        HQ rev B, GE12 rev A, RE4 rev C, PP1 rev H,
        VC3 rev A, CMAP rev E, Heart rev D
        unknown, assuming 19" monitor (id 0xf)

        Channel 0:
         Origin = (0,0)
         Video Output: 1600 pixels, 1200 lines, 59.83Hz (1600x1200_60)
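
If anyone wants to compare part numbers and revisions across machines, here is a minimal Python sketch that pulls the board lines out of a pasted listing like the one above. The regular expression and the `parse_boards` helper are my own assumptions about the line layout shown here, not part of any SGI tool.

```python
import re

# Matches board lines in the layout shown above, for example:
#   "IP30 Board: barcode FLJ012     part 030-0887-005 rev  A"
# The pattern is an assumption based only on this listing.
BOARD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+Board:\s+barcode\s+(?P<barcode>\S+)"
    r"\s+part\s+(?P<part>\S+)\s+rev\s+(?P<rev>\S+)\s*$"
)


def parse_boards(text):
    """Return one dict (name, barcode, part, rev) per board line in text."""
    boards = []
    for line in text.splitlines():
        match = BOARD_LINE.match(line)
        if match:
            boards.append(match.groupdict())
    return boards


# Two board lines copied from the listing above, plus lines to be skipped.
SAMPLE = """\
Location: /hw/node/xtalk/15
            IP30 Board: barcode FLJ012     part 030-0887-005 rev  A
        Group ff Capability ffffffff Variety ff Laser 000000176496
    PWR.SPPLY.S2 Board: barcode AAC8040259 part 060-0038-001 rev  D
"""

if __name__ == "__main__":
    for board in parse_boards(SAMPLE):
        print(board["name"], board["part"], "rev", board["rev"])
```

Lines that do not look like a board entry (the `Location:` and `Group` lines) are simply skipped.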



RE: What is it, Octane? - Raion - 06-16-2025

The first thing I would do would be to pull the motherboard, pour a little bit of isopropyl alcohol into the DIMM slots and brush side to side with a toothbrush. The sockets can get quite dirty and that can cause a lot of what you're facing.

The power supplies are not known for failing gradually; instead they tend to fail quite catastrophically.


RE: What is it, Octane? - weblacky - 06-16-2025

That being said, it should be noted that most of the time SGI mainboards are powered by the 3.3 V and 5 V lines of the power supply. They don't usually care about the 12 V line. The 12 V line is normally what powers your fans, the hard drives (along with 5 V), and a lot of big-ticket peripherals via VRMs. Now, it's possible the graphics don't use the 12 V line; this has not been investigated on Octane. On other machines such as the Indigo2 IMPACT, there are special graphics connectors to get 3.3 V power at high amperage, with no VRM stepping down a 12 V source. The Octane peripheral connectors have similar special connectors on the backplane just for high-amperage power. So I assume that they have their own mechanism for delivering low-voltage, high-amperage power from the power supply to the peripheral connections.

However, as a later design, it would make sense if the power-hungry graphics card in the Octane were stepping down the 12 V source, as you would see in today's peripherals.

Octane power supplies are the latest trend in failing, and requested, parts. Though I do agree with Raion that we normally don't see Octane power supplies degrade slowly. They normally just stop one day, usually when their protection circuitry kicks in. But that doesn't totally rule out one of the power rails in the Octane power supply not performing correctly.

The IBM hard drive you specified has detailed requirements in its manual, which actually says "12V + / - 7% is acceptable during spin up". And the drive itself is supposed to only allow a 5% variation on its DC power input.  It also claims that it will not accept more than 150 mV pp on the 12V line.
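
To put numbers on those limits, here is a minimal Python sketch. It assumes a nominal 12.0 V rail and that the 5% figure applies to the 12 V input outside of spin-up; the function names and the example readings are made up for illustration and are not taken from the IBM manual.

```python
# Limits quoted above for the drive's 12 V input.
NOMINAL_12V = 12.0         # volts (assumed nominal rail voltage)
SPINUP_TOLERANCE = 0.07    # +/- 7% allowed during spin-up
RUNNING_TOLERANCE = 0.05   # +/- 5% otherwise (assumed to apply to the 12 V rail)
MAX_RIPPLE_PP = 0.150      # volts peak-to-peak


def window(nominal, tolerance):
    """Return the (low, high) acceptable voltages for a fractional tolerance."""
    return (nominal * (1 - tolerance), nominal * (1 + tolerance))


def check_12v(measured, ripple_pp, spinning_up=False):
    """Return a list of problems for a hypothetical meter/scope reading."""
    tolerance = SPINUP_TOLERANCE if spinning_up else RUNNING_TOLERANCE
    low, high = window(NOMINAL_12V, tolerance)
    problems = []
    if not (low <= measured <= high):
        problems.append(
            f"{measured:.2f} V is outside {low:.2f} V to {high:.2f} V"
        )
    if ripple_pp > MAX_RIPPLE_PP:
        problems.append(
            f"ripple of {ripple_pp * 1000:.0f} mV pp exceeds "
            f"{MAX_RIPPLE_PP * 1000:.0f} mV pp"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical readings: a sagging rail, during and after spin-up.
    print(check_12v(11.2, 0.100, spinning_up=True))
    print(check_12v(11.2, 0.100, spinning_up=False))
```

In other words, the rail has to stay between roughly 11.16 V and 12.84 V during spin-up, and between about 11.4 V and 12.6 V otherwise, so a supply that sags a little under load could plausibly keep a drive from spinning up while still running the fans.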

If you don't have another system that supports SCA 80, and you don't have an adapter card that at least lets you plug in a normal PC Molex connector to test the drive spin up then it is hard to know whether the drive itself has gone or whether it's a power issue.

The octane also has one of the largest number of electrolytic capacitors on its mainboard compared to nearly every other SGI I've seen. Now they use a solid aluminum electrolytic that doesn't leak out, but they're not polycaps.  I had assumed in the next few years that octane motherboard recapping would probably be required given the enormous amount of capacitors all over the thing.

Without further information, I would say that failure of old, previously working memory is super rare, but it happens; I have certainly had RAM go bad in an SGI and in an old PC. Outside the RAM, what it sounds like is bad power. If you can keep rebooting it and things start working, that's normally a sign that the power supply or system VRMs are trying their best but not keeping up their end of the bargain. And the fact that you were running fine for a month until the errors started to happen would also lead me to believe that it's power related. That being said, those graphics cards run hot and it could just as well be cracking solder joints. But I feel like the proliferation of problems all around the system points more to power than to the graphics, because you're having more problems than just the graphics card. Granted, I'm putting the hard drive aside on this one, because you should try to test the hard drive spin-up using an SCA 80 SCSI adapter. If it spins up just fine on an adapter with a normal PC power supply, then I would definitely start blaming the power system in the Octane.

Unlike my adventures in Indy, I've not gotten to the octane power supplies yet as there's other things to do and I'm still setting up equipment and setting up workflows for repairs in general. Octane is probably the hottest topic here on the forums in regards to people wanting power supplies and people needing troubleshooting help. Because right now they are the sweet spot in the economy of SGI collecting.

Right now what I would suggest is perhaps to put it aside until you're able to secure a known good power supply. Unfortunately every power supply you come across is going to be nearly 30 years old. So a working one would still only get you by for maybe a few weeks or months before this might happen again.

I recommend that you maybe put it to bed and keep watching the forums for when one of us starts offering a rebuilt Octane power supply, which unfortunately is going to be a little while. I also don't think they'll be cheap. The Octane power supply is heavy, dense, has quite a number of PCBs in it, and will likely be that much more difficult to fully rebuild and reconstruct correctly, along with the proper adhesive and testing. The Octane's is also one of the few power supplies that is oversized for shipping, in its length dimension.

Octane power supplies are on the verge of what we would call smart, they do have some overcurrent and overvoltage protection in them but they're not as smart as a modern power supply today. We've definitely seen where they will cause charred damage to the connectors on an octane during a surge or malfunction. It's rare but we have seen it, so that means that the octane protection mechanisms do not generally encompass all the scenarios you might encounter related to a malfunction.

I would say it's worth keeping an Octane, but similar to your Indy I think you'll need to wait for refurbished power supplies to come online, unless you want to try to hunt down another used Octane power supply you think you can trust just for testing purposes. Either way, any power supply you get will be so old that if you wish to use it very frequently in the foreseeable future, it could not be trusted without some level of refurbishment, at least at the filter/capacitor level.


RE: What is it, Octane? - Podboy - 06-18-2025

Thanks for the quick replies from both of you.

Unfortunately scrubbing the DIMM slots with IPA and a toothbrush didn't fix the memory errors. Man, I wish that had worked, but at least that bit of the board is looking a little less grim.

I will look into testing the drives with an adapter. I was kinda thinking that the odds of both drives failing at the same time was low enough that if swapping drives resulted in the same symptoms, it probably wasn't a drive issue. It was enough to convince me, but who's to say without further testing.

The machine currently has an 'Updated' Lucent 623W PSU in it (060-0038-001). Is there any downside in searching for a Cherokee 747W for testing or as a replacement? Would it be fair to assume that if there ever is a time that folks are refurbishing Octane PSUs, the Cherokee one would be undertaken first?


RE: What is it, Octane? - weblacky - 06-18-2025

I've always been told anecdotally that the Cherokees were better power supplies, though I have absolutely no data to back that up. Obviously being more watts = better :-)

If I were working on these power supplies I would also target Cherokee first, because it's the higher end of the two options for Octane and is compatible with every Octane. Hence it will have the highest demand of the two. It would also stand to reason that people would expect the Lucent power supply to be cheaper because it's a lesser-rated unit. In repair this would be a problem, as it is not a realistic expectation given that the effort in both is similar.

While it might seem obvious on the surface, there's a bit of a nuanced problem when it comes to SGI repair at the current moment. There may be demand, but the price structure may not support doing the work. Now obviously there are always outliers; there are always a few people who are willing to spend the cash to get what they want, to have what they want, so that things just work. But for the vast majority of customers that's not the case.

The problem you run into, is the "starving SGI collector" routine. This happens much less than it used to, but 15+ years ago it was very common to pick up an SGI station for a few hundred bucks, fully working at the time right off of being decommissioned by some aerospace company or some video production house. You would then pay $45 for a massive 36GB SCA 80 SCSI hard drive and boot it up and play all day long.

So many people who have had SGI collections for 20+ years know that SGIs were expensive at retail but have no concept of how expensive they really were. They picked them up from a pallet for maybe $200 to $300. If you ask them to spend $500 on a rebuilt power supply, what do you think their reaction would be? "That's over twice what I paid for the system!" Even if you take in the fact that inflation makes that a pretty good deal, it's still a number that drives people away.

Now times have changed, there's no denying that, in the past 10 years you've seen prices rise dramatically and inventory fall dramatically. As new collectors come into the hobby there's also those that have never had the opportunity and now are paying higher prices to enter the hobby. I'm sure this is really true of any collecting hobby, it's not unique to vintage computers. But just like any vintage appliance, age is a form of wear and tear and at the end of the day parts need to be replaced if you want the item to keep working.

Right now repair/rebuilt parts directly compete with the used part market, assuming you can get the part you want on the used market. Some parts are now so rare that there is no used market available for them. You would think that some of this isn't quite apples to apples. Are you really going to compare a nearly 30 year-old power supply to one that has been gone through and refurbished with new consumables?

Well a lot of people would equate them to being equal. Because at the end of the day to most people a power supply is binary, it either works or it doesn't. Very few people understand that there's an in-between phase where you're doing damage to your computer that a new power supply is not going to magically solve. An old but functional power supply can feed bad power and cause damage to a circuit before the power supply itself engages its own protection circuitry or simply outright fails. And at that point now it's done additional damage that has to be addressed.

Right now I'm placing my focus on parts for which there is no used market supply. Without getting into detail, there is a high scarcity of the last and highest-spec'd parts for the Chimera family, which is even still used in some places. I can't go into detail about who uses them because that's kept close to the vest by clients I've interacted with. Suffice to say that people who still use these for business produce a shadow industry that has drained the used and dealer inventory of a lot of the Chimera parts (e.g. V12 graphics cards and video import cards, DCD, quad CPU boards, 1 GHz CPUs, etc.). The reason you don't see these parts is that they've been consumed by entities that still run these systems. Systems like Octane and older have a different issue.

But as long as there's a fairly brisk trade online in used SGI parts, the used market tends to set the value. Oftentimes refurbishment work is going to have a higher cost than a used, but functional, part. There will come a time when there will be such scarcity that repair will be the only option, and hopefully people will not have thrown away their broken parts so we have something to repair.

But right now, if you were to make a business around repair, the only entities that are going to pay you money are those that have active demand for the Chimera parts. I'm sure there'll be people interested in the older parts as well, but there's usually not enough to keep the lights on if you're actually running a business. To some extent I'm going to try to balance the two, but right now business users take priority because they pay the bills. So their concerns are being addressed first.

It's my sincere hope that in the next couple of years business needs will be met and attention can be returned to the older SGI hardware, which will allow at least the option of buying repaired parts where none currently exists. But even at that time, be prepared for retail pricing, because at the end of the day that's what needs to happen.


RE: What is it, Octane? - Podboy - 06-18-2025

(06-18-2025, 01:14 AM)weblacky Wrote:  If I were working on these power supplies I would also target Cherokee first because it's the higher end of the two options for octane and is compatible with every octane. Hence it will have the highest demand among the two. 
Yeah, that's what I was thinking. It covers more bases.

The comment on people not wanting to spend a realistic amount of money to repair something they bought for pocket change strikes an amusing chord. I find myself much more willing to continue to sink money into my Indy which I paid a good chunk for back in '96, than I am to spend it on the Octane which I bought for an order of magnitude less just a few years later. The only real difference is that I got a better deal on one.


RE: What is it, Octane? - Podboy - 07-10-2025

So, as a follow up...
It turns out that replacing the DIMMs was all it took to get the system to behave normally again.

Hats off to Raion for identifying the symptoms as a simple memory error!

Can't help it, but I was a bit shocked that was all it took to get the hard drive spinning up reliably and the system to POST. Not knowing the system well, I imagined that I would have been consistently greeted with a specific memory error rather than with an inconsistent and nondescript ip30 board failure. To my logic, the memory (and hd) errors looked like the result of a board failure and not the other way around. And certainly the lightbar blinks reinforced that thought.

It's interesting, and I hope this helps some future someone diagnose their Octane probs.


RE: What is it, Octane? - Raion - 07-10-2025

I appreciate the praise, but it was a simple guess based on the symptoms. I'm by no means an SGI doctor.

I'm just glad the forum has its uses for people, contrary to my critics.


RE: What is it, Octane? - Podboy - 07-23-2025

I noticed a couple glaring items while checking out my failed ESSI today.

[See Photos of C234 & C79A]

These are the only ones I noticed. I could barely see the small crack on C234, except in the right light.

Since there are no visible markings on the caps, I'm wondering if there is any information on the specs for replacing them? The board is a 030-1240-005 rev. D


RE: What is it, Octane? - weblacky - 07-23-2025

In the C79A picture, semiconductor U8 exploded as well.
Based on the pictures, the cap layout around C234 is the same as on the Octane SSE card: http://www.sgistuff.net/hardware/systems/images/octane-sse-1802.jpg
