InfiniteReality 3 DG problem -
vvuk - 10-23-2021
I recently got my hands on an Onyx 350, with an IR3 with a DG5-2 / TVO. However, I'm getting no output from the DG. Running `irsaudit`, I get some failures related to the VOC registers:
TEST vocreg VOC Registers
INFO ERROR: VOC#0: reading VOC_IWSC_ORIGIN (0x204) = 0x5555
INFO (expect 0x155555)
INFO ERROR: VOC#0: reading VOC_IWSC_SIZE (0x205) = 0xaaaa
INFO (expect 0x2aaaaa)
INFO ERROR: VOC#0: reading VOC_VOF_VERT_PHASE (0x208) = 0x5555
INFO (expect 0x15555555)
INFO ERROR: VOC#0: reading VOC_PFD_CONTROL (0x230) = 0x5555
INFO (expect 0x15555555)
INFO ERROR: VOC#0: reading VOC_PFD_SEQUENCE_LO (0x231) = 0xaaaa
INFO (expect 0xaaaaaaaa)
INFO ERROR: VOC#0: reading VOC_PFD_SEQUENCE_HI (0x232) = 0x5555
INFO (expect 0x55555555)
INFO ERROR: VOC#0: reading VOC_PFD_FRAME_DUR (0x234) = 0xaaaa
INFO (expect 0xaaaaa)
...
INFO VOC #1: FAILED with 53 errors.
DIAG Replace DG board; BEFORE replacing any ASICS, ensure clocks
DIAG on the DG are running properly (like the VOF clocks for
DIAG VOC failures). Faulty or not running clocks _will_ cause
DIAG VOC failures. Otherwise, repair by reworking flawed
DIAG ASICs.
It's oddly only managing to read the low halfword of each 32-bit register; upper halfword seems to be zero. Anyone have any suggestions? Or any notion of how to ensure "clocks on the DG are running"?
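The pattern can be checked mechanically. In every quoted error, the value read back equals the expected value with bits 31..16 cleared. Below is a minimal Python sketch that assumes only the log format quoted above. The helper names are hypothetical and are not part of irsaudit.

```python
import re

# The VOC register errors quoted above. Each ERROR line is followed by an
# "(expect ...)" line giving the value irsaudit expected to read back.
LOG = """\
INFO ERROR: VOC#0: reading VOC_IWSC_ORIGIN (0x204) = 0x5555
INFO (expect 0x155555)
INFO ERROR: VOC#0: reading VOC_IWSC_SIZE (0x205) = 0xaaaa
INFO (expect 0x2aaaaa)
INFO ERROR: VOC#0: reading VOC_VOF_VERT_PHASE (0x208) = 0x5555
INFO (expect 0x15555555)
INFO ERROR: VOC#0: reading VOC_PFD_CONTROL (0x230) = 0x5555
INFO (expect 0x15555555)
INFO ERROR: VOC#0: reading VOC_PFD_SEQUENCE_LO (0x231) = 0xaaaa
INFO (expect 0xaaaaaaaa)
INFO ERROR: VOC#0: reading VOC_PFD_SEQUENCE_HI (0x232) = 0x5555
INFO (expect 0x55555555)
INFO ERROR: VOC#0: reading VOC_PFD_FRAME_DUR (0x234) = 0xaaaa
INFO (expect 0xaaaaa)
"""

READ_RE = re.compile(r"reading (\w+) \((0x[0-9a-f]+)\) = (0x[0-9a-f]+)")
EXPECT_RE = re.compile(r"\(expect (0x[0-9a-f]+)\)")


def parse_failures(log):
    """Pair each 'reading NAME (addr) = value' line with the '(expect value)' after it."""
    failures = []
    pending = None
    for line in log.splitlines():
        m = READ_RE.search(line)
        if m:
            pending = (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
            continue
        m = EXPECT_RE.search(line)
        if m and pending:
            name, addr, got = pending
            failures.append((name, addr, got, int(m.group(1), 16)))
            pending = None
    return failures


def only_low_halfword(got, expect):
    """True if the read-back equals the expected value with bits 31..16 cleared."""
    return got == (expect & 0xFFFF)


if __name__ == "__main__":
    for name, addr, got, expect in parse_failures(LOG):
        print(f"{name:20s} 0x{addr:03x}: got 0x{got:08x} expect 0x{expect:08x} "
              f"low-half-only={only_low_halfword(got, expect)}")
```

Every quoted register comes back with `low-half-only=True`. That result suggests a fault in whatever carries the upper 16 data bits, rather than a problem with any individual register.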
RE: InfiniteReality 3 DG problem -
Raion - 10-23-2021
I have an Onyx2 with a DG and an IR3, so I have some experience with these systems.
First, let's start with the physical side: the connectors for the DG are delicate, easily bent, and can also get clogged with dust.
Here's how to remove it properly:
Step 1: Make sure you have sufficient space behind the unit. It's a big board. Don't be a dumbass like me.
Step 2: Gently unscrew both sides about 2-3 turns at most. When you feel resistance, grab the pull handles and gently wiggle it out some.
Step 3: Repeat this process, but as you get it started, wiggling should not be necessary and it should just pull straight out.
OK, so you've got the board out, none of the ASICs look bad, and no components appear knocked off. Blow out the connectors with compressed air, then turn to the inside of the chassis.
Take a strong flashlight and inspect the connectors for dust, debris, bent pins etc. I've never bent pins, fwiw, so no advice for unbending them.
If you see any dust, you're gonna have to clean the connectors. Take a credit card, wrap it in some sturdy paper, and wet the paper with windex or alcohol. Run it gently through each of the lines, then blow them with compressed air.
Ok, so you cleaned the connectors. Reseating is easyish.
Step 1: Line the board in the tracks in the chassis.
Step 2: Slide it in gently, but make sure it stays in the tracks. Use your light as a visual guide, and pull nearby boards if necessary to get a good view.
Step 3: Seat it by pressing straight in on the back. I do this with both hands and a piece of wood to spread the force across the board. Once the screws start biting, carefully thread them in. Don't cross-thread them; I've done it and had to chase the threads. Not fun!!! Then repeat the same screw, slide, screw routine. Once the board is almost all the way in, you can safely wiggle it into place if needed, but please try to just press it in.
As for checking whether they're running, I don't think SGI released any software to test that sort of thing. You'd need a logic probe or scope to find the clock signals, and with a chassis like that, it doesn't sound fun.
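If you get a logic analyzer or a scope with digital export onto a clock net, you can check whether the clock is running by counting edges. Below is a minimal Python sketch. It assumes the capture has been exported as a list of 0/1 samples at a known sample rate. `estimate_clock_hz` is a hypothetical helper. This thread doesn't say which nets carry the VOF clocks or what frequencies they should be.

```python
def estimate_clock_hz(samples, sample_rate_hz):
    """Estimate a clock's frequency from digital samples (0/1) taken at sample_rate_hz.

    Returns 0.0 when fewer than two rising edges are seen, i.e. the line is
    stuck high or low (clock not running) or the capture is too short.
    """
    rising = [i for i in range(1, len(samples))
              if samples[i - 1] == 0 and samples[i] == 1]
    if len(rising) < 2:
        return 0.0
    # Mean period, in samples, between the first and last rising edge.
    period_samples = (rising[-1] - rising[0]) / (len(rising) - 1)
    return sample_rate_hz / period_samples


if __name__ == "__main__":
    # Synthetic capture: a 10 MHz square wave sampled at 100 MS/s.
    capture = ([0] * 5 + [1] * 5) * 20
    print(estimate_clock_hz(capture, 100_000_000))    # 10000000.0
    print(estimate_clock_hz([0] * 100, 100_000_000))  # 0.0 (stuck low)
```

The sample rate must be comfortably more than twice the clock frequency. Otherwise the edges alias, and the estimate is meaningless.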
If you need a DG, I may have a spare socked away somewhere, but you can also find them on that one evil auction site that mustn't be named. Just go into Business & Industrial and search by part number.
RE: InfiniteReality 3 DG problem -
vvuk - 10-23-2021
Yeah, just did all that -- board came out cleanly, not much dust inside, everything looked quite clean. No bent pins or anything weird with either the board or the backplane. Process was definitely a little annoying esp taking it out, I expected the screws to come all the way out and then pull the board out. Seems like a bad design, almost guarantees weird force on the connectors at some point!
I actually have a second DG5-2 coming in a few days that I picked up to fill out the second pipe, so I'll have a replacement; just such a weird failure mode.
RE: InfiniteReality 3 DG problem -
Raion - 10-23-2021
Yeah, it's not a good design. I'd prefer that the screws were spring-loaded so the board could go in flush and such without the screws.
But these weren't designed to be serviced by owners or onsite staff. For a 300k+ machine, you'd be calling SGI to send field techs out.
I'll ask my former SGI employee friend Shawn -- he will know if these components were:
"Field diagnosable" (Meaning it would be possible to diag in the field)
Or
"Complete Replacement Unit" meaning the broken one is replaced, and the original would be RMA'd
RE: InfiniteReality 3 DG problem -
vvuk - 10-23-2021
A full `irsaudit` (with -continue) found no other errors; everything else is working fine, so it's definitely the DG. (I reseated the KTOWN etc just in case; unnecessarily.)
So given that:
- the failures all manifest as reading 0's for the top 16 bits in VOC registers
- all the registers look like they're related to configuring the output video (VOC = video out config?)
- the failures are identical for both VOC1 and 2 (2-monitor DG)
I'm going to guess that the problem is _not_ in anything that's duplicated per VOC. Instead, it's either in the bare connections or a solder issue on some shared chip. The top 16 bits failing is suspicious, but I can't imagine 16 pins of some chip happened to break off exactly :p Then again, who knows; it could be something exactly like that.
I guess once I get the replacement DG and get everything working, I'll take the TVO option board off this one and give the underlying DG a good close look.
RE: InfiniteReality 3 DG problem -
weblacky - 10-23-2021
I’m nowhere near an expert, but in situations like this, where an entire section of interaction is gone, I look for some kind of multiplexing IC that controls the top half of the bus to the chip you’re referring to. You’re right that all those solder joints aren’t likely to break so evenly. However, some kind of amplifier/multiplexer/enabler IC might sit in front of the bus and be responsible for repeating or timing the queries. Since you get 0 instead of a floating-state response, it makes sense that there might be parallel lines or channels that together allow queries on a wider bus.
If you can closely examine the chips on the lines of the IC you’re referring to, you might find duplicated IC components and a layout suggesting that there are upper and lower buses that fetch and assemble the entire response. You can then use various tools and meters to see whether the chips on both parallel channels give the same basic measurements. Perhaps the solder joints of one chip are bad, or the enable line of one chip is bad. The chip may also have shorted out in some way, so that it never hears responses and always returns zeros for anything queried through it.
Then measure the same chip in the nearby parallel set or channel for comparison. You may find that the upper channel has an IC that doesn't measure the same as its counterpart on the working lower channel.
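The hypothesis above can be made concrete with a toy model: a 32-bit read assembled from two independent 16-bit lanes, where a dead upper lane drives zeros. This is a hypothetical Python sketch, not the DG's documented bus topology. It only shows that a single failed upper-lane part would reproduce every value irsaudit reported.

```python
def assembled_read(value, upper_lane_ok=True, lower_lane_ok=True):
    """Toy model of a 32-bit read carried over two 16-bit lanes.

    A failed lane (bad transceiver, bad enable line, bad joints) is modeled
    as driving all zeros rather than floating.
    """
    lower = value & 0xFFFF if lower_lane_ok else 0
    upper = (value >> 16) & 0xFFFF if upper_lane_ok else 0
    return (upper << 16) | lower


# (expected, read back) pairs from the irsaudit output in the first post.
OBSERVED = [
    (0x155555, 0x5555),
    (0x2AAAAA, 0xAAAA),
    (0x15555555, 0x5555),
    (0x15555555, 0x5555),
    (0xAAAAAAAA, 0xAAAA),
    (0x55555555, 0x5555),
    (0xAAAAA, 0xAAAA),
]

if __name__ == "__main__":
    for expected, got in OBSERVED:
        modeled = assembled_read(expected, upper_lane_ok=False)
        print(f"expect 0x{expected:08x}  modeled 0x{modeled:08x}  "
              f"observed 0x{got:08x}  match={modeled == got}")
```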
RE: InfiniteReality 3 DG problem -
jan-jaap - 10-23-2021
It's worth knowing that an Onyx2 DG is a combination of a base DG5-2 and, optionally, an add-on board that makes it a DG5-8, a DG5-GVO, or some other oddball part. The base DG5-2 is fairly easy to find and not expensive; the desirable bit here is the TVO option.
Therefore, I wouldn't waste my time. I'd simply get another DG5-2, transplant the TVO option and the metal bits, and be done with it. The chances of fixing anything on one of these without access to SGI internal documentation and diagnostic equipment are near zero.
RE: InfiniteReality 3 DG problem -
vvuk - 10-24-2021
Yep, I've got a second DG5-2 coming (with TVO even) so hopefully that will be all I need to get it up and running.
I took the TVO option board off and took a close look at the main DG5-2. Zero visible damage, nor anything that even looked off -- pretty sure it's just been enclosed in the cabinet most of its life. Virtually no dust. The two bodge wires that appear in every image I've seen are both still properly connected. I was hoping to find something that looked like a small microcontroller or other bus arbiter. All I could find with meaningful part numbers was a PLL clock generator and a pile of 16-bit bus transceivers, all of which looked fine. There are two large custom chips clearly labeled "VOC 1b"; both are BGA, and both are still solidly soldered. But since register access on both is failing identically, the problem is definitely somewhere earlier.
irsaudit is quite happy with everything else, so it's either the DG5-2 or it's the backplane, which would be.. not great. It *could* also be the KTOWN, and the ARM controller that (I believe) lives there and manages the overall IR system. But given that the other 5 cards (GE + 4 RM) pass all tests, all signs point to the DG5-2.
An interesting side note: when I tried the board with the TVO removed, irsaudit noticed and complained. Since folks apparently move option boards between DGs without problems, though, it's not an issue in practice:
Code:
TRCE DG NIC #: 0000.0059.c878 (family: 0b)
TRCE Serial #: ......
TRCE Part #: 030-1242-001
INFO DG NIC reports DG w/ option board but no option board detected
INFO NIC part number field must be one of 030-1055-XXX (2 VOCs),
INFO 030-1087-XXX (8 VOCs),030-1524-XXX OR030-1242-XXX (2 VOCs
INFO with GVO/DDO2/DPLEX/TVO).
DIAG Fix by correctly reprogramming the DG NIC.
TRCE rev_code: H
TRCE name: DG5-2
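The irsaudit message spells out the consistency rule it applies to the DG NIC. Below is a sketch of that rule, based only on the message. irsaudit's real implementation isn't available here, and `check_dg_nic` and the table are hypothetical.

```python
from typing import Optional

# Part-number families accepted by irsaudit, as listed in the log message above.
# Only 030-1524 and 030-1242 are described there as carrying an option board.
ACCEPTED = {
    "030-1055": "2 VOCs",
    "030-1087": "8 VOCs",
    "030-1524": "2 VOCs with GVO/DDO2/DPLEX/TVO",
    "030-1242": "2 VOCs with GVO/DDO2/DPLEX/TVO",
}
WITH_OPTION_BOARD = {"030-1524", "030-1242"}


def check_dg_nic(part_number: str, option_board_detected: bool) -> Optional[str]:
    """Hypothetical re-creation of the check implied by the log.

    Returns a complaint string, or None if the NIC and the hardware agree.
    Only the option-board message below is taken from the real log.
    """
    family = part_number[:8]  # "030-1242-001" -> "030-1242"
    if family not in ACCEPTED:
        return f"unrecognized DG part number {part_number}"
    if family in WITH_OPTION_BOARD and not option_board_detected:
        return "DG NIC reports DG w/ option board but no option board detected"
    return None


if __name__ == "__main__":
    # The board from the log: NIC says 030-1242-001, TVO board removed.
    print(check_dg_nic("030-1242-001", option_board_detected=False))
    # Same NIC with the option board back on: no complaint.
    print(check_dg_nic("030-1242-001", option_board_detected=True))
```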
There's a little 2-pin header labelled "NIC" in the corner of the board too; sometime in the next week or two I may stick a logic analyzer on it and see what I can see. I have a working theory it's i2c just due to VOC & other major chips manufactured by Philips Semi, where I2C was big in the 90s. (Though you'd need to pull a common ground from somewhere else; it's a little weird to not just put a ground pin in the same location)
I've got some friends-of-friends who may have direct knowledge of the hardware. I'll ask and see if I can get a bit more detail about how some of the comms here works. Probably won't help me with this board, but may be useful info to someone down the line.
RE: InfiniteReality 3 DG problem -
robespierre - 10-24-2021
(10-24-2021, 06:05 AM)vvuk Wrote: There's a little 2-pin header labelled "NIC" in the corner of the board too; sometime in the next week or two I may stick a logic analyzer on it and see what I can see. I have a working theory it's i2c just due to VOC & other major chips manufactured by Philips Semi, where I2C was big in the 90s. (Though you'd need to pull a common ground from somewhere else; it's a little weird to not just put a ground pin in the same location)
No, it's 1-Wire. The second pin is the ground pin.
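Since the NIC is 1-Wire, the `DG NIC #` line in the irsaudit output above looks like a standard 64-bit 1-Wire ROM ID. That ID consists of an 8-bit family code (`0b`), a 48-bit serial number, and an 8-bit CRC. Below is a minimal Python sketch that splits such an ID and checks its Dallas/Maxim CRC-8. It assumes irsaudit prints the 48-bit serial as three 16-bit hex words, based on the single `0000.0059.c878` sample. The CRC byte isn't shown in the log, so the usage below recomputes it rather than copying it.

```python
def crc8_maxim(data: bytes) -> int:
    """Dallas/Maxim 1-Wire CRC-8: poly x^8 + x^5 + x^4 + 1, processed LSB first, init 0."""
    crc = 0
    for byte in data:
        for _ in range(8):
            mix = (crc ^ byte) & 0x01
            crc >>= 1
            if mix:
                crc ^= 0x8C  # reflected form of polynomial 0x31
            byte >>= 1
    return crc


def decode_rom(rom: bytes):
    """Split an 8-byte 1-Wire ROM ID (in bus order) into (family, serial, crc_ok)."""
    if len(rom) != 8:
        raise ValueError("a 1-Wire ROM ID is exactly 8 bytes")
    family = rom[0]
    serial = int.from_bytes(rom[1:7], "little")  # 48-bit serial, sent LSB first
    crc_ok = crc8_maxim(rom[:7]) == rom[7]
    return family, serial, crc_ok


def format_serial(serial: int) -> str:
    """Print a 48-bit serial as three 16-bit hex words (assumed irsaudit style)."""
    return ".".join(f"{(serial >> shift) & 0xFFFF:04x}" for shift in (32, 16, 0))


if __name__ == "__main__":
    # Rebuild an ID from the log's family code and serial; the real CRC byte
    # isn't in the log, so it is computed here rather than copied.
    body = bytes([0x0B]) + (0x0059C878).to_bytes(6, "little")
    rom = body + bytes([crc8_maxim(body)])
    family, serial, crc_ok = decode_rom(rom)
    print(f"{format_serial(serial)} (family: {family:02x}) crc_ok={crc_ok}")
    # -> 0000.0059.c878 (family: 0b) crc_ok=True
```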
RE: InfiniteReality 3 DG problem -
vvuk - 10-25-2021
Ah, thank you! I guess that was the more obvious guess; I was misled by I2C being used elsewhere in irsaudit (for DDC).
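With the header confirmed as 1-Wire (data plus ground), the planned logic-analyzer capture could be decoded by the width of each low pulse. Below is a minimal Python sketch. It assumes standard-speed (not overdrive) 1-Wire timing, and that the width of each low pulse has already been extracted in microseconds. The thresholds are the nominal 1-Wire values: reset at least 480 us, presence 60-240 us, and "1" slots released within about 15 us. The helper names are hypothetical.

```python
RESET_MIN_US = 480       # master reset pulse: low for at least 480 us
PRESENCE_MIN_US = 60     # slave presence pulse: roughly 60-240 us low
PRESENCE_MAX_US = 240
ONE_MAX_US = 15          # write-1 / read-1 slots release the bus within ~15 us


def classify_low_pulses(widths_us):
    """Label each low-pulse width (microseconds) of a standard-speed 1-Wire capture.

    Heuristic only: a missing presence pulse followed by a write-0 slot would be
    mislabeled, since this ignores the high time between pulses.
    """
    labels = []
    after_reset = False
    for w in widths_us:
        if w >= RESET_MIN_US:
            labels.append("reset")
            after_reset = True
            continue
        if after_reset and PRESENCE_MIN_US <= w <= PRESENCE_MAX_US:
            labels.append("presence")
        elif w <= ONE_MAX_US:
            labels.append("1")
        else:
            labels.append("0")
        after_reset = False
    return labels


def bits_to_bytes(bits):
    """Pack '0'/'1' labels into bytes, least-significant bit first (1-Wire order)."""
    out = []
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for j, bit in enumerate(bits[i:i + 8]):
            if bit == "1":
                value |= 1 << j
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    # Synthetic capture: reset, presence, then 0x33 (Read ROM) sent LSB first.
    widths = [500, 120, 6, 6, 65, 65, 6, 6, 65, 65]
    labels = classify_low_pulses(widths)
    print(labels)
    print(bits_to_bytes(labels[2:]).hex())  # 33
```

In the synthetic input, 0x33 is the standard 1-Wire Read ROM command. This thread doesn't establish which commands the DG's bus master actually issues.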