Most likely fault location on a GR1.5...


Most likely fault location on a GR1.5... - mapesdhs - 09-18-2019

Hi all!

A followup to my previous thread about fixing an IP6 for Personal IRIS; I'm now trying to repair a GR1.5 base board (in final assembly it will have a VB and Z fitted, but for the moment it's the base board which needs fixing). I have a number of GR1.5s but none of them work properly. The least bad is a board which shows vertical lines (these are high res images so one can zoom in):

http://www.sgidepot.co.uk/misc/P1190506sl.jpg
http://www.sgidepot.co.uk/misc/P1190507sl.jpg

Here's a reasonably high res image of a GR1.5 (not the one I'm working on btw):

http://www.sgidepot.co.uk/misc/P1190508sl.jpg

The gfx architecture is described in the technical report, Chapter 3:

http://www.sgidepot.co.uk/pitechrep.html

I started doing some diode mode testing on another board to get a baseline idea of what readings some of the ICs should show (I've already checked all caps and resistors), but I thought I'd post here to ask: given the display output, does anyone have a notion as to the most likely origin of the bad picture? A VRAM IC? A support IC for the VRAM? An XMAP? The static RAM between the XMAPs and DACs? I'm just pondering where to start so as to have the best chance of locating the fault sooner rather than later, i.e. to minimise the time wasted searching less likely fault locations. I don't think it's the RE2 chip, since three different RE2s show the same output.

The VRAM is present as 12 rows and 5 columns of HM53461ZP-12. Nearby there are 4x CD74ACT646EN, below these are 4x CD74AC174E and one CD74AC157E. There are many other ICs as well, but these are the main ones which support the VRAM. I'm still reading up on what the other ICs are for.

I'm now thinking perhaps one of the XMAPs is bad, so that's my next target for diode mode testing. So far I've taken measurements of every memory IC and all the ICs mentioned above; comparing to other boards didn't show anything conclusive. Problem is, until I have a fully working board I can't be certain what normal readings should be, so I'm gambling that multiple other boards are unlikely to all have the same fault and thus by cross checking several I can hopefully infer what should be normal readings. Ultimately I plan on buying a great many NOS spares and refurbing the other boards to fully working order. In time I'll add all the info to my site.
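The cross-checking idea can be sketched in a few lines of Python. This is only an illustration, assuming the readings have been typed up per board and per pin: the board names, pin names, voltages and the 0.05 V tolerance are all made-up placeholders, not measurements from any real GR1.5. The median across boards stands in for the "normal" value, which only works if most boards are good at that particular pin.

```python
from statistics import median

def flag_outliers(readings, tolerance=0.05):
    """Compare diode-mode readings across boards.

    readings: {board_name: {pin_name: volts}}
    Returns a list of (board, pin, value, reference) tuples where the value
    differs from the cross-board median by more than `tolerance` volts.
    """
    pins = sorted({pin for board in readings.values() for pin in board})
    flagged = []
    for pin in pins:
        values = {b: r[pin] for b, r in readings.items() if pin in r}
        reference = median(values.values())  # assumed "normal" reading
        for board, value in sorted(values.items()):
            if abs(value - reference) > tolerance:
                flagged.append((board, pin, value, reference))
    return flagged

# Hypothetical placeholder data, NOT real measurements.
sample = {
    "board_A": {"pin1": 0.52, "pin2": 0.48},
    "board_B": {"pin1": 0.53, "pin2": 0.47},
    "board_C": {"pin1": 0.51, "pin2": 0.12},
}

for board, pin, value, reference in flag_outliers(sample):
    print(f"{board} {pin}: {value:.2f} V (median {reference:.2f} V)")
```

With only two or three boards the median is a weak reference, so a flagged pin is a place to look rather than proof of a fault.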

Thoughts welcome!

Ian.

PS. Uber thanks to jan-jaap who's already helped a lot with my Personal IRIS woes. 8)

RE: Most likely fault location on a GR1.5... - ColanderCombo - 09-18-2019

(09-18-2019, 07:25 PM)mapesdhs Wrote: The gfx architecture is described in the technical report, Chapter 3:

FWIW, the IrisVision boardset also implements the Eclipse architecture, and there's *very* detailed technical information including technical manuals for that here:

http://ps-2.kev009.com/ohlandl/video/Iris.html

There's complete register level documentation for every chip, which is pretty remarkable.

The XMAP2s are 5-way interleaved, each processing every 5th pixel on a scanline. The vertical lines look to be about every 5th pixel, which suggests to me an XMAP2 has gone bad.
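To make the interleave concrete, here is a minimal Python sketch. It assumes the simplest possible assignment, where pixel column x goes to XMAP number x mod 5; the real ordering on the board may differ, and the function names are invented for illustration. The point is only that one bad chip out of five produces evenly spaced vertical stripes.

```python
NUM_XMAPS = 5  # five-way interleave, per the IrisVision documentation

def xmap_for_pixel(x):
    """Index of the XMAP assumed to handle pixel column x (simple modulo)."""
    return x % NUM_XMAPS

def affected_columns(bad_xmap, width):
    """Pixel columns on a scanline of `width` pixels served by one XMAP."""
    return [x for x in range(width) if xmap_for_pixel(x) == bad_xmap]

# If the XMAP with (hypothetical) index 2 were faulty, every 5th column
# starting at column 2 would show the defect.
print(affected_columns(2, 20))
```

Counting the pixel spacing of the stripes in a zoomed photo and comparing it with this pattern is one way to check whether a single XMAP path is involved.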

BUT.... The SGI logo image looks just fine. The XMAPs can switch between direct and color mapped mode on a per-window basis (using some combination of bits in the framebuffer), so I wonder if the color mapping memory attached to one of the XMAPs is to blame?

Display Subsystem: http://ps-2.kev009.com/ohlandl/video/iris_index_files/irisvision_technical_reference_7.pdf

(09-18-2019, 07:57 PM)ColanderCombo Wrote: BUT.... The SGI logo image looks just fine. The XMAPs can switch between direct and color mapped mode on a per-window basis (using some combination of bits in the framebuffer), so I wonder if the color mapping memory attached to one of the XMAPs is to blame?

Also, the color map should be the static RAM between the XMAPs and DACs. I don't know precisely how they're split up, but the CY7C169s are 4Kx4 components. The color map is 4Kx24, so there are 6 chips per XMAP for a total of 30. See page 147: http://www.bitsavers.org/components/cypress/_dataBooks/1988_Cypress_CMOS_Data_Book.pdf
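The chip count follows directly from the widths. As a quick check in Python (the helper name is made up, and it assumes each XMAP has its own full 4Kx24 map built only from 4-bit-wide parts of matching depth):

```python
def chips_needed(map_width_bits, chip_width_bits):
    """Number of RAM chips required side by side to reach the map width."""
    return -(-map_width_bits // chip_width_bits)  # ceiling division

NUM_XMAPS = 5
per_xmap = chips_needed(24, 4)   # 24-bit colour map entry, 4-bit-wide SRAM
total = per_xmap * NUM_XMAPS

print(per_xmap, total)
```

So a fault in one colour map would narrow the search to a group of six SRAMs next to one XMAP.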

RE: Most likely fault location on a GR1.5... - jan-jaap - 09-18-2019

Ah, the infamous pinstripe of death :(

If you can find the root cause of that one, I have about half a dozen RM4 and RM5 boards plus some GTX RMs that need fixing. Oh, plus some RM6s from my Onyx IR.

Reading the tech docs you will have noticed they mention how many bit planes each gfx option has: regular bitplanes, overlay bitplanes, aux bitplanes etc. Some are used to e.g. draw the launcher menu on the desktop of older IRIXes (it's really a GL widget). Now look at the "Silicon Graphics computer systems" logo of that boot screen. No pin stripes. They exist only on the "main" bitplanes. Keep that in mind when you go hunting for the location of the problem: it's not in the RGB DACs, and it may not even be in the framebuffer memory. It looks more like something related to the hardware which selects which bitplane is "active" for a given pixel.

The hardware works in "spans" which work in parallel. That's why the damage is vertical stripes: one stripe damaged, several stripes undamaged. A stripe can be partially damaged (not in your case, but I've seen it): in that case you don't see a vertical line but a vertical pattern of dots.

I'm curious whether a "pinstriped" GR1.5 will pass the diagnostics. You might want to run them in "verbose" mode (FE, Field Engineering). It works like this. Boot the system to the command prompt using a serial console:

Code:
Console DUART test                      PASSED
Memory walking bit test                 PASSED
Memory address uniqueness test          PASSED
Interrupt mask registers test           PASSED
Graphics subsystem test                 PASSED

                           Starting up the system...

               To perform system maintenance instead, press <Esc>

System Maintenance Menu

1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor

Option? 5

Command Monitor.  Type "exit" to return to the menu.
>> hinv
           Memory size: 16 Mbytes
Instruction cache size: 64 Kbytes
       Data cache size: 32 Kbytes
             CPU board: IP6 20 MHZ
         System option: Floating Point Processor
             SCSI Disk: dksc(0,1)
>>

Then from the prompt, type "ide fe" to boot the diags from the hard disk. When it starts to put out test results, hit CTRL-C (we're not interested in a lengthy surface scan of the hard disk for example).

Code:
>> ide fe
341632+163152+1251424 entry: 0x800ac930
SGI Version 4.0.5 IP6 IDE field Jun  9, 1992
Report level set to VERBOSE
              Memory size: 272 Mbytes
   Instruction cache size: 64 Kbytes
          Data cache size: 32 Kbytes
                SCSI Disk: dksc(0,1)
            System option: Floating point processor

TLB data test
Test completed with no errors.

TLB probe test
Test completed with no errors.

TLB translation test
Test completed with no errors.

TLB valid bit test
Test completed with no errors.

TLB mod bit test
Test completed with no errors.

TLB pid test
Test completed with no errors.

TLB global bit test
Test completed with no errors.

TLB cached bit test
Test completed with no errors.

UTLB miss exception test
Test completed with no errors.

Time of day clock test
Interrupt

Now you've got an IDE (diags) prompt. Type 'gr1' to run the Eclipse tests in verbose mode:

Code:
ide>> gr1
Graphics sub-system type: GR1/VGR

Address uniqueness test of microcode RAM ...
Test completed with no errors.

Walking bit test of microcode RAM ...
Test completed with no errors.

Address uniqueness test of microcode data RAM ...
Test completed with no errors.

Walking bit test of microcode data RAM ...
Test completed with no errors.

Host checking of data fifo ...
Test completed with no errors.

Microcode based data RAM test ...
Test completed with no errors.

Fifo/dataram read test
Test completed with no errors.

Finish flags test
Test completed with no errors.

GE5 DMA test ...
IP6 to GE5 DMA test ...
GE5 to IP6 DMA test ...
Test completed with no errors.

Host test of XMAPs and color maps
Test completed with no errors.

Starting DAC tests ...
  Checking Red DAC
  Checking Green DAC
  Checking Blue DAC
Test completed with no errors

Graphics sub-system type: GR1/VGR

Testing the RE ...
RE lines test
RE spans test
RE window ID checking test
RE patterns test
RE dithering test
RE z-buffer comparison test
RE delay signal test
Test completed with no errors.

IP6 to bitplanes DMA test ...
Test completed with no errors.

Cursor test (checking cursor #0)
Test completed with no errors.

Graphics sub-system type: GR1/VGR

Bitplane expansion (BP4) test ...
  Writing  0x55555 to expansion bitplanes
  Reading and testing ...
  Writing  0xaaaaa to expansion bitplanes
  Reading and testing ...
Test completed with no errors.

Auxiliary planes option test ...
AUXILIARY PLANES OPTION NOT INSTALLED

Z-buffer test ...
  Writing 0x555555 to z-buffer
  Reading and testing ...
  Writing 0xaaaaaa to z-buffer
  Reading and testing ...
Test completed with no errors.

Graphics sub-system type: GR1/VGR

TURBO OPTION NOT INSTALLED
ide>>

"gr1" is a script which runs a bunch of individual gr1_* tests. Type "help" from the ide>> prompt to get a list of possible commands.
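The 0x55555 / 0xaaaaa values in the bitplane test output are complementary alternating bit patterns, so between them every one of the 20 bits gets written as both 1 and 0. The Python sketch below shows the general technique only; it is not taken from the ide sources, and `write_read`, `stuck_low` and `failing_bits` are hypothetical names standing in for whatever actually writes to and reads back from the hardware.

```python
PATTERNS = (0x55555, 0xAAAAA)  # alternating bits, as seen in the ide log
MASK = 0xFFFFF                 # 20 bits wide

def failing_bits(write_read, patterns=PATTERNS, mask=MASK):
    """Return a bitmask of bit positions that did not read back correctly.

    write_read(value) is assumed to write `value` somewhere and return
    what is read back from the same location.
    """
    bad = 0
    for pattern in patterns:
        bad |= (write_read(pattern) ^ pattern) & mask
    return bad

def stuck_low(bit):
    """Simulated faulty memory: one data bit always reads as 0."""
    return lambda value: value & ~(1 << bit)

print(hex(failing_bits(lambda value: value)))  # healthy memory
print(hex(failing_bits(stuck_low(4))))         # bit 4 stuck at 0
```

Note that a test like this exercises the data path from the host side, so a board could plausibly pass it and still show a fault further down the display path.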

NB: ide 4.0.5 seems to trip over the amount of RAM on this IP10 board. There are 4 * 4MB 30 pin Toshiba SIMMs installed, PROM and IRIX agree, but diags list 272MB, 256MB too much.

RE: Most likely fault location on a GR1.5... - mapesdhs - 09-18-2019

Thanks guys! That's been my rationale too. Oddly enough I found the kev009.com site yesterday, which is why, after reading through the docs, my thinking moved more towards an XMAP2 or CLUT memory. The described parallelism reflects the 12x5 layout of the VRAM ICs, etc., but I didn't want to clutter up my OP with that. Eventually I'd like to put together combined info pages that show how a board functions conceptually, electrically and schematically, and hopefully construct PDFs if none exist atm, so that one can work on repairs in the same way Rossmann does. That's a very long term goal though.

Alas the striping is so bad that text is unreadable, so it's not worth running the ide tests as it is. Re fixing such things, that is my ultimate goal: to offer repairs instead of just supplying replacements, since working spares/systems are almost impossible to find these days. It's a very different kind of task to what I do atm though, and very time consuming. It'll be about two years before I'm good enough, I think.

I'm about to knock up an XMAP2 pinout in WinFIG.

Ian.

RE: Most likely fault location on a GR1.5... - ColanderCombo - 09-18-2019

(09-18-2019, 09:35 PM)mapesdhs Wrote: I'm about to knock up an XMAP2 pinout in WinFIG.

Sweet.

I notice in the second image that the cyan background is missing green, but inside the text window the black background turns red. It seems like when the stripe crosses white pixels it turns black... which suggests to me that there may be one or more bad address lines between the XMAP and the color map ram? (so the wrong color map entries are getting selected.)
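A toy model of that failure mode, in Python: if one address line between an XMAP and its colour map RAM is stuck, some pixel values fetch the wrong map entry while others are unaffected. Everything here is an assumption for illustration (which line, stuck high or low, and the identity map contents); only the 4K map depth comes from the thread.

```python
MAP_SIZE = 4096  # 4K entries, so 12 address lines

def lookup_with_stuck_line(color_map, index, line, stuck_at):
    """Look up `index` as if address line `line` were stuck at 0 or 1."""
    if stuck_at:
        address = index | (1 << line)    # line forced high
    else:
        address = index & ~(1 << line)   # line forced low
    return color_map[address]

# Identity map: entry n holds the value n, so a wrong fetch is easy to spot.
color_map = list(range(MAP_SIZE))

for index in (0, 7, 8, 15):
    seen = lookup_with_stuck_line(color_map, index, line=3, stuck_at=1)
    status = "ok" if seen == index else "WRONG"
    print(f"index {index:2d} -> entry {seen:2d} {status}")
```

Indices that already have the stuck bit in the forced state come out right, which would fit a picture where only certain colours change under the stripe. Loading a known colour map and noting which entries display wrongly could then point at a specific line.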

If you can boot to X11 I think there's a tool that lets you interactively create color maps. I'd be interested to see how that behaves on this system. Might help to isolate which lines are bad.

RE: Most likely fault location on a GR1.5... - weblacky - 09-19-2019

While perhaps not practical, given that people have chimed in with similar issues, I'm wondering if you're dealing with a structural issue. If you can find a pin/trace or basic I/O pinout of the XMAP2 chips, you could place an oscilloscope on the same input of each chip (to verify the signal), then later on each output pin (or perhaps the input of the next stage instead). If they work in parallel (not in rapid series), then the output of the failing unit should look visibly different from the others (a lack of output, I'd think).

I'm wondering if you're not dealing with a cracked BGA joint (the chips look BGA, but I don't have a pic of the other side to know for sure). You mention having checked all caps and resistors... so what else is there besides the ZIP memory modules?

If you could find a way to attach a 30awg+ wire or something to an output point of each pipeline and just look at the signal (without even understanding it), you'd assume 1 out of the 5 doesn't look like the others. Then you can decide if it's chip or joint.

From what's being said here there are two kinds of failure (complete and partial), which makes me think of a connection issue rather than a power issue. I'm not an expert, but the picture of the board you provided doesn't appear to have much in the way of boost/buck circuits, so I don't see much voltage conversion on this board. If it were a power issue, wouldn't the symptom be more widespread (affecting all pipelines at the same stage)? I don't see a huge VRM section or anything like that, so unless they somehow hid it inside an IC back then (doubtful), I think they are all working off a common rail.

Consider finding someone with a really good BGA rework preheater station, reflowing the suspected XMAP2 chip and seeing the effect.

Food for thought, considering it's not an idea I saw proposed in this thread. We see this MUCH more today than back then, but considering I don't see any heatsinks on these, it stands to reason that repeated localized heating and cooling could cause thermal expansion and a cracked/poor (high resistance) joint, which would affect signals.

Also consider hitting it with a FLIR camera and check if one of the 5 XMAP2 chips (both sides) is hotter/colder than the others by comparison.

RE: Most likely fault location on a GR1.5... - jan-jaap - 09-19-2019

There are no BGAs in this generation. Mostly through-hole with the odd SMD. Could still be cold soldering joints of course.

I have long suspected that similar problems in later generations (Reality Engines) have a root cause in the fact that the boards are relatively flexible, but they are littered with large surface mounted QFP chips. That's a recipe for cracked joints and lifted legs. Later boards (Infinite Reality) are much more rigid.

RE: Most likely fault location on a GR1.5... - mapesdhs - 09-19-2019

Reflowing the solder points is also something I may do. I have the relevant equipment (a Quick 861DW and a simpler vacuum desoldering gun).

https://www.amazon.co.uk/Vogvigo-Soldering-Desoldering-Temperature-100℃-500℃/dp/B07KFHCHMR/
http://www.sgidepot.co.uk/misc/P1160827s.jpg

I also have a microscope, camera, monitor, B&K Precision 393 and all the rest of it, but I was planning to spend two years learning how to do all this stuff before actually doing repairs for people.

I've ended up kinda in the drink before knowing how to swim. :} The microscope, camera, power strip and other stuff are not yet set up. I also have a large ultrasonic tank but not yet a drying oven or somesuch.

http://www.sgidepot.co.uk/misc/P1150340s.jpg

Done lots of XMAP readings last night, two more ICs to do just now, then I can compare. After that I'll go over the static RAM ICs.

There aren't any power issues that I'm aware of, that's all handled elsewhere (so far I've not found any ground shorts); I figure DC comes to the board via a ribbon cable from the PSU (the same cable connects to both boards).

Ian.

RE: Most likely fault location on a GR1.5... - weblacky - 09-19-2019

Hi Ian,
I'd highly recommend investing in a cheap infrared board heater (at least). I've dealt with several (newer) multilayer PCBs, and even though you can force a lot of heat into the vias and then pull components, the large planes will sink heat very, very fast, often resulting in either substandard reflow joints or over-temping the IC while not having enough overall heat to actually move the solder at all. Clearing small vias is the worst without preheating.

Please try to heat the boards (15-20 mins at least) to 150C to 170C, then use the iron on them to reflow and such. I've made the mistake of thinking I didn't need to do this...and have wasted a lot of time trying. Preheat the board and you'll find the reflow will be much easier.

RE: Most likely fault location on a GR1.5... - mapesdhs - 09-19-2019

Well, I'll see how it goes, I may not need to go quite that far.

Let me know if you have a product ref for a heater like that though.
Atm I only have my kitchen gas oven. :}

Ian.