The start of a LONG Fuel repair thread...
#46
RE: The start of a LONG Fuel repair thread...
(11-12-2021, 11:04 PM)jwhat Wrote:  Hi Weblacky,

really long post....

I have to take kids to the museum as looking at my old SGI boxes is not as exciting as DINOSAURS!! ;-)

I will review on return.

Cheers from Oz,

jwhat/John.

Thanks, it's as long as it is demoralizing :-)  Also, you could compromise and take the kids to see SGIs at a computer museum?

Also, assuming this HW graph node numbering ISN'T random, can someone use something like this to find out WHAT HW this number is: https://nixdoc.net/man-pages/IRIX/man1/xbstat.1.html

Okay...sh*t storm 2: the legend continues....

Just for yucks I tried the PIMM no guri gave me that's supposed to have a voltage error...well, it does still have the voltage error: it causes/reports the 5V PIMM AUX voltage as 6.22V and enters emergency shutdown.  The DS1780 on it isn't shorted so I'm assuming it's true.  It also then rained cache errors!
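For anyone wanting to sanity-check readings like this against the limits, here is a minimal sketch. It assumes the L1 uses symmetric 10% warning and 20% fault bands around the nominal rail voltage, which is what the limit columns in the `env` output later in this post suggest (e.g. 4.00/6.00 fault limits for a 5V rail). The `classify` helper and the strict "greater than" comparison are my own illustration, not the L1 firmware's actual logic.

```python
def classify(reading, nominal, warn_pct=0.10, fault_pct=0.20):
    """Classify a rail reading against symmetric percentage bands.

    Assumption: limits are nominal +/- 10% (warning) and +/- 20% (fault),
    matching the limit columns printed by the L1 `env` command.
    """
    deviation = abs(reading - nominal) / nominal
    if deviation > fault_pct:
        return "fault"
    if deviation > warn_pct:
        return "warning"
    return "ok"


# Readings taken from this post: the 6.22 V figure quoted in the text and
# a few rows from the `env` output.
readings = [
    ("PIMM0 5V aux (quoted in text)", 5.0, 6.22),
    ("PIMM0 5V aux (env output)", 5.0, 6.63),
    ("PIMM0 12V bias (env output)", 12.0, 9.31),
    ("5V aux (env output)", 5.0, 5.02),
]

for name, nominal, value in readings:
    print(f"{name}: {value:.2f} V -> {classify(value, nominal)}")
```

Under those assumptions both the PIMM0 5V aux and PIMM0 12V bias readings fall outside the 20% band, while the board-level 5V aux rail is fine.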

Errors on second PIMM that has a 5V voltage issue (test if diff):
Code:
IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
Running in DDR mode
Testing/Initializing memory ...............        DONE
Copying PROM code to memory ...............        DONE



SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa8000000000037f0 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 70  5555555555555555 0000000000000000  155
   Received     : 70  5555557555555555 0000000000000000  155
   Syndrome     : 70  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003ab0 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 30  cccccccccccccccc 0000000000000000  0cc
   Received     : 30  ccccdceccccccccc 0000000000000000  0cc
   Syndrome     : 30  0000102000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<108>
       Asterix R14K   CPU  C3D1 [Pin AC27]   SRAM C8E6  [Pin K2]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003790 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 10  0f0f0f0f0f0f0f0f 0000000000000000  30f
   Received     : 10  0f0f0f0f4f0f0f0f 0000000000000000  30f
   Syndrome     : 10  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base Address: DATA
   Failure      : Brother Double Word Not Zero
   Address      : 0x0000000000000008 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  0000000000000000 5555555555555555  155
   Received     : 00  0000000040000000 5555555555555555  155
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  eaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  4000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]
     SCData<126>
       Asterix R14K   CPU  C3D1 [Pin AK30]   SRAM C8E6  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000002000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  aaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000001000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  aaaaaaaaaaaaaaaa 0000000000000000  2aa
   Received     : 00  eaaaaaaaeaaaaaaa 0000000000000000  2aa
   Syndrome     : 00  4000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]
     SCData<126>
       Asterix R14K   CPU  C3D1 [Pin AK30]   SRAM C8E6  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base + Walk/Inv: DATA
   Failure      : Brother Double Word Not Zero
   Address      : 0x0000000000000009 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  0000000000000000 0000000000000000  000
   Received     : 00  0000000040000000 0000000000000000  000
   Syndrome     : 00  0000000040000000 0000000000000000  000

   Failing Bits
     SCData<94>
       Asterix R14K   CPU  C3D1 [Pin  AK1]   SRAM C8B7  [Pin D1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Short March: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000100 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  555555d555555555 0000000000000000  155
   Syndrome     : 00  0000008000000000 0000000000000000  000

   Failing Bits
     SCData<103>
       Asterix R14K   CPU  C3D1 [Pin AD30]   SRAM C8E6  [Pin H3]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Full March: DATA
   Failure      : ECC Miscompare
   Address      : 0xa800000000003810 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 10  5555555555555555 0000000000000000  155
   Received     : 10  55555d7555555555 0000000000000000  155
   Syndrome     : 10  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000002000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  5555557555555555 0000000000000000  155
   Syndrome     : 00  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000001000 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000800 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555df555555555 0000000000000000  155
   Syndrome     : 00  000008a000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<103>
       Asterix R14K   CPU  C3D1 [Pin AD30]   SRAM C8E6  [Pin H3]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Walking Address: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000400 (Way 0)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCData<107>
       Asterix R14K   CPU  C3D1 [Pin AA27]   SRAM C8E6  [Pin H1]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Base + Walk/Inv: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000001 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  5555557555555555 0000000000000000  155
   Syndrome     : 00  0000002000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]





SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
   Subtest      : Short March: DATA
   Failure      : Data Miscompare
   Address      : 0x0000000000000001 (Way 1)

                  Off ---------     Data     ----------  ECC
   Expected     : 00  5555555555555555 0000000000000000  155
   Received     : 00  55555d7555555555 0000000000000000  155
   Syndrome     : 00  0000082000000000 0000000000000000  000

   Failing Bits
     SCData<101>
       Asterix R14K   CPU  C3D1 [Pin AG33]   SRAM C8E6  [Pin H8]
     SCDat
 
L1 log with the different PIMM that has the voltage issue:
Code:
SGI SN1 L1 Controller

Firmware Image B: Rev. 1.28.3, Built 03/20/2004 00:01:57





001a01-L1>

001a01-L1>

001a01 ATTN: brick auto power down in 25 seconds



001a01 ATTN: brick auto power down in 20 seconds



001a01-L1>env

Environmental monitoring is enabled and running.



Description    State       Warning Limits     Fault Limits       Current

-------------- ----------  -----------------  -----------------  -------

           12V    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   11.94

        12V IO    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.00

            5V    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07

          3.3V    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.35

          2.5V    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.47

          1.5V    Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.47

        5V aux    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.02

      3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.30

PIMM0 12V bias      Fault  10%  10.80/ 13.20  20%   9.60/ 14.40    9.31

     Fuel SRAM    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.52

      Fuel CPU    Enabled  10%   1.44/  1.76  20%   1.28/  1.92    1.61

    PIMM0 1.5V    Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.49

PIMM0 3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.29

  PIMM0 5V aux    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    6.63

  XIO 12V bias    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   11.88

        XIO 5V    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07

      XIO 2.5V    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.47

  XIO 3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.30



Description    State       Warning RPM  Current RPM

-------------- ----------  -----------  -----------

FAN 0  EXHAUST    Enabled          920         1188

FAN 1       HD    Enabled         1560         2191

FAN 2      PCI    Enabled         1120         1548

FAN 3    XIO 1    Enabled         1600         2177

FAN 4    XIO 2    Enabled         1600         2045

FAN 5       PS    Enabled         1349         2109



                              Advisory   Critical   Fault      Current

Description       State       Temp       Temp       Temp       Temp      

----------------- ----------  ---------  ---------  ---------  --------- 

0 NODE 0            Enabled    [Autofan Control]    75C/167F   17C/ 62F

1 NODE 1            Enabled    [Autofan Control]    75C/167F   17C/ 62F

2 NODE 2            Enabled    [Autofan Control]    75C/167F   16C/ 60F

3 PIMM              Enabled    [Autofan Control]    75C/167F   18C/ 64F

4 ODYSSEY           Enabled    [Autofan Control]    75C/167F   18C/ 64F

5 BEDROCK           Enabled    [Autofan Control]    85C/185F   16C/ 60F



001a01-L1>

001a01 ATTN: brick auto power down in 15 seconds



001a01 ATTN: brick auto power down in 10 seconds



001a01-L1>env

Environmental monitoring is enabled and running.



Description    State       Warning Limits     Fault Limits       Current

-------------- ----------  -----------------  -----------------  -------

           12V    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   11.94

        12V IO    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.00

            5V    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07

          3.3V    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.35

          2.5V    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.47

          1.5V    Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.47

        5V aux    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.02

      3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.29

PIMM0 12V bias      Fault  10%  10.80/ 13.20  20%   9.60/ 14.40    9.38

     Fuel SRAM    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.52

      Fuel CPU    Enabled  10%   1.44/  1.76  20%   1.28/  1.92    1.61

    PIMM0 1.5V    Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.49

PIMM0 3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.27

  PIMM0 5V aux    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    6.63

  XIO 12V bias    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   11.88

        XIO 5V    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07

      XIO 2.5V    Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.47

  XIO 3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.30



Description    State       Warning RPM  Current RPM

-------------- ----------  -----------  -----------

FAN 0  EXHAUST    Enabled          920         1188

FAN 1       HD    Enabled         1560         2191

FAN 2      PCI    Enabled         1120         1534

FAN 3    XIO 1    Enabled         1600         2177

FAN 4    XIO 2    Enabled         1600         2045

FAN 5       PS    Enabled         1349         2109



                              Advisory   Critical   Fault      Current

Description       State       Temp       Temp       Temp       Temp      

----------------- ----------  ---------  ---------  ---------  --------- 

0 NODE 0            Enabled    [Autofan Control]    75C/167F   18C/ 64F

1 NODE 1            Enabled    [Autofan Control]    75C/167F   17C/ 62F

2 NODE 2            Enabled    [Autofan Control]    75C/167F   16C/ 60F

3 PIMM              Enabled    [Autofan Control]    75C/167F   19C/ 66F

4 ODYSSEY           Enabled    [Autofan Control]    75C/167F   18C/ 64F

5 BEDROCK           Enabled    [Autofan Control]    85C/185F   17C/ 62F



001a01-L1>

001a01 ATTN: brick auto power down in 5 seconds



001a01 ATTN: brick is powering down now!

So because of the secondary cache failure I didn't get a chance to issue an error_dump to see if the old error went away or not.  So still unknown...but I kind of doubt it. Also, I'm sometimes seeing shorts on this "broken" PIMM's caps so there is something wrong...it's possible the cache error is due to voltage and not really bad cache...but who knows, and it's likely a dead end to prove anything right now.
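To see which data lines the cache diag is actually complaining about, here is a rough sketch that recomputes the failing bits from the Expected/Received columns. The Syndrome column in the log looks like a plain XOR of the two. The `base=64` offset is an assumption I inferred from the log (the first data column appears to hold SCData<127:64>), and `failing_bits` is just a helper name for illustration.

```python
def failing_bits(expected_hex, received_hex, base=64):
    """Return the SCData bit numbers that differ between two 64-bit words.

    Assumption: the first data column in the PROM output carries
    SCData<127:64>, so bit n of that word is reported as SCData<base + n>.
    """
    syndrome = int(expected_hex, 16) ^ int(received_hex, 16)
    return [base + bit for bit in range(64) if (syndrome >> bit) & 1]


# Expected/Received pairs copied from the cache failure log above.
samples = [
    ("5555555555555555", "5555557555555555"),
    ("0f0f0f0f0f0f0f0f", "0f0f0f0f4f0f0f0f"),
    ("aaaaaaaaaaaaaaaa", "eaaaaaaaeaaaaaaa"),
    ("5555555555555555", "55555df555555555"),
]

for expected, received in samples:
    print(expected, received, failing_bits(expected, received))
```

For these four samples it gives 101, then 94, then 94 and 126, then 101, 103 and 107, which lines up with the SCData numbers the PROM printed. The same handful of bits keeps coming back across subtests; whether that points at the SRAMs or at the voltage problem is still open.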

Also I sometimes get the SAME TLB error as this other post (happened only once, likely not related): https://forums.irixnet.org/thread-3194-p...l#pid22723

Console from when I PUT BACK the PIMM (700MHz) I got with it:

Code:
IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
Running in DDR mode
Testing/Initializing memory ...............        DONE
Copying PROM code to memory ...............        DONE
Discovering local IO ......................        DONE
Discovering NUMAlink connectivity .........        
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30360 usec
Waiting for peers to complete discovery....        DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
\\\\Intializing any CPUless nodes..............        \\DONE
\Checking partitioning information .........        DONE
No other nodes present; becoming partition master
\Loading BASEIO prom .......................        DONE

BASEIO PROM Monitor SGI Version 6.210  built 02:30:38 PM Aug 26, 2004 (BE64)
1 CPUs on 1 nodes found.

NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled

PS/2 Keyboard & Mouse diagnostics
    No keyboard found, skipping keyboard test
    No mouse found, skipping mouse test

    Missing PS/2 device(s) AND console set to "g"
PS/2 Keyboard & Mouse diagnostics passed with a possible problem

Graphics diagnostics

Odyssey board #0 found on nasid 0
Running Odyssey xtalk sanity diag...
        Board version 1 - Buzz revision 2B
        On board sdram size: 32 Mb
        Cas latency: CAS 3
        2 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics

Installing PROM Device drivers ............            
Base I/O Ethernet set to /dev/ethernet/ef0
Installing Graphics Console...
graphics install: searching for pipe 0

Walking SCSI Adapter 0, (pci id 1)
1timeout on adapter 0 target 1
   tm0=0xffffc4d34a373a5d, tm1=0xfffed7de39b49c50, timeout=0xb
- 2+ Device Vendor Product:
3+ Device Vendor Product:
4+ Device Vendor Product:
5+ Device Vendor Product:
6+ Device Vendor Product:
7+ Device Vendor Product:
8+ Device Vendor Product:
9+ Device Vendor Product:
10+ Device Vendor Product:
11timeout on adapter 0 target b
   tm0=0xfffed7de39b49c5b, tm1=0xfffed7de39b49c34, timeout=0x3
+ Device Vendor Product:
12+ Device Vendor Product:
13
A 000: *** TLB Refill Exception on node 0
A 000: *** EPC: 0xc00000001fc47e58 (0xc00000001fc47e58)
A 000: *** Press ENTER to continue.
A 000: POD IOC3 Dex> log
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 2 was previously Present & Enabled but is now Present & Disabled
A Info : Memory bank 2 previously had 256 MB but now has 512 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 3 was previously Present & Enabled but is now Absent
A Info : Memory bank 3 previously had 256 MB but now has 0 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A 000: POD IOC3 Dex> error_dump
Hardware Error State: (Forced error dump)
+  Errors on node Nasid 0x0 (0)
+    XBow in /hw/module/001c01
+      BEDROCK signalled following errors.
+        XBow Link a status register: 0xffffffff80020000
+          17: Illegal destination
+        XBow error command word register: 0xffffffffaa028000
+        XBow error upper address register: 0x0
+        XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Dex>

IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
  built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f41
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 9
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02ad0 (0xc00000001fc02ad0)
Configuring memory
Local memory configured: 512 MB (standard)
*** Warning: System controller debug switches are non-zero (0x2d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
*** Ignoring env. vars/using defaults
before reading NICHub NIC: 0x5455827f
SR1 set to 0x0000080690349000
SR0 set to 0x000000005455827f
Testing/Initializing memory ...............        DONE
Copying PROM code to memory ...............        Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x1686a8
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ......................        Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........        
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30360 usec
Waiting for peers to complete discovery....        Discovery results:
ENTRY 0: HUB(5455827f)
    NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x5455827f, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315) \Global barrier passed.
Global barrier (line 4348) \Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(5455827f)
    NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(5455827f)
    NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823) \Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928) \Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition:  +0
Regions in partition:  +0
Intializing any CPUless nodes..............        Global barrier (line 7714) \Global barrier passed.
Global barrier (line 7715) \Global barrier passed.
DONE
Global barrier (line 5089) \Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information .........        DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(5455827f)
    NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: FE
Erecting partition fences ................                        DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300) \Global barrier passed.
Nasids in partition:  +0
Regions in partition:  +0

A 000: *** Entering POD mode on node 0
A 000: POD IOC3 Cac> error_dump
Hardware Error State: (Forced error dump)
+  Errors on node Nasid 0x0 (0)
+    XBow in /hw/module/174562
+      BEDROCK signalled following errors.
+        XBow Link a status register: 0xffffffff80020000
+          17: Illegal destination
+        XBow error command word register: 0xffffffffaa000000
+        XBow error upper address register: 0x0
+        XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Cac> tst
                     ^ Syntax error
A 000: POD IOC3 Cac>

IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
  built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f41
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 9
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02ac0 (0xc00000001fc02ac0)
Configuring memory
Local memory configured: 512 MB (standard)
*** Warning: System controller debug switches are non-zero (0x2d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
*** Ignoring env. vars/using defaults
before reading NICHub NIC: 0x5455827f
SR1 set to 0x0000080690349000
SR0 set to 0x000000005455827f
Testing/Initializing memory ...............        DONE
Copying PROM code to memory ...............        Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x1686a8
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ......................        Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........        
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30359 usec
Waiting for peers to complete discovery....        Discovery results:
ENTRY 0: HUB(5455827f)
    NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x5455827f, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315) \Global barrier passed.
Global barrier (line 4348) \Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(5455827f)
    NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(5455827f)
    NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823) \Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928) \Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition:  +0
Regions in partition:  +0
Intializing any CPUless nodes..............        Global barrier (line 7714) \Global barrier passed.
Global barrier (line 7715) \Global barrier passed.
DONE
Global barrier (line 5089) \Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information .........        DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(5455827f)
    NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
    MODULE=001c01 PARTITION=0 SPACE=RESET
    Port 1 connection: Not connected
    Port status: FE
Erecting partition fences ................                        DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300) \Global barrier passed.
Nasids in partition:  +0
Regions in partition:  +0

A 000: *** Entering POD mode on node 0
A 000: POD IOC3 Cac> log
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 2 was previously Present & Enabled but is now Present & Disabled
A Info : Memory bank 2 previously had 256 MB but now has 512 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 3 was previously Present & Enabled but is now Absent
A Info : Memory bank 3 previously had 256 MB but now has 0 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A 000: POD IOC3 Cac> error_dump
Hardware Error State: (Forced error dump)
+  Errors on node Nasid 0x0 (0)
+    XBow in /hw/module/174562
+      BEDROCK signalled following errors.
+        XBow Link a status register: 0xffffffff80020000
+          17: Illegal destination
+        XBow error command word register: 0xffffffffaa020000
+        XBow error upper address register: 0x0
+        XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Cac>

So I guess without another mainboard or PIMM I don't know.  This is also the LAST mainboard revision...hence the most expensive...geez.
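A quick sketch for picking apart the XBow link status value from the error_dump output above. It only lists which bits are set in the low 32 bits and checks whether the 64-bit value is just a 32-bit value sign-extended. The dump itself names only bit 17 ("Illegal destination"); I'm not assuming a meaning for any other bit, and both helper names are made up for illustration.

```python
def set_bits(value, width=32):
    """List the bit positions set in the low `width` bits of value."""
    low = value & ((1 << width) - 1)
    return [bit for bit in range(width) if (low >> bit) & 1]


def is_sign_extended_32(value):
    """True if a 64-bit value equals its low 32 bits sign-extended."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        extended = low | 0xFFFFFFFF00000000
    else:
        extended = low
    return value == extended


status = 0xFFFFFFFF80020000  # "XBow Link a status register" from the dump

print(set_bits(status))             # [17, 31]
print(is_sign_extended_32(status))  # True
```

So the value is consistent with a 32-bit register read of 0x80020000 printed as a sign-extended 64-bit number, with bit 17 matching the "17: Illegal destination" line.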


UPDATE:  Okay still looking and grasping but I did find this: http://www.mit.edu/afs.new/sipb/project/...s/Makefile

It has an SGI-IP27 (Origin200/2000) section and it specifically calls out the address in my error; it says it belongs to a CONFIG_TOSHIBA_RBTX4927 section.  The file's includes go to something like this: https://android.googlesource.com/platform/hardware/bsp/kernel/rockchip/rk-v4.4/+/refs/heads/master/arch/mips/Makefile  Which I assume is still a MIPS section of the Linux kernel, and it gives me the impression that the BEDROCK (rockchip) may be this Toshiba MIPS processor?

UPDATE 2:  Register for SDRAM CHANNEL CONTROL?  http://www.elektronikjk.pl/elementy_czyn...R4927A.pdf

SDRAM chips can be resoldered (hand reflowed) on this board...could this be an SDRAM issue?  I'm having trouble reading the address; could it also be an ECC Status Register (ECCSR)?
(This post was last modified: 11-13-2021, 01:13 AM by weblacky.)
weblacky
11-12-2021, 11:08 PM


