(11-12-2021, 11:04 PM)jwhat Wrote: Hi Weblacky,
really long post....
I have to take the kids to the museum, as looking at my old SGI boxes is not as exciting as DINOSAURS!! ;-)
I will review on return.
Cheers from Oz,
jwhat/John.
Thanks, it's as long as it is demoralizing :-) Also, you could compromise and take the kids to see SGIs at a computer museum?
Also, assuming this HW graph node numbering ISN'T random, can someone use something like this to find out WHAT HW this number is:
https://nixdoc.net/man-pages/IRIX/man1/xbstat.1.html
Okay...sh*t storm 2: the legend continues....
Just for yucks I tried the PIMM no guri gave me that's supposed to have a voltage error...well, it does still have the voltage error: it causes/reports the 5V PIMM AUX voltage as 6.22V and enters emergency shutdown. The DS1780 on it isn't shorted, so I'm assuming the reading is true. It then also rained cache errors!
Errors on second PIMM that has a 5V voltage issue (test if diff):
Code:
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
Running in DDR mode
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Full March: DATA
Failure : ECC Miscompare
Address : 0xa8000000000037f0 (Way 0)
Off --------- Data ---------- ECC
Expected : 70 5555555555555555 0000000000000000 155
Received : 70 5555557555555555 0000000000000000 155
Syndrome : 70 0000002000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Full March: DATA
Failure : ECC Miscompare
Address : 0xa800000000003ab0 (Way 0)
Off --------- Data ---------- ECC
Expected : 30 cccccccccccccccc 0000000000000000 0cc
Received : 30 ccccdceccccccccc 0000000000000000 0cc
Syndrome : 30 0000102000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCData<108>
Asterix R14K CPU C3D1 [Pin AC27] SRAM C8E6 [Pin K2]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Full March: DATA
Failure : ECC Miscompare
Address : 0xa800000000003790 (Way 0)
Off --------- Data ---------- ECC
Expected : 10 0f0f0f0f0f0f0f0f 0000000000000000 30f
Received : 10 0f0f0f0f4f0f0f0f 0000000000000000 30f
Syndrome : 10 0000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Base Address: DATA
Failure : Brother Double Word Not Zero
Address : 0x0000000000000008 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 0000000000000000 5555555555555555 155
Received : 00 0000000040000000 5555555555555555 155
Syndrome : 00 0000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Base Address: DATA
Failure : Data Miscompare
Address : 0x0000000000000000 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 aaaaaaaaaaaaaaaa 0000000000000000 2aa
Received : 00 eaaaaaaaeaaaaaaa 0000000000000000 2aa
Syndrome : 00 4000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SCData<126>
Asterix R14K CPU C3D1 [Pin AK30] SRAM C8E6 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000002000 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 aaaaaaaaaaaaaaaa 0000000000000000 2aa
Received : 00 aaaaaaaaeaaaaaaa 0000000000000000 2aa
Syndrome : 00 0000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000001000 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 aaaaaaaaaaaaaaaa 0000000000000000 2aa
Received : 00 eaaaaaaaeaaaaaaa 0000000000000000 2aa
Syndrome : 00 4000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SCData<126>
Asterix R14K CPU C3D1 [Pin AK30] SRAM C8E6 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Base + Walk/Inv: DATA
Failure : Brother Double Word Not Zero
Address : 0x0000000000000009 (Way 1)
Off --------- Data ---------- ECC
Expected : 00 0000000000000000 0000000000000000 000
Received : 00 0000000040000000 0000000000000000 000
Syndrome : 00 0000000040000000 0000000000000000 000
Failing Bits
SCData<94>
Asterix R14K CPU C3D1 [Pin AK1] SRAM C8B7 [Pin D1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Short March: DATA
Failure : Data Miscompare
Address : 0x0000000000000100 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 555555d555555555 0000000000000000 155
Syndrome : 00 0000008000000000 0000000000000000 000
Failing Bits
SCData<103>
Asterix R14K CPU C3D1 [Pin AD30] SRAM C8E6 [Pin H3]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Full March: DATA
Failure : ECC Miscompare
Address : 0xa800000000003810 (Way 0)
Off --------- Data ---------- ECC
Expected : 10 5555555555555555 0000000000000000 155
Received : 10 55555d7555555555 0000000000000000 155
Syndrome : 10 0000082000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCData<107>
Asterix R14K CPU C3D1 [Pin AA27] SRAM C8E6 [Pin H1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000002000 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 5555557555555555 0000000000000000 155
Syndrome : 00 0000002000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000001000 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 55555d7555555555 0000000000000000 155
Syndrome : 00 0000082000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCData<107>
Asterix R14K CPU C3D1 [Pin AA27] SRAM C8E6 [Pin H1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000000800 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 55555df555555555 0000000000000000 155
Syndrome : 00 000008a000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCData<103>
Asterix R14K CPU C3D1 [Pin AD30] SRAM C8E6 [Pin H3]
SCData<107>
Asterix R14K CPU C3D1 [Pin AA27] SRAM C8E6 [Pin H1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Walking Address: DATA
Failure : Data Miscompare
Address : 0x0000000000000400 (Way 0)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 55555d7555555555 0000000000000000 155
Syndrome : 00 0000082000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCData<107>
Asterix R14K CPU C3D1 [Pin AA27] SRAM C8E6 [Pin H1]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Base + Walk/Inv: DATA
Failure : Data Miscompare
Address : 0x0000000000000001 (Way 1)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 5555557555555555 0000000000000000 155
Syndrome : 00 0000002000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SECONDARY CACHE DATA FAILURE: Module 001c01 CPU A
Subtest : Short March: DATA
Failure : Data Miscompare
Address : 0x0000000000000001 (Way 1)
Off --------- Data ---------- ECC
Expected : 00 5555555555555555 0000000000000000 155
Received : 00 55555d7555555555 0000000000000000 155
Syndrome : 00 0000082000000000 0000000000000000 000
Failing Bits
SCData<101>
Asterix R14K CPU C3D1 [Pin AG33] SRAM C8E6 [Pin H8]
SCDat
L1 log with the different PIMM that has the voltage issue:
Code:
SGI SN1 L1 Controller
Firmware Image B: Rev. 1.28.3, Built 03/20/2004 00:01:57
001a01-L1>
001a01-L1>
001a01 ATTN: brick auto power down in 25 seconds
001a01 ATTN: brick auto power down in 20 seconds
001a01-L1>env
Environmental monitoring is enabled and running.
Description State Warning Limits Fault Limits Current
-------------- ---------- ----------------- ----------------- -------
12V Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 11.94
12V IO Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.00
5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.07
3.3V Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.35
2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.47
1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.47
5V aux Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.02
3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.30
PIMM0 12V bias Fault 10% 10.80/ 13.20 20% 9.60/ 14.40 9.31
Fuel SRAM Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.52
Fuel CPU Enabled 10% 1.44/ 1.76 20% 1.28/ 1.92 1.61
PIMM0 1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.49
PIMM0 3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.29
PIMM0 5V aux Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 6.63
XIO 12V bias Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 11.88
XIO 5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.07
XIO 2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.47
XIO 3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.30
Description State Warning RPM Current RPM
-------------- ---------- ----------- -----------
FAN 0 EXHAUST Enabled 920 1188
FAN 1 HD Enabled 1560 2191
FAN 2 PCI Enabled 1120 1548
FAN 3 XIO 1 Enabled 1600 2177
FAN 4 XIO 2 Enabled 1600 2045
FAN 5 PS Enabled 1349 2109
Advisory Critical Fault Current
Description State Temp Temp Temp Temp
----------------- ---------- --------- --------- --------- ---------
0 NODE 0 Enabled [Autofan Control] 75C/167F 17C/ 62F
1 NODE 1 Enabled [Autofan Control] 75C/167F 17C/ 62F
2 NODE 2 Enabled [Autofan Control] 75C/167F 16C/ 60F
3 PIMM Enabled [Autofan Control] 75C/167F 18C/ 64F
4 ODYSSEY Enabled [Autofan Control] 75C/167F 18C/ 64F
5 BEDROCK Enabled [Autofan Control] 85C/185F 16C/ 60F
001a01-L1>
001a01 ATTN: brick auto power down in 15 seconds
001a01 ATTN: brick auto power down in 10 seconds
001a01-L1>env
Environmental monitoring is enabled and running.
Description State Warning Limits Fault Limits Current
-------------- ---------- ----------------- ----------------- -------
12V Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 11.94
12V IO Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 12.00
5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.07
3.3V Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.35
2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.47
1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.47
5V aux Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.02
3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.29
PIMM0 12V bias Fault 10% 10.80/ 13.20 20% 9.60/ 14.40 9.38
Fuel SRAM Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.52
Fuel CPU Enabled 10% 1.44/ 1.76 20% 1.28/ 1.92 1.61
PIMM0 1.5V Enabled 10% 1.35/ 1.65 20% 1.20/ 1.80 1.49
PIMM0 3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.27
PIMM0 5V aux Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 6.63
XIO 12V bias Enabled 10% 10.80/ 13.20 20% 9.60/ 14.40 11.88
XIO 5V Enabled 10% 4.50/ 5.50 20% 4.00/ 6.00 5.07
XIO 2.5V Enabled 10% 2.25/ 2.75 20% 2.00/ 3.00 2.47
XIO 3.3V aux Enabled 10% 2.97/ 3.63 20% 2.64/ 3.96 3.30
Description State Warning RPM Current RPM
-------------- ---------- ----------- -----------
FAN 0 EXHAUST Enabled 920 1188
FAN 1 HD Enabled 1560 2191
FAN 2 PCI Enabled 1120 1534
FAN 3 XIO 1 Enabled 1600 2177
FAN 4 XIO 2 Enabled 1600 2045
FAN 5 PS Enabled 1349 2109
Advisory Critical Fault Current
Description State Temp Temp Temp Temp
----------------- ---------- --------- --------- --------- ---------
0 NODE 0 Enabled [Autofan Control] 75C/167F 18C/ 64F
1 NODE 1 Enabled [Autofan Control] 75C/167F 17C/ 62F
2 NODE 2 Enabled [Autofan Control] 75C/167F 16C/ 60F
3 PIMM Enabled [Autofan Control] 75C/167F 19C/ 66F
4 ODYSSEY Enabled [Autofan Control] 75C/167F 18C/ 64F
5 BEDROCK Enabled [Autofan Control] 85C/185F 17C/ 62F
001a01-L1>
001a01 ATTN: brick auto power down in 5 seconds
001a01 ATTN: brick is powering down now!
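As a sanity check on the env tables above, here is a minimal Python sketch that re-applies the 10% warning / 20% fault bands to a few of the PIMM rails. The nominal voltages are inferred from the listed limits (e.g. 4.50/5.50 is 5V +/- 10%), and the `classify` and `report` helpers are hypothetical names made up for illustration; this is not how the L1 firmware actually decides anything.

```python
# Rails copied from the first `env` dump above: (name, nominal volts, measured volts).
# Nominal values are an assumption inferred from the warning/fault limits.
RAILS = [
    ("5V aux", 5.0, 5.02),
    ("PIMM0 12V bias", 12.0, 9.31),
    ("PIMM0 1.5V", 1.5, 1.49),
    ("PIMM0 3.3V aux", 3.3, 3.29),
    ("PIMM0 5V aux", 5.0, 6.63),
]


def classify(nominal, measured, warn=0.10, fault=0.20):
    """Return 'fault', 'warning' or 'ok' from the fractional deviation."""
    deviation = abs(measured - nominal) / nominal
    if deviation > fault:
        return "fault"
    if deviation > warn:
        return "warning"
    return "ok"


def report(rails):
    """Map each rail name to its classification."""
    return {name: classify(nominal, measured) for name, nominal, measured in rails}


if __name__ == "__main__":
    for name, state in report(RAILS).items():
        print(f"{name:16s} {state}")
```

By these bands both PIMM0 12V bias (9.31) and PIMM0 5V aux (6.63) sit outside the 20% fault limits, even though the env table only flags the 12V bias line as Fault.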
So because of the secondary cache failure I didn't get a chance to issue an error_dump to see if the old error went away or not. So still unknown...but I kind of doubt it. Also, I'm sometimes seeing shorts on this "broken" PIMM's caps, so there is something wrong...it's possible the cache error is due to voltage and not really bad cache...but who knows, and it's likely a dead end to prove anything right now.
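For anyone wanting to cross-check the cache dump above, the Syndrome line looks like a plain XOR of Expected and Received, and the set bits line up with the SCData numbers the PROM prints. A rough Python sketch follows. It assumes the first 64-bit data column is SCData<127:64> and the second is SCData<63:0>, which is only my reading of the dump, and `failing_bits` is a made-up helper name. The bit-to-SRAM table is copied from the "Failing Bits" lines.

```python
from collections import Counter

# SCData bit -> SRAM reference designator, copied from the "Failing Bits" lines above.
SRAM_FOR_BIT = {
    94: "C8B7",
    101: "C8E6",
    103: "C8E6",
    107: "C8E6",
    108: "C8E6",
    126: "C8E6",
}


def failing_bits(expected_hi, received_hi, expected_lo=0, received_lo=0):
    """XOR expected vs received and list the SCData bit numbers that differ.

    Assumption: the first data column is SCData<127:64>, the second SCData<63:0>.
    """
    syndrome = ((expected_hi ^ received_hi) << 64) | (expected_lo ^ received_lo)
    return [bit for bit in range(128) if (syndrome >> bit) & 1]


if __name__ == "__main__":
    # (expected, received) pairs for the first data column, taken from the dump.
    samples = [
        (0x5555555555555555, 0x5555557555555555),
        (0xAAAAAAAAAAAAAAAA, 0xEAAAAAAAEAAAAAAA),
        (0x5555555555555555, 0x55555DF555555555),
    ]
    tally = Counter()
    for expected, received in samples:
        bits = failing_bits(expected, received)
        print(bits)
        tally.update(SRAM_FOR_BIT.get(bit, "unknown") for bit in bits)
    print(tally)
```

Every failing bit in the dump is a 0 read back as a 1, and they all land on just two SRAM parts (C8E6 and C8B7), which doesn't settle the voltage vs bad cache question either way.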
Also I sometimes get the SAME TLB error as this other post (happened only once, likely not related):
https://forums.irixnet.org/thread-3194-p...l#pid22723
Console from when I PUT BACK the PIMM (700MHz) that came with it:
Code:
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
Running in DDR mode
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering local IO ...................... DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30360 usec
Waiting for peers to complete discovery.... DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
\\\\Intializing any CPUless nodes.............. \\DONE
\Checking partitioning information ......... DONE
No other nodes present; becoming partition master
\Loading BASEIO prom ....................... DONE
BASEIO PROM Monitor SGI Version 6.210 built 02:30:38 PM Aug 26, 2004 (BE64)
1 CPUs on 1 nodes found.
NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled
PS/2 Keyboard & Mouse diagnostics
No keyboard found, skipping keyboard test
No mouse found, skipping mouse test
Missing PS/2 device(s) AND console set to "g"
PS/2 Keyboard & Mouse diagnostics passed with a possible problem
Graphics diagnostics
Odyssey board #0 found on nasid 0
Running Odyssey xtalk sanity diag...
Board version 1 - Buzz revision 2B
On board sdram size: 32 Mb
Cas latency: CAS 3
2 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics
Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0
Installing Graphics Console...
graphics install: searching for pipe 0
Walking SCSI Adapter 0, (pci id 1)
1timeout on adapter 0 target 1
tm0=0xffffc4d34a373a5d, tm1=0xfffed7de39b49c50, timeout=0xb
- 2+ Device Vendor Product:
3+ Device Vendor Product:
4+ Device Vendor Product:
5+ Device Vendor Product:
6+ Device Vendor Product:
7+ Device Vendor Product:
8+ Device Vendor Product:
9+ Device Vendor Product:
10+ Device Vendor Product:
11timeout on adapter 0 target b
tm0=0xfffed7de39b49c5b, tm1=0xfffed7de39b49c34, timeout=0x3
+ Device Vendor Product:
12+ Device Vendor Product:
13
A 000: *** TLB Refill Exception on node 0
A 000: *** EPC: 0xc00000001fc47e58 (0xc00000001fc47e58)
A 000: *** Press ENTER to continue.
A 000: POD IOC3 Dex> log
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 2 was previously Present & Enabled but is now Present & Disabled
A Info : Memory bank 2 previously had 256 MB but now has 512 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 3 was previously Present & Enabled but is now Absent
A Info : Memory bank 3 previously had 256 MB but now has 0 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A 000: POD IOC3 Dex> error_dump
Hardware Error State: (Forced error dump)
+ Errors on node Nasid 0x0 (0)
+ XBow in /hw/module/001c01
+ BEDROCK signalled following errors.
+ XBow Link a status register: 0xffffffff80020000
+ 17: Illegal destination
+ XBow error command word register: 0xffffffffaa028000
+ XBow error upper address register: 0x0
+ XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Dex>
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f41
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 9
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02ad0 (0xc00000001fc02ad0)
Configuring memory
Local memory configured: 512 MB (standard)
*** Warning: System controller debug switches are non-zero (0x2d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
*** Ignoring env. vars/using defaults
before reading NICHub NIC: 0x5455827f
SR1 set to 0x0000080690349000
SR0 set to 0x000000005455827f
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x1686a8
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30360 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(5455827f)
NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x5455827f, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315) \Global barrier passed.
Global barrier (line 4348) \Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(5455827f)
NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(5455827f)
NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823) \Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928) \Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line 7714) \Global barrier passed.
Global barrier (line 7715) \Global barrier passed.
DONE
Global barrier (line 5089) \Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(5455827f)
NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300) \Global barrier passed.
Nasids in partition: +0
Regions in partition: +0
A 000: *** Entering POD mode on node 0
A 000: POD IOC3 Cac> error_dump
Hardware Error State: (Forced error dump)
+ Errors on node Nasid 0x0 (0)
+ XBow in /hw/module/174562
+ BEDROCK signalled following errors.
+ XBow Link a status register: 0xffffffff80020000
+ 17: Illegal destination
+ XBow error command word register: 0xffffffffaa000000
+ XBow error upper address register: 0x0
+ XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Cac> tst
^ Syntax error
A 000: POD IOC3 Cac>
IP35 PROM SGI Version 6.210 built 02:33:51 PM Aug 26, 2004
built for bedrock rev. 1.1 or greater
running in IP34 mode
Running in DDR mode
Local master CPU A revision: f41
PROM length: 0x1686a8, BSS length: 0xa7a0, flash count: 9
Configured bedrock clock: 200.0 MHz
Status of local IO: 0x1 0x3fc0fff6403
Bedrock Rev: 2, Module: 1 (001c01) from Sys Ctlr
On PROM entry: ERR_EPC=0xc00000001fc02ac0 (0xc00000001fc02ac0)
Configuring memory
Local memory configured: 512 MB (standard)
*** Warning: System controller debug switches are non-zero (0x2d)
*** Diag level set to None (2)
*** Info level set to verbose
*** Boot stop requested at Global (2)
*** Ignoring env. vars/using defaults
before reading NICHub NIC: 0x5455827f
SR1 set to 0x0000080690349000
SR0 set to 0x000000005455827f
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... Copy PROM (0x9000000018000000) to RAM (0x9600000001a00000), len 0x1686a8
Done
DONE
Skipping secondary cache diags
CPU A switching stack into UALIAS and invalidating D-cache
CPU A switching into node 0 cached RAM
CPU A running cached
Initializing kldir.
Done initializing kldir.
Initializing klconfig.
init_klcfg: nasid 0 start 9600000000030000 size 10000
Done initializing klconfig.
Discovering local IO ...................... Check_master: link 10 is master
Check_master: link 10 is master
DONE
CPU A initialized subnode
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 30359 usec
Waiting for peers to complete discovery.... Discovery results:
ENTRY 0: HUB(5455827f)
NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
DONE
No other nodes present; becoming global master
Global master is entry 0, NIC 0x5455827f, /hw/rack/001/bay/01
Global master is /hw/rack/001/bay/01
Global barrier (line 4315) \Global barrier passed.
Global barrier (line 4348) \Global barrier passed.
Master System Topology Graph (pre-nasid_assign):
ENTRY 0: HUB(5455827f)
NASID=-1 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Calculating NASIDs
num_routers is 0
Master System Topology Graph:
ENTRY 0: HUB(5455827f)
NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: NF
Distributing routing tables
Distributing NASIDs
*** NASID assigned to 0
CPU A switching to UALIAS
CPU A running in UALIAS
Changing node ID to 0
Global barrier (line 4823) \Global barrier passed.
CPU A Flushing and invalidating caches
Global barrier (line 4928) \Global barrier passed.
CPU A switching to node 0 cached RAM
CPU A running cached
Nasids in partition: +0
Regions in partition: +0
Intializing any CPUless nodes.............. Global barrier (line 7714) \Global barrier passed.
Global barrier (line 7715) \Global barrier passed.
DONE
Global barrier (line 5089) \Global barrier passed.
hubii_link_good: A-brick attached to module 001c01.
Checking partitioning information ......... DONE
No other nodes present; becoming partition master
*** After partitioning ***
ENTRY 0: HUB(5455827f)
NASID=0 Mod=1 Flg=0x1500000 PROM=6.210 Route=N/A
MODULE=001c01 PARTITION=0 SPACE=RESET
Port 1 connection: Not connected
Port status: FE
Erecting partition fences ................ DONE
Update config for routers connected to hubs
Update config for hubs and hubless routers
CPU A flushing cache
check_router_cfg: nasid 0 is_voyager 0 check_cfg = 0
Global barrier (line 5300) \Global barrier passed.
Nasids in partition: +0
Regions in partition: +0
A 000: *** Entering POD mode on node 0
A 000: POD IOC3 Cac> log
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 2 was previously Present & Enabled but is now Present & Disabled
A Info : Memory bank 2 previously had 256 MB but now has 512 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : Memory bank 3 was previously Present & Enabled but is now Absent
A Info : Memory bank 3 previously had 256 MB but now has 0 MB
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A Info : SYS-DEGRADED: 512 MB Memory disabled
A 000: POD IOC3 Cac> error_dump
Hardware Error State: (Forced error dump)
+ Errors on node Nasid 0x0 (0)
+ XBow in /hw/module/174562
+ BEDROCK signalled following errors.
+ XBow Link a status register: 0xffffffff80020000
+ 17: Illegal destination
+ XBow error command word register: 0xffffffffaa020000
+ XBow error upper address register: 0x0
+ XBow error lower address register: 0x0
END Hardware Error State (Forced error dump)
A 000: POD IOC3 Cac>
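To make the register values in the error dumps above a bit easier to read, here is a small Python sketch that lists which bits are set. It assumes the XBow registers are 32 bits wide and that the leading ffffffff is just sign extension, which is a guess on my part; `set_bits` is a hypothetical helper. The only bit name I can vouch for is 17 (Illegal destination), because the dump prints it.

```python
def set_bits(value, width=32):
    """Return the bit positions that are set in the low `width` bits of `value`."""
    low = value & ((1 << width) - 1)
    return [bit for bit in range(width) if (low >> bit) & 1]


# Values copied from the three error_dump outputs above.
LINK_STATUS = 0xFFFFFFFF80020000
COMMAND_WORDS = [0xFFFFFFFFAA028000, 0xFFFFFFFFAA000000, 0xFFFFFFFFAA020000]

if __name__ == "__main__":
    # Bit 17 is the one the dump labels "Illegal destination".
    print("link status bits:", set_bits(LINK_STATUS))
    for word in COMMAND_WORDS:
        print(hex(word & 0xFFFFFFFF), set_bits(word))
```

The link status register is identical in all three dumps, while the error command word changes between them, so whatever is being latched there is not the same transaction every time.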
So I guess without another mainboard or PIMM I don't know. This is also the LAST mainboard revision...hence the most expensive...geez.
UPDATE: Okay still looking and grasping but I did find this:
http://www.mit.edu/afs.new/sipb/project/...s/Makefile
It has an SGI-IP27 (Origin 200/2000) section, and it specifically calls out the address in my error; it says the address belongs to a CONFIG_TOSHIBA_RBTX4927 section. The file's includes lead to something like this: https://android.googlesource.com/platform/hardware/bsp/kernel/rockchip/rk-v4.4/+/refs/heads/master/arch/mips/Makefile which I assume is still a MIPS section of the Linux kernel, and that gives me the impression that the BEDROCK (rockchip) may be this Toshiba MIPS processor?
UPDATE 2: Register for SDRAM CHANNEL CONTROL?
http://www.elektronikjk.pl/elementy_czyn...R4927A.pdf
SDRAM chips can be resoldered (hand reflowed) on this board...could this be an SDRAM issue? I'm having trouble reading the address; could it also be an ECC Status Register (ECCSR)?