Altix 350 memory issues
Hope this is the right place.
I'm trying to upgrade my Altix 350 to the maximum of 24GB per node (12GB per CPU). Unfortunately, I'm getting memory errors during the memory self-test. The machine works fine with the mix of 1GB and 512MB PC2700R modules it came with, but I've tried a couple of sets of 2GB modules and neither works: first some Samsung PC3200R, then some Micron PC2700R.
I feel like I'm missing something obvious, but I'm at a loss as to what!
I've had to edit this post to include the log of the failed boot (with the Micron modules), as the forum seemed to think it contained contact details.
Code:
001c02-L1>pwr u
returning to console mode 001c02 CPU0, <CTRL_T> to escape to L1
001c02#0c: SGI SAL Version 5.04 rel081111 IP41 built 07:25:21 AM Nov 11, 2008
001c02#0a: SGI SAL Version 5.04 rel081111 IP41 built 07:25:21 AM Nov 11, 2008
Found I/O brick attached to module/001c02/slab/0/node
Probing memory DIMMs ...................... DONE
Initializing memory controller ............ DONE
Testing memory .........................
MBIST FAILURE Byte Address = 0x000000000 (Approximate)
MBIST_REG:X_RP_L, ACT:0x000f0f0f0f, EXP:0x00ffffffff, DIF:0x00f0f0f0f0
MBIST_ERR, FRU:001c02.0 DIMM0_N0_R_BUS_X, ADDR:0x000000000, DIMM_BITS:44,45,46,47,52,53,54,55, ...
MBIST_REG:X_RS_L, ACT:0x0f0f2f9f0f, EXP:0xffffffffff, DIF:0xf0f0d060f0
MBIST_ERR, FRU:001c02.0 DIMM0_N0_R_BUS_X, ADDR:0x000000000, DIMM_BITS:4,5,6,7,13,14,20,22, ...
MBIST_REG:X_LP_L, ACT:0x000f0f0f0f, EXP:0x00ffffffff, DIF:0x00f0f0f0f0
MBIST_ERR, FRU:001c02.0 DIMM0_N0_L_BUS_X, ADDR:0x000000000, DIMM_BITS:44,45,46,47,52,53,54,55, ...
MBIST_REG:X_LS_L, ACT:0x0f0f0f1f1f, EXP:0xffffffffff, DIF:0xf0f0f0e0e0
MBIST_ERR, FRU:001c02.0 DIMM0_N0_L_BUS_X, ADDR:0x000000000, DIMM_BITS:5,6,7,13,14,15,20,21, ...
MBIST_REG:Y_RP_L, ACT:0x000f0f0f0f, EXP:0x00ffffffff, DIF:0x00f0f0f0f0
MBIST_ERR, FRU:001c02.0 DIMM1_N0_R_BUS_Y, ADDR:0x000000000, DIMM_BITS:44,45,46,47,52,53,54,55, ...
MBIST_REG:Y_RS_L, ACT:0x0f0f0f0f0f, EXP:0xffffffffff, DIF:0xf0f0f0f0f0
MBIST_ERR, FRU:001c02.0 DIMM1_N0_R_BUS_Y, ADDR:0x000000000, DIMM_BITS:4,5,6,7,12,13,14,15, ...
MBIST_REG:Y_LP_L, ACT:0x000f4f0f0f, EXP:0x00ffffffff, DIF:0x00f0b0f0f0
MBIST_ERR, FRU:001c02.0 DIMM1_N0_L_BUS_Y, ADDR:0x000000000, DIMM_BITS:44,45,46,47,52,53,54,55, ...
MBIST_REG:Y_LS_L, ACT:0x1f0f8f0f0f, EXP:0xffffffffff, DIF:0xe0f070f0f0
MBIST_ERR, FRU:001c02.0 DIMM1_N0_L_BUS_Y, ADDR:0x000000000, DIMM_BITS:4,5,6,7,12,13,14,15, ...
MULTI-BIT DIFFERENCE DETECTED DURING MEMORY BIST:
Location 001c02#0 DIMM0 N0_L_BUS_X received 2 multi-bit mbist error(s).
Location 001c02#0 DIMM1 N0_L_BUS_Y received 2 multi-bit mbist error(s).
Location 001c02#0 DIMM0 N0_R_BUS_X received 2 multi-bit mbist error(s).
Location 001c02#0 DIMM1 N0_R_BUS_Y received 2 multi-bit mbist error(s).
Module 001c02#0 Bank 0 failed memory tests.
MBIST FAILURE Byte Address = 0x400000000 (Approximate)
MBIST_REG:X_RP_L, ACT:0x000f0f1f8f, EXP:0x00ffffffff, DIF:0x00f0f0e070
MBIST_ERR, FRU:001c02.0 DIMM2_N0_R_BUS_X, ADDR:0x400000000, DIMM_BITS:44,45,46,53,54,55,60,61, ...
MBIST_REG:X_RS_L, ACT:0x8f0f0f2f0f, EXP:0xffffffffff, DIF:0x70f0f0d0f0
MBIST_ERR, FRU:001c02.0 DIMM2_N0_R_BUS_X, ADDR:0x400000000, DIMM_BITS:4,5,6,7,12,14,15,20, ...
MBIST_REG:X_LP_L, ACT:0x000f4f0f0f, EXP:0x00ffffffff, DIF:0x00f0b0f0f0
escaping to L1 system controller
001c02-L1>pwr d
WARNING: power appears off, console unavailable
Code:
001c02-L1>version
L1 1.62.1 (Image A), Built 11/10/2008 10:13:01 [Legacy 2MB image]
WARNING: power appears off, console unavailable
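In case it helps anyone reading the log, here is a rough Python sketch for checking the MBIST_REG lines. It assumes DIF is simply ACT XOR EXP, which is not documented here but does hold for the lines pasted above. The names `parse_mbist`, `PATTERN`, and `LOG` are made up for this example, and only three of the log lines are included as sample input.

```python
import re

# Three MBIST_REG lines copied from the failed boot log above.
LOG = """\
MBIST_REG:X_RP_L, ACT:0x000f0f0f0f, EXP:0x00ffffffff, DIF:0x00f0f0f0f0
MBIST_REG:X_RS_L, ACT:0x0f0f2f9f0f, EXP:0xffffffffff, DIF:0xf0f0d060f0
MBIST_REG:Y_LS_L, ACT:0x1f0f8f0f0f, EXP:0xffffffffff, DIF:0xe0f070f0f0
"""

# Hypothetical pattern, written to match the line layout seen in this log only.
PATTERN = re.compile(
    r"MBIST_REG:(\w+), ACT:(0x[0-9a-f]+), EXP:(0x[0-9a-f]+), DIF:(0x[0-9a-f]+)"
)

def parse_mbist(log):
    """Return one dict per MBIST_REG line with a few derived checks."""
    rows = []
    for m in PATTERN.finditer(log):
        act, exp, dif = (int(m.group(i), 16) for i in (2, 3, 4))
        rows.append({
            "reg": m.group(1),
            # Assumption: DIF is the bitwise XOR of actual and expected.
            "xor_matches": (act ^ exp) == dif,
            # Number of register bits that differ from the expected value.
            "bad_bits": bin(dif).count("1"),
            # Differing bits that fall in the low nibble of any byte.
            "low_nibble_bad": dif & 0x0F0F0F0F0F,
        })
    return rows

if __name__ == "__main__":
    for row in parse_mbist(LOG):
        print(row)
```

On these lines every differing bit lands in the upper nibble of a byte of the register value and the lower nibbles read back as expected. How register bits map onto the DIMM_BITS numbers is not something this sketch tries to work out.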
(This post was last modified: 10-20-2020, 03:46 PM by Donjon.)