tezro problems
#11
RE: tezro problems
I've replaced the snaphat battery on the IO9.
Initially I didn't get this message:
Quote:Starting PROM Boot process
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED
However, it still misses the "Walking SCSI Adapter" step... After several power-down/power-up cycles the message reappears.
I'll replace the Dallas chip... If this doesn't fix the problem, it could be a bad mainboard or a bad IO9 adapter.

On the other hand, the LAN is on the same adapter and seems to be functional:
Quote:graphics install: searching for pipe 0
Probing IOC4 ATA adapter 2
IOC4 RevId = 83
Initializing PROM Device drivers ..........
  Initializing Base I/O Ethernet Interface...Done.
  ---------------Interface Configuration Summary----------------
  ASIC|Revision|MAC Address      : 5701|B5|08:00:69:11:dc:4b
  Link Negotiation|Advertisement  : On|<H10 F10 H100 F100 H1000 F1000>
  Link|Speed|Duplex|Rx/Tx FlowCtrl: Up|1000|Full|Off/Off
  --------------------------------------------------------------
DONE
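
The summary rows above follow a `label|label : value|value` layout, so the link state can be checked mechanically when comparing several boot logs. A minimal Python sketch, assuming that layout holds for every row; `parse_summary` is a hypothetical helper, not part of any SGI tool:

```python
def parse_summary(lines):
    """Turn 'A|B|C : x|y|z' rows into a flat {label: value} dict."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so MAC addresses stay intact.
        labels, sep, values = line.partition(":")
        if not sep or "|" not in labels:
            continue  # skip separator rows and unrelated lines
        names = [n.strip() for n in labels.split("|")]
        vals = [v.strip() for v in values.split("|")]
        if len(names) == len(vals):
            fields.update(zip(names, vals))
    return fields

# Rows copied from the quoted PROM output.
summary = [
    "  ASIC|Revision|MAC Address      : 5701|B5|08:00:69:11:dc:4b",
    "  Link Negotiation|Advertisement  : On|<H10 F10 H100 F100 H1000 F1000>",
    "  Link|Speed|Duplex|Rx/Tx FlowCtrl: Up|1000|Full|Off/Off",
]
info = parse_summary(summary)
link_ok = info.get("Link") == "Up"
print(link_ok, info["Speed"], info["Duplex"])
```

A link that reports `Up` only shows that the Ethernet part of the board answers; it says nothing about the SCSI controller.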

Best regards,
Plamen
kokoboi
O2

Trade Count: (0)
Posts: 46
Threads: 5
Joined: May 2018
05-12-2021, 01:45 PM
#12
RE: tezro problems
One of our old timers could likely walk you through a longer CAC/PROM hardware inventory reset procedure after you’ve replaced all your RTC batteries, if you’re still getting these kinds of messages afterwards.

I think a last effort would be to force the system to reinventory everything in case things got disabled or trouble logs got generated.

But replacing those RTC batteries would still be job one, as you’re presently doing.
weblacky
I play an SGI Doctor, on daytime TV.

Trade Count: (10)
Posts: 1,716
Threads: 88
Joined: Jan 2019
Location: Seattle, WA
05-12-2021, 07:41 PM
#13
RE: tezro problems
I've replaced the Dallas chip with a new one; however, the L1 stops running at this point... What should I do now?

Quote:One of our old timers could likely walk you through a longer CAC/PROM hardware inventory reset procedure after you’ve replaced all your RTC batteries, if you’re still getting these kinds of messages afterwards.
I would appreciate it if someone could guide me through the process.
(This post was last modified: 06-03-2021, 09:08 AM by kokoboi.)
kokoboi
06-03-2021, 08:49 AM
#14
RE: tezro problems
I'm not exactly an old-timer, but I have done this process on my Onyx2 and it's very similar on the Chimera systems. The basic sequence is below, and if you have any questions, just let me know.

  1. At the system's L1 prompt before powering the computer on, enter: debug 0x08. This will set the system to boot into Power-On Diagnostics (POD).
  2. Power the system on, either via the front panel or from the L1 prompt with pwr u.
  3. Once the system has booted to the POD> prompt, enter the command go cac to enter CAChed mode.
  4. Type clearalllogs to clear the system's logged configuration state, including errors.
  5. Type initalllogs (not resetalllogs, which the PROM rejects with a syntax error) to bring the logged configuration state back to the default.
  6. Type flush to flush/empty the system buffers.
  7. Return to the L1 prompt (CTRL-T) and enter debug 0x00 to reset the debug switches to their default state.
  8. Enter reset at the L1 prompt to reset the system.
  9. See how the boot sequence goes. Hopefully things will work better after this process.
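
To make it clearer which commands go to which prompt, the sequence above can be written down as data. This is only a sketch in Python: the prompt labels `L1` and `POD` and the helper `by_prompt` are naming assumptions for illustration, and step 5 uses `initalllogs` alone because the transcript later in this thread shows `resetalllogs` returning a syntax error.

```python
from itertools import groupby

# (prompt, command) pairs transcribed from the steps above.
RESET_SEQUENCE = [
    ("L1", "debug 0x08"),     # request a stop in POD on the next boot
    ("L1", "pwr u"),          # power the system on
    ("POD", "go cac"),
    ("POD", "clearalllogs"),
    ("POD", "initalllogs"),
    ("POD", "flush"),
    ("L1", "debug 0x00"),     # back at L1 via CTRL-T
    ("L1", "reset"),
]

def by_prompt(sequence):
    """Group consecutive commands that are typed at the same prompt."""
    return [(prompt, [cmd for _, cmd in group])
            for prompt, group in groupby(sequence, key=lambda step: step[0])]

for prompt, commands in by_prompt(RESET_SEQUENCE):
    print(f"{prompt}: " + ", ".join(commands))
```

Each change of group marks a point where you switch between the L1 controller and the PROM console.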

Personaliris Indigo Indigo2 Indy Onyx2 Origin 200 Origin Vault O2 Octane2 (VW 320) (VW 540) (VW 550) Fuel Tezro Tezro Rack Origin 350 Onyx4 Altix 350 (Prism Rackmount)
(This post was last modified: 06-03-2021, 03:07 PM by kaigan.)
kaigan
Site Admin and SGI Tinkerer

Trade Count: (2)
Posts: 262
Threads: 31
Joined: May 2019
Location: Omaha, NE
06-03-2021, 12:06 PM
#15
RE: tezro problems
Quote:001c01-L1>debug 0x08
debug switches set to 0x0008
001c01-L1>pwr u
001c01-L1>
entering console mode  001c01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED


IP35 PROM SGI Version 6.210  built 02:33:51 PM Aug 26, 2004
*** Warning: System controller debug switches are non-zero (0x8)
*** Boot stop requested at Global (2)
Testing/Initializing memory ...............            DONE
Copying PROM code to memory ...............            DONE
Discovering local IO ......................            DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5893 usec
Waiting for peers to complete discovery....            DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes..............            DONE
Checking partitioning information .........            DONE
No other nodes present; becoming partition master
Local slave entering slave loop
Local slave entering slave loop
Local slave entering slave loop
A 000 001c01:
A 000 001c01: *** Entering POD mode on node 0
A 000 001c01: POD SysCt Cac> go cac
A 000 001c01: Must be in Dex mode before switching to Cac or Unc.
A 000 001c01: POD SysCt Cac>
A 000 001c01: POD SysCt Cac> clearalllogs
A 000 001c01: *** This must be run only after NUMAlink discovery is complete.
A 000 001c01: *** This will clear all previous log variables such as:
A 000 001c01: *** moduleids, nodeids, etc. for all nodes.
A 000 001c01: Clear all logs? [n] y
A 000 001c01: Checking 1 entries for promlogs
A 000 001c01: .DONE
A 000 001c01: All PROM logs cleared!
A 000 001c01: POD SysCt Cac> resetalllogs
A 000 001c01:                ^ Syntax error
A 000 001c01: POD SysCt Cac> initalllogs
A 000 001c01: *** This must be run only after NUMAlink discovery is complete.
A 000 001c01: *** This will clear all previous log variables such as:
A 000 001c01: *** moduleids, nodeids, etc. for all nodes.
A 000 001c01: Clear all logs environment variables, and aliases ? [n] y
A 000 001c01: Checking 1 entries for promlogs
A 000 001c01: .DONE
A 000 001c01: All PROM logs cleared!
A 000 001c01: POD SysCt Cac> flush
A 000 001c01: POD SysCt Cac> reset
A 000 001c01: Resetting the system...

I've set the debug switches to 0x00 again.
Now I see these errors again:
Quote:returning to console mode  001c01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED
.....
This is driving me crazy.
(This post was last modified: 06-03-2021, 01:30 PM by kokoboi.)
kokoboi
06-03-2021, 01:10 PM
#16
RE: tezro problems
Wow, you’ve replaced the snaphat and the Dallas? Well, I don’t know how far you’d like to go, but if it were me and I chose to keep going, I would try two things (this is only a high-level description; I would need to work out the details with the board out and in my hands):

1. Remove the IO9 from system, remove snaphat, check VCC pin on SCSI controller (find data sheet) for a short to ground.

2. Assuming no short, I would try to get access to that pin or a test pad connected to that pin and check for correct voltage during system-on.

Unless it’s some firmware issue, the only thing I can imagine is that it is literally not being detected because it has no power or low voltage, which would indicate a hardware issue like a shorted local cap or a power supply problem.

These are just guesses.
weblacky
06-03-2021, 02:05 PM
#17
RE: tezro problems
Quote:Wow, you’ve replaced the snaphat and the Dallas? Well, I don’t know how far you’d like to go, but if it were me and I chose to keep going, I would try two things (this is only a high-level description; I would need to work out the details with the board out and in my hands):

1. Remove the IO9 from system, remove snaphat, check VCC pin on SCSI controller (find data sheet) for a short to ground.

2. Assuming no short, I would try to get access to that pin or a test pad connected to that pin and check for correct voltage during system-on.

Unless it’s some firmware issue, the only thing I can imagine is that it is literally not being detected because it has no power or low voltage, which would indicate a hardware issue like a shorted local cap or a power supply problem.

These are just guesses.

I've replaced only the snaphat. When the Dallas chip is replaced, the machine doesn't start; I think it doesn't like an empty NVRAM.
However, I really think there is something wrong with the IO9 board itself... How on earth does the LAN, which is on the same board, pass its checks?
Does anyone have a spare IO9?

I removed the IO9 card and got an almost identical error:
Quote:entering console mode  001c01 CPU0, <CTRL_T> to escape to L1
Starting PROM Boot process
io_config_space: No IOC3 or IOC4 found on BASEIO.
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED
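
Comparing this log with the one taken with the card installed shows exactly one new complaint. A minimal Python sketch of that comparison; the helper name `complaints` is an assumption, and the log text is copied from the quotes in this thread:

```python
def complaints(log):
    """Return the set of 'io_config_space:' complaint lines in a boot log."""
    return {line.strip() for line in log.splitlines()
            if line.strip().startswith("io_config_space:")}

# PROM output with the IO9 installed.
with_io9 = """\
Starting PROM Boot process
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED
"""

# PROM output with the IO9 removed.
without_io9 = """\
Starting PROM Boot process
io_config_space: No IOC3 or IOC4 found on BASEIO.
io_config_space: Found 0 Qlogic devices BASEIO; expected 1 or more.
io_config_space failed:
RSLT io_config_spac FAIL                diag_rc = 53
diag_io6confSpace_sanity: /hw/module/001c01/xtalk/15: FAILED
"""

only_without_card = complaints(without_io9) - complaints(with_io9)
print(only_without_card)
```

The only line unique to the card-removed boot is the missing IOC3/IOC4 message. That is consistent with the IOC4 being detected when the card is installed while the QLogic controller is not, though it does not by itself prove where the fault lies.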
(This post was last modified: 06-03-2021, 02:25 PM by kokoboi.)
kokoboi
06-03-2021, 02:11 PM
#18
RE: tezro problems
A Dallas that isn't dead dead should still work fine. The battery can be shot as long as the chip itself works.

Maybe try rerunning the POD mode commands without the IO9, then run enableall, update and reset from the PROM to see how the system reacts?

kaigan
06-03-2021, 03:12 PM
#19
RE: tezro problems
I've just tried this.
enableall does not execute:
Quote:>> enableall
Enable/Disable table not setup in NVRAM
>>
Even the L1 complains about the missing IO9:
Quote:INFO: Cannot enable VRM: 9
INFO: Cannot enable VRM: 10
INFO: Cannot enable VRM: 11
ALERT: IO9 EEPROM read error, no acknowledge


SGI SN1 L1 Controller
Firmware Image A: Rev. 1.40.6, Built 01/06/2006 13:16:50

The BASEIO PROM does too:
Quote:BASEIO PROM Monitor SGI Version 6.210  built 02:30:38 PM Aug 26, 2004 (BE64)
4 CPUs on 1 nodes found.
****************************************************************
*      WARNING:  No BASEIO board found -- the system        *
*                  will not be able to boot.                  *
****************************************************************
WARNING: Did not find an IOC3 or IOC4 NVRAM base address
WARNING: Did not find an IOC3 or IOC4 UART base address

NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled

The SCSI chip on the IO9 has probably been fried. Any suggestions for a replacement SCSI adapter for the Tezro that is available on eBay?
(This post was last modified: 06-03-2021, 06:11 PM by kokoboi.)
kokoboi
06-03-2021, 06:04 PM
#20
RE: tezro problems
(06-03-2021, 03:12 PM)kaigan Wrote:  A Dallas that isn't dead dead should still work fine. The battery can be shot as long as the chip itself works.

Maybe try rerunning the POD mode commands without the IO9, then run enableall, update and reset from the PROM to see how the system reacts?

Now that's an IDEA!

KOKOBOI:
Where did you get this Dallas and do you still have your OLD dallas?  If your Dallas is FAKE...could this happen?  Kaigan is right, a dead Dallas battery on a working Dallas IC will work fine with power...use that as a troubleshooting guide.  I've personally NEVER seen a "dead" Dallas Chip (not working), only a dead battery...your L1 keeps trying to make alterations to NVRAM, I assume UNsuccessfully.  Perhaps this could be the reason? 

I know you said things were still a problem with your OLD L1 Dallas, but try the old one with NEW snaphat on IO9 anyway and try reset procedure again. See if there is a difference.

Let us Know.


ALSO, there was an experiment not long ago from jwhat that proved a Tezro will happily accept a BLANK L1 Dallas...but will NOT ACCEPT, NOR ERASE, a USED DALLAS! So there are two options there: either use a Dallas that was drilled for a battery and remove the battery to blank it, or use a programmer to "erase" the old NVRAM data through programming commands. If you got a USED Dallas instead of a blank one, the Tezro would reject it with serial errors, though I didn't see that in the output you posted.
(This post was last modified: 06-03-2021, 06:24 PM by weblacky.)
weblacky
06-03-2021, 06:18 PM

