OpenBSD/sgi
#1
First of all I wish you all a happy new year!

I'd like to make you aware of something I've been plotting for a while now - actually ever since OpenBSD retired the sgi architecture after 6.9 in Oct 2021. Back then I was pretty sad about that move, although it had been on the table for a while already - since 6.6 IIRC - but some person or some people kept it alive until after 6.9. Much obliged for that.
Back then I discovered by chance - well, it was more like trial and error - that the octeon userland is actually compatible with sgi kernels. They use the same packages tree, so they ought to be. You only need to change the baud rate of the console device to 9600 for sgi. I later found out that the 6.9 IP30 kernels even work with 7.0 octeon file systems. As I run my machines diskless, switching between kernels and userlands is as easy as changing a symlink. I figured this would make it possible to forward port only the sgi kernels, use the octeon userland with them and hence avoid creating sgi userlands.
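
The symlink switch can be sketched roughly as follows. This is only an illustration: the directory names (`sgi-6.9`, `octeon-7.0`) and the `root` link are hypothetical and not the actual layout of the NFS server.

```shell
#!/bin/sh
# Hypothetical layout: one directory per userland, plus a symlink "root"
# that the diskless client's NFS root points at.

# switch_root LINK TARGET: repoint LINK at TARGET, refusing to touch
# anything that exists but is not a symlink.
switch_root() {
    link=$1
    target=$2
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing: $link exists and is not a symlink" >&2
        return 1
    fi
    rm -f "$link" && ln -s "$target" "$link"
}

# Demo in a scratch directory so nothing real is modified.
demo=$(mktemp -d)
mkdir "$demo/sgi-6.9" "$demo/octeon-7.0"
switch_root "$demo/root" sgi-6.9
switch_root "$demo/root" octeon-7.0
ls -l "$demo/root"
```

The guard against non-symlinks is there so that a typo cannot remove or shadow a real root directory.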

Well, time passed and I didn't get anywhere with that, but in late November 2022 I finally found the time and dedication to get things going. The first thing I noticed was that it seems to be impossible to compile sgi kernels from an octeon userland, as it lacks the gcc 4.2.1 needed for compiling the sgi kernels, and it looks like LLVM/clang can't build them. So I was stuck with a 6.9 sgi userland, but so far this one has been able to build all sgi kernels.

I used the GitHub mirror of OpenBSD's CVS src repository for this endeavour - mainly because I'm much more familiar with git than with cvs and would rather invest my time in the actual task than in learning all about cvs. I started by reverting the commits that removed the sgi related stuff, based on the src repo leaving 7.0-beta. Then I tried to compile the sgi kernels and worked through all compilation errors by reverting related commits that removed additional sgi stuff and by replaying changes from the octeon arch that seemed to be needed. Unfortunately not all missing pieces show up during compilation, but so far I was able to get everything going on my test machines, except for one kernel (see below).

I've created three branches, check the links (and the history of those branches) for details about the changes I made:
My available resources for compilation are unfortunately rather limited: I am doing everything on my dual 300 MHz R12K Octane, which is good enough for building the kernels (about 50 mins per kernel and a little faster than my Octane2 with a single 400 MHz R12K) but struggles with userland builds. I have been trying to build an OpenBSD/sgi 7.0 userland for a while now - so far it has run for an aggregated 110 hours, mostly for LLVM/clang - but it's still not done. I modified the main Makefile so that I can interrupt a build and continue later without losing the already compiled stuff, as I don't want to run my machine unattended for too long.

I'm not sure what LLVM/clang is used for on OpenBSD/sgi other than compiling LLVM/clang: the sgi userland was compiled mainly with gcc 4.2.1 IIRC, and the kernels can only be compiled with gcc 4.2.1 IIUIC. Hence I'm thinking about skipping the build of LLVM/clang and modifying the release file lists accordingly, so I can build an OpenBSD/sgi userland w/o building LLVM/clang. It looks like I simply don't have the computing resources to build LLVM/clang in an acceptable amount of time. Assuming an Origin 350 with 4 x 1 GHz R16K really could be 2 x 3 = 6 times as fast as my Octane with 2 x 300 MHz R12K (twice the CPUs, each roughly three times as fast), this could be done in maybe under a day on such a machine.

If someone is interested in trying that, please feel free to contact me, I can get you going. I needed 2 GiB of memory to be able to compile some LLVM/clang parts with two processes. I started with three processes, but ran out of memory. So a four processor machine will likely need a minimum of 4 GiB of memory, better more to be able to use all processors for the compilation.
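
As a back-of-the-envelope check of these numbers (a sketch, not a benchmark): the 6x speedup is only the assumption stated above, the 110 hours are a lower bound because the build has not finished, and 1 GiB per compile process is inferred from two processes fitting into 2 GiB while three did not.

```shell
#!/bin/sh
# All figures are rough estimates taken from the post, not measurements
# of an actual Origin 350.
hours_so_far=110   # aggregated build time on the Octane (build unfinished)
speedup=6          # assumed: twice the CPUs, each roughly 3x as fast
gib_per_job=1      # inferred: 2 jobs fit in 2 GiB, 3 jobs did not
njobs=4            # one compile job per CPU on a 4-CPU machine

minutes=$((hours_so_far * 60 / speedup))
mem_gib=$((njobs * gib_per_job))
echo "estimated build time: $((minutes / 60)) h $((minutes % 60)) min"
echo "estimated minimum memory: ${mem_gib} GiB"
```

With these inputs the estimate comes out at 18 h 20 min and 4 GiB, which matches the "under a day" and "minimum of 4 GiB" guesses, but only for the part of the build that has run so far.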

So far I have built kernels for 7.0, 7.1 and 7.2:

...and for 7.0 and 7.1 everything runs well on the machines I have tested it on:
  • R4400 Indy (IP22)
  • R10000 Indigo² (IP28)
  • R12000 Origin200 (IP27, both SP and MP)
  • R12000 Octane (IP30, both SP and MP)
  • R5000 O2 (IP32)

You can check the logs linked from GitHub over at https://dmesgd.nycbug.org/ for details. I haven't uploaded logs for 7.2 yet, but everything seems to work so far on Indy, Octane (SP and MP) and O2; the Origin200 is not yet tested.

Current problem

Now to my current problem:

Unfortunately something broke between 7.1 and 7.2 for the R10000 Indigo² (IP28), and up until now I wasn't able to figure out what. The problem only affects the R10000 Indigo²; all the other machines happily boot their 7.2 kernels with the 7.2 octeon file system. But on the R10000 Indigo² things fall apart with segfault after segfault as soon as parts of the userland start executing. Now the interesting thing here is that this does not happen in single user mode. Trying a few of the userland tools that left core files during the last boot(s) doesn't show any problems there, even with the root FS mounted r/w. But when exiting single user mode things fall apart again, though this time with bus errors instead of segfaults. So everything works in single user mode but falls apart in multi-user mode.

I'd appreciate any help to figure out what's causing this problem.

EDIT: A workaround is available on GitHub, the IP28 kernel at GitHub was updated accordingly.

Help wanted

If you're interested in helping me and being part of this "project", please let me know. I'd also be interested in test results from machines I don't have available or not in working order, like Tezro, Fuel, Origin 350, R12K O2, R4K Indigo², R8K Indigo², R5K Indy and R4K Indigo (I have one, but it was gutted and partly destroyed by some idiot). If you have the time and interest I'll try to get you going in no time.

Outlook

My rough plan for the future is to get new sgi userlands built and from there work on the other stuff, like RAMdisk kernels and maybe an ISO so users can install their systems on disk if needed and don't have to resort to network booting and nfsrb2 for building their file systems.

Indigo Indy Indigo2 R10000/IMPACT O2 Octane Octane2 Origin 200=Origin 200-Origin 200=Origin 200
(This post was last modified: 03-01-2023, 03:00 PM by johnnym.)
johnnym
Tezro

Trade Count: (0)
Posts: 268
Threads: 9
Joined: Jun 2018
01-05-2023, 02:43 PM
#2
RE: OpenBSD/sgi
A bus error usually means a stack alignment or memory issue. I don't run OpenBSD, but I like seeing someone carry on the legacy. Any reason why you don't like NetBSD on your systems, by chance?

I'm the system admin of this site. Private security technician, licensed locksmith, hack of a c developer and vintage computer enthusiast. 

https://contrib.irixnet.org/raion/ -- contributions and pieces that I'm working on currently. 

https://codeberg.org/SolusRaion -- Code repos I control

Technical problems should be sent my way.
Raion
Chief IRIX Officer

Trade Count: (9)
Posts: 4,242
Threads: 534
Joined: Nov 2017
Location: Eastern Virginia
01-05-2023, 04:06 PM
#3
RE: OpenBSD/sgi
(01-05-2023, 04:06 PM)Raion Wrote:  A bus error usually means a stack alignment or memory issue.
Ok, good to know. But the hardware should be OK, as this does not happen with the 7.1 kernel and a 7.1 FS (same for 7.0). It could be that the memory is organized differently between 7.1 and 7.2 and maybe some broken SIMMs are touched then. I'll check them with the IDE, to be sure.

But the strange thing is, that in single user mode everything seems to work properly. When going directly to multi-user during boot-up the bus errors "become" segfaults but the same binaries bomb IIRC. I have logs of the issue, but not sure if posting them here would do any good now.

(01-05-2023, 04:06 PM)Raion Wrote:  I don't run OpenBSD, but I like seeing someone carry on the legacy.
Thanks, though not sure I'm good enough to keep this going, but I'll at least try. ;) And I simply couldn't let this good stuff sink into oblivion.

(01-05-2023, 04:06 PM)Raion Wrote:  Any reason why you don't like NetBSD on your systems, by chance?
Oh, I do, I for example run NetBSD on my R3K Indigo from Dexter1, though this only runs with the help of unmerged code from Japan ;). But OpenBSD actually supports every SGI machine I have (except for the R3K Indigo), whereas NetBSD only supports or supported the Indy and the O2 (plus R3K Indigo) of my machines. So my preferred free OS on SGI is OpenBSD. OpenBSD (4.9 to be exact) was also the first OS I ran on my Octane2 years ago, even before trying to use IRIX on it.

johnnym
Tezro

01-05-2023, 05:40 PM
#4
RE: OpenBSD/sgi
That sounds like a network error. Maybe the network code is misidentifying the system and trying to load memory that's unaligned?

Raion
Chief IRIX Officer

01-05-2023, 06:30 PM
#5
RE: OpenBSD/sgi
Try building it without the network drivers installed? That's my current thinking. If that works, then it's a network driver issue.

Raion
Chief IRIX Officer

01-05-2023, 06:45 PM
#6
RE: OpenBSD/sgi
(01-05-2023, 06:30 PM)Raion Wrote:  That sounds like a network error. Maybe the network code is misidentifying the system and trying to load memory that's unaligned?
Could be, but at least judging from the messages it prints, it correctly identifies the machine. And it is also already running from the network when in single user mode.

I have now put two logs on pastebin, all produced with OpenBSD/sgi 7.2 with 7.2 octeon FS on R10K Indigo²:

  1. directly going to multi-user mode during boot-up
  2. starting up in single user mode, checking some tools that previously segfaulted and left core dumps and then going multi-user

Raion Wrote:Try building it without the network drivers installed? That's my current thinking. If that works, then it's a network driver issue.
Well, it's running diskless, so I can't do that. But I could try to install the octeon sets on disk and then see how it behaves when booting from disk instead of the network.

johnnym
Tezro

01-05-2023, 06:55 PM
#7
RE: OpenBSD/sgi
MIPSIV (the R10000) and MIPS64 (Octeon) are not completely compatible. Cache control is incompatible, IIRC. So what may be happening is that the cache is in an inconsistent state, explaining your bus errors (bus errors are detected external to the CPU and sometimes reflect bogus addresses). The segmentation violations are detected by the CPU when it accesses a process's page tables in response to a TLB miss. Stale cache data could also cause those.
Take a look at the 2nd edition of "See MIPS Run" and what it says about the differences.

The output is hard to understand because there is no indication of which processes are being killed but we know they are being run from rc files.
Some other output stands out, in particular the usage notes indicate that something is wrong with the parameters being supplied to those utilities in the rc files. Also "unexpected !=" means there is a syntax error, possibly caused by an unset variable.
You can copy and paste one line at a time from the rc file to see where the problem first occurs.
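
A minimal sketch of how an unset variable can produce that kind of error in a `[ ... ]` test. The variable name `iface` is made up for illustration and is not taken from the actual rc scripts, and the exact error text differs between shells.

```shell
#!/bin/sh
# "iface" is a hypothetical variable name, used only for this demo.
unset iface

# Unquoted: the empty expansion vanishes, so the shell sees `[ != none ]`
# and typically complains about the "!=" (stderr is hidden here).
if [ $iface != none ] 2>/dev/null; then
    unquoted="true"
else
    unquoted="false or error"
fi

# Quoted: the empty string stays in place as an operand, so the
# comparison is well-formed and evaluates normally.
if [ "$iface" != none ]; then
    quoted="true"
else
    quoted="false"
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Running a script with `sh -x` prints each command after expansion, which makes a missing operand like this visible without pasting lines one at a time.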

Personaliris O2 Indigo2 R10000/IMPACT Indigo2 R10000/IMPACT Indigo2 Indy   (past: 4D70GT)
(This post was last modified: 01-05-2023, 09:16 PM by robespierre.)
robespierre
refector peritus

Trade Count: (0)
Posts: 640
Threads: 3
Joined: Nov 2020
Location: Massholium
01-05-2023, 09:11 PM
#8
RE: OpenBSD/sgi
In the meantime I checked the memory modules from the IDE. They all come back ok after running memtest. But it looks like one of the three quads uses EDO DRAMs. I removed that quad now to be sure, but it didn't make a difference.

(01-05-2023, 09:11 PM)robespierre Wrote:  MIPSIV (the R10000) and MIPS64 (Octeon) are not completely compatible. Cache control is incompatible, IIRC. So what may be happening is that the cache is in an inconsistent state, explaining your bus errors (bus errors are detected external to the CPU and sometimes reflect bogus addresses). The segmentation violations are detected by the CPU when it accesses a process's page tables in response to a TLB miss. Stale cache data could also cause those.
But how could this explain that these differences don't come into play in single user mode on the very same machine? And the other machines (R4K Indy, R5K O2 and R12K Octane) are not affected. Hm, I could try to check how an R10K Octane behaves, but I need to get another system board first.

And still, both sgi and octeon use the same package source (e.g. https://ftp.eu.openbsd.org/pub/OpenBSD/6...es/mips64/) and their userland tools are:

PHP Code:
root@nfs:/srv/nfs/indigo2/root/bin# file cat
cat: ELF 64-bit MSB shared object, MIPS, MIPS-III version 1 (SYSV), dynamically linked, for OpenBSD, stripped

(01-05-2023, 09:11 PM)robespierre Wrote:  Take a look at the 2nd edition of "See MIPS Run" and what it says about the differences.
Will do if I find a copy online.

(01-05-2023, 09:11 PM)robespierre Wrote:  The output is hard to understand because there is no indication of which processes are being killed but we know they are being run from rc files.
The second log has a listing of / showing the core files of the killed processes:

PHP Code:
indigo2# ls
.cshrc          dev_mkdb.core   id.core         quotaon.core    swap
.profile        etc             kvm_mkdb.core   root            sys
altroot         getcap.core     mktemp.core     sbin            tmp
bin             getty.core      mnt             sed.core        usr
bsd             grep.core       perl.core       slaacd.core     var
bsd.booted      head.core       pgrep.core      sort.core
dev             home            pkill.core      ssh-keygen.core
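
To turn such a listing into a plain list of the programs that crashed, something along these lines should do. It is a small sketch that relies on the `NAME.core` naming visible in the listing; the function name is made up.

```shell
#!/bin/sh
# crashed_programs DIR: print the names of programs that left a
# NAME.core file in DIR, one per line, sorted.
crashed_programs() {
    for f in "$1"/*.core; do
        [ -e "$f" ] || continue   # no match: the glob stays literal
        basename "$f" .core
    done | LC_ALL=C sort
}

# Demo with a few of the file names from the listing, in a scratch dir.
demo=$(mktemp -d)
touch "$demo/grep.core" "$demo/sed.core" "$demo/perl.core" "$demo/bsd"
crashed_programs "$demo"
```

On the diskless setup this could be pointed at the exported root directory on the NFS server instead of the scratch directory.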

(01-05-2023, 09:11 PM)robespierre Wrote:  Some other output stands out, in particular the usage notes indicate that something is wrong with the parameters being supplied to those utilities in the rc files. Also "unexpected !=" means there is a syntax error, possibly caused by an unset variable.
You can copy and paste one line at a time from the rc file to see where the problem first occurs.
I'll have a look into that, thanks for the pointers.

johnnym
Tezro

01-05-2023, 09:57 PM
#9
RE: OpenBSD/sgi
Small update:

(01-05-2023, 09:57 PM)johnnym Wrote:  
(01-05-2023, 09:11 PM)robespierre Wrote:  Some other output stands out, in particular the usage notes indicate that something is wrong with the parameters being supplied to those utilities in the rc files. Also "unexpected !=" means there is a syntax error, possibly caused by an unset variable.
You can copy and paste one line at a time from the rc file to see where the problem first occurs.
I'll have a look into that, thanks for the pointers.
I did examine the multi-user boot process in more detail and enabled and disabled various things in the rc script, but I couldn't find the exact point where things start to bomb. To me it looks like something important changes when the machine leaves single user mode or enters multi-user mode directly, but I can't figure out what. :-/

In the meantime I also tried an on-disk installation using a 6.9 sgi RAMdisk kernel for IP28 (looks like I indeed can't build a new one without a release IIUIC) and a 7.2 octeon base72.tgz disguised as base69.tgz using my 7.2 sgi IP28 kernel as /bsd. This worked well, except for the bootloader installation, but thanks to the excellent OpenBSD manpages (sgivol(8)) I could do that manually in a shell after the installation had finished.
And this shows the same picture: Things look good in single user mode, but fall apart in multi-user mode.

But judging from my on-disk install experience and with all the other machines (see above) working correctly, this also means that on-disk installs are already possible now with:
  • 6.9 sgi RAMdisk kernels,
  • correctly named 7.2 octeon sets,
  • 7.2 sgi kernels from GitHub

...and with some manual work afterwards. Maybe I should create a howto on GitHub.
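
The "correctly named sets" step can be sketched like this. It only handles file names (for example `base72.tgz` becoming `base69.tgz`); the function name is made up, `comp72.tgz` is just a second example set, and whatever else the installer may check is not covered here.

```shell
#!/bin/sh
# disguise_sets DIR FROM TO: for every set file NAME<FROM>.tgz in DIR,
# create a copy named NAME<TO>.tgz (e.g. base72.tgz -> base69.tgz).
# The originals are left untouched.
disguise_sets() {
    dir=$1
    from=$2
    to=$3
    for f in "$dir"/*"$from".tgz; do
        [ -e "$f" ] || continue            # nothing matched
        name=$(basename "$f" "$from.tgz")  # e.g. "base"
        cp "$f" "$dir/$name$to.tgz"
    done
}

# Demo with empty placeholder files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/base72.tgz" "$demo/comp72.tgz"
disguise_sets "$demo" 72 69
ls "$demo"
```

Copying instead of renaming keeps the 7.2 file names around, which makes it easier to tell later what the sets really are.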

johnnym
Tezro

01-07-2023, 07:50 PM
#10
RE: OpenBSD/sgi
You can also sign up for gitea and post patches there if you prefer to keep it within our community. It's entirely up to you.

Raion
Chief IRIX Officer

01-07-2023, 10:43 PM

