OpenBSD/sgi
#31
RE: OpenBSD/sgi
OK, I'm through, though I don't believe 4fc402b61836e8a270d48f5516496cfbfe57776b (0541ad29664165bece2a7957d02a188f9bafb73b in the official OpenBSD source) is the real culprit. Yes, it is in the network code, so it could play a part in the problem (EDIT: and it introduced a bug for big-endian machines, according to the next change to the same file), but why should it specifically affect IP28 and no other machine, not even the Indy, which uses the same NIC?

I rather believe this is the cause of the problem I mentioned earlier (hangs after NFS mounts), which I also saw on the Octane but which was gone later on. EDIT: my memory played a trick on me here: according to my logs the Octane didn't hang but took a long time at around the NFS mounts (the solution, according to the commit dates, was to replay 85caa4b for sgi as well with 1583f1e). Also, newer commits identified as bad did not hang on the Indigo². I'll probably look into that on the weekend.

Code:
$ git bisect log
git bisect start
# bad: [0d58b8b8b5e0621f84efa993ee9ef47605603beb] drop the -beta
git bisect bad 0d58b8b8b5e0621f84efa993ee9ef47605603beb
# good: [5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7] close enough to release, we drop -beta
git bisect good 5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7
# good: [9f850877c8e5a89e6bfb255f1f7026c00bb7875e] vmm(4): reference count vm's and vcpu's
git bisect good 9f850877c8e5a89e6bfb255f1f7026c00bb7875e
# bad: [970cf9c09324fa9781be02dc122de675098fbf1d] Don't yet configure smmu(4) on Qualcomm SoCs as used on the Lenovo x13s as it is still not ready for runtime use and probably needs further quirks.
git bisect bad 970cf9c09324fa9781be02dc122de675098fbf1d
# good: [a7cdf5850edf952aab05a1d12a910add326a1f7c] Make test table based, extend it a little
git bisect good a7cdf5850edf952aab05a1d12a910add326a1f7c
# bad: [a39c18f28d16b1a61658f6ce07a74bc58176db30] strlen was in v6 libc (s5/perror.c) but not documented till v7 ok schwarze@
git bisect bad a39c18f28d16b1a61658f6ce07a74bc58176db30
# good: [17fc9e5b1d3178a8d65cacff7114b83163f14a02] The IPv4 reassembly code is MP safe, so we can run it in parallel. Note that ip_ours() runs with shared netlock, while ip_local() has exclusive netlock after queuing.  Move existing the code into function ip_fragcheck() and call it from ip_ours(). OK mvs@
git bisect good 17fc9e5b1d3178a8d65cacff7114b83163f14a02
# bad: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@
git bisect bad 4fc402b61836e8a270d48f5516496cfbfe57776b
# good: [cd0dd8f18578e5b883f43aa8ea64d31710ca1159] Force disabling the use of delay slots. This is ugly but gets the compiler to produce 99+% correct code at all optimization levels, and can help people who would like to tinker a bit with the backend.
git bisect good cd0dd8f18578e5b883f43aa8ea64d31710ca1159
# good: [0e641b41fe54fcc8de2a3351ff974e0677060113] Remove bogus mtw_read_cfg.
git bisect good 0e641b41fe54fcc8de2a3351ff974e0677060113
# good: [5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f] Zap prototypes for nonexistent nd6_setmtu() and in6_ifdel()
git bisect good 5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f
# good: [e62afb52dea0a7b7d0c6c099652a54e60340a22d] Fix RFC number in comment
git bisect good e62afb52dea0a7b7d0c6c099652a54e60340a22d
# good: [61f35befa9a0619b3becea84efb445917c00389e] Add a second test to validate the tables in the library.
git bisect good 61f35befa9a0619b3becea84efb445917c00389e
# first bad commit: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@

Each bisect step required the following steps for verification:

  1. tar repo state - seconds
  2. copy it over to the NFS server - about 1m
  3. untar it on the Octane to a 15K SCSI disk from NFS - about 20m
  4. compile it - about 28m
  5. test it on Indigo² - under 5m

...so roughly 50 minutes per step.
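
Adding up the timings from the list above gives a rough idea of the cost of the whole bisect. This is only a sketch: the variable names are made up for illustration, the tar step is counted as zero minutes, and it assumes that each of the 11 good/bad decisions recorded in the log after the two initial endpoints cost one full cycle.

```shell
#!/bin/sh
# Minutes per verification step, taken from the list above.
# The tar step only takes seconds, so it is counted as 0 (an assumption).
tar_min=0
copy_min=1
untar_min=20
compile_min=28
test_min=5    # "under 5m", so this is an upper bound

step_min=$((tar_min + copy_min + untar_min + compile_min + test_min))

# The bisect log records 11 good/bad decisions after the initial
# bad/good endpoints (assumed to have cost one full cycle each).
steps=11
total_min=$((steps * step_min))

echo "per step: ${step_min} min"
echo "whole bisect: $((total_min / 60)) h $((total_min % 60)) min"
```

With these numbers a single step comes to 54 minutes, which matches the "roughly 50 minutes" above.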

Indigo Indy Indigo2 R10000/IMPACT O2 Octane Octane2 Origin 200=Origin 200-Origin 200=Origin 200
(This post was last modified: 02-03-2023, 12:29 PM by johnnym.)
johnnym
Tezro

Trade Count: (0)
Posts: 268
Threads: 9
Joined: Jun 2018
Find Reply
02-01-2023, 07:35 PM
#32
RE: OpenBSD/sgi
Couldn't let it rest today:

So I followed my suspicion from yesterday and indeed: with the changes from commit 1109691f1d2 applied on top of 4fc402b6183, the hangs after the NFS mounts are gone and the Indigo² happily boots to the login prompt and works correctly, so 4fc402b6183 is not the real problem. Checking out the whole repo at 1109691f1d2 and compiling it gives the same result.

So we have a new good commit, and the last bad commit that didn't hang after the NFS mounts serves as the new bad commit for another round of bisecting. This time there are only 104 commits to search through, "roughly 7 steps" according to git bisect.
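
The "roughly 7 steps" figure is about what repeated halving of 104 candidates gives. A minimal sketch, assuming plain halving (git computes its own estimate differently, and bisect_steps is a hypothetical helper, not a git command):

```shell
#!/bin/sh
# Hypothetical helper: count how many halvings are needed until only
# one candidate commit is left (worst case, rounding up each time).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 104
```

For 104 commits this halving goes 52, 26, 13, 7, 4, 2, 1, i.e. 7 steps.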

johnnym
02-02-2023, 09:58 PM
#33
RE: OpenBSD/sgi
Feeling lucky today, so I'll first try to manually find one or two commits (if need be), one good or one bad, close to the respective other end of the search range (the same in OpenBSD's original source), to cut down the number of bisect steps needed. Fingers crossed.

johnnym
02-04-2023, 03:26 PM
#34
RE: OpenBSD/sgi
Unfortunately the two commits I selected and tried (3b3dd72256d and 805206b941e) didn't narrow the search range much, but they can at least serve as the new good and bad commits.

But when quickly scanning through the remaining commits I recognized a specific one (ae6cd46) that "benefits most mips64 platforms":

Code:
commit ae6cd4623ffb7b807b08788d8e53f7a9259c0c82
Author: miod <miod@openbsd.org>
Date:   Sun Aug 7 19:40:48 2022 +0000

    Use PMAP_PREFER_ALIGN() == 0 rather than !defined(PMAP_PREFER) to enable the
    fast path in the pager code; this benefits most mips64 platforms.
   
    ok kettenis@ mpi@
   
    (cherry picked from commit d600f90f1a804e442018f93ce8ec61f99cd5fb69)

I checked its parent, which was still good, and then the commit itself, which produced a bad kernel showing the errors I described here earlier and also on GitHub for IP28. The corresponding issue on GitHub was updated accordingly and has the details.



Along the way I could also cut the compile time considerably by using git diff to create a patch, applying it to move between revisions, and letting make work out what to recompile. I'm not sure whether this is also faster for commits that are further apart than the ones I was working with.
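
The patch-based approach could look roughly like the sketch below. The function names and directory layout are hypothetical, and git apply is used here only because it also works outside a repository; patch -p1 would do the same job. The point is that only files named in the patch are rewritten, so a following make rebuilds just what depends on them.

```shell
#!/bin/sh
# Hypothetical helper: write the diff between two revisions to a file.
# Usage: make_step_patch <repo-dir> <from-rev> <to-rev> <patch-file>
make_step_patch() {
    git -C "$1" diff "$2" "$3" > "$4"
}

# Hypothetical helper: apply that patch to a plain build tree that is
# currently at <from-rev>. Only the files named in the patch are
# rewritten, so their timestamps change and make picks them up.
# Usage: apply_step_patch <build-dir> <patch-file (absolute path)>
apply_step_patch() {
    (cd "$1" && git apply "$2")
}
```

A step would then be something like `make_step_patch ~/src OLD NEW /tmp/step.patch` on the machine holding the repository, followed by `apply_step_patch /usr/src /tmp/step.patch` and a make in the kernel compile directory on the build machine (paths and revision names are placeholders).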

johnnym
02-04-2023, 07:05 PM
#35
RE: OpenBSD/sgi
Looks like I never posted that I found a fix/workaround for the problem with the IP28 kernel in OpenBSD/sgi 7.2:

Well, my "solution" was to extend an existing clause (i.e. one still present in OpenBSD/sgi) meant for R5000 and RM7000 processors in sys/arch/mips64/include/pmap.h so that it also triggers for IP28 (i.e. CPU_R10000 and TGT_INDIGO2):

Code:
diff --git a/sys/arch/mips64/include/pmap.h b/sys/arch/mips64/include/pmap.h
index 7cbac309a96..391e542797c 100644
--- a/sys/arch/mips64/include/pmap.h
+++ b/sys/arch/mips64/include/pmap.h
@@ -177,8 +177,11 @@ void    pmap_page_cache(vm_page_t, u_int);
  * and many structures containing fields which will be used with
  * <machine/atomic.h> routines are allocated from pools, __HAVE_PMAP_DIRECT can
  * not be defined on systems which may use flawed processors.
+ *
+ * There could be a similar problem for the IP28 aka POWER Indigo2 R10000, so
+ * we exclude the definition of __HAVE_PMAP_DIRECT for these systems, too.
  */
-#if !defined(CPU_R5000) && !defined(CPU_RM7000)
+#if !( defined(CPU_R5000) || defined(CPU_RM7000) || ( defined(TGT_INDIGO2) && defined(CPU_R10000) ) )
 #define    __HAVE_PMAP_DIRECT
 vaddr_t    pmap_map_direct(vm_page_t);
 vm_page_t pmap_unmap_direct(vaddr_t);

This prevents __HAVE_PMAP_DIRECT from being defined on IP28. With it defined, commit ae6cd46 in the sgi-never-retired branch (d600f90 in the official OpenBSD source code) leads to the described problems on IP28.
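
The changed #if condition can be read as a small truth function. The sketch below mirrors that logic in shell (1 = macro defined, 0 = not defined); have_pmap_direct is a hypothetical name, and the flag combinations in the usage lines are illustrative, not taken from real kernel configurations.

```shell
#!/bin/sh
# Hypothetical helper mirroring the patched preprocessor condition:
#   !( CPU_R5000 || CPU_RM7000 || ( TGT_INDIGO2 && CPU_R10000 ) )
# Arguments: <CPU_R5000> <CPU_RM7000> <TGT_INDIGO2> <CPU_R10000>
# Prints "yes" if __HAVE_PMAP_DIRECT would be defined, "no" otherwise.
have_pmap_direct() {
    if [ "$1" -eq 1 ] || [ "$2" -eq 1 ] ||
       { [ "$3" -eq 1 ] && [ "$4" -eq 1 ]; }; then
        echo no
    else
        echo yes
    fi
}

have_pmap_direct 0 0 1 1   # TGT_INDIGO2 and CPU_R10000 (the IP28 case): no
have_pmap_direct 0 0 0 1   # CPU_R10000 without TGT_INDIGO2: yes
have_pmap_direct 1 0 0 0   # CPU_R5000, excluded as before: no
```

Without the added TGT_INDIGO2/CPU_R10000 term the condition is equivalent to the original `!defined(CPU_R5000) && !defined(CPU_RM7000)`, so all other kernels keep their previous behaviour.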

The corresponding issue on GitHub was updated accordingly, as was the IP28 kernel of OpenBSD/sgi 7.2 on GitHub.

johnnym
03-01-2023, 03:46 PM
#36
RE: OpenBSD/sgi
Finally, but still earlier than expected, OpenBSD switched to 7.3:

Code:
commit 9a3badca5016bb6b6ce5e35f28496815da15afb9 (HEAD -> sgi-is-alive-at-7.3)
Author: deraadt <deraadt@openbsd.org>
Date:   Fri Mar 17 22:52:22 2023 +0000

    remove -beta tag

See https://github.com/openbsd/src/blob/1750...newvers.sh

Got some work to do...

johnnym
03-19-2023, 09:12 AM
#37
RE: OpenBSD/sgi
Now look at that, my Octane already got a new MP kernel running:

Code:
>> boot
Setting $netaddr to 172.16.2.51 (from server )
Obtaining /sash from server
7278928+720752 entry: 0xa800000020020000
ARCS64 Firmware
Found SGI-IP30, setting up.
Initial setup done, switching console.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.3 (GENERIC-IP30.MP) #0: Sun Mar 19 21:29:24 CET 2023
    root@octane.machine-hall.org:/usr/src/sys/arch/sgi/compile/GENERIC-IP30.MP
real mem = 2147483648 (2048MB)
rsvd mem = 1064960 (2MB)
avail mem = 2119352320 (2021MB)
warning: no entropy supplied by boot loader
random: boothowto does not indicate good seed
mainbus0 at root: Octane
cpu0 at mainbus0: MIPS R12000 CPU rev 2.3 300 MHz, R10000 FPU rev 0.0
cpu0: cache L1-I 32KB D 32KB 2 way, L2 2048KB 2 way
cpu1 at mainbus0: MIPS R12000 CPU rev 2.3 300 MHz, R10000 FPU rev 0.0
cpu1: cache L1-I 32KB D 32KB 2 way, L2 2048KB 2 way
clock0 at mainbus0: int 5
xbow0 at mainbus0: XBow revision 4
xheart0 at xbow0 widget 8: Heart revision 4
onewire0 at xheart0
owserial0 at onewire0 "16kb EPROM" sn xxxxxxxxxxxx
owserial0: "PM20300MHZ" p/n 030-1356-001, serial xxxxxx
xbridge0 at xbow0 widget 15: Bridge revision 3
xbpci0 at xbridge0 bus 0: 33 MHz PCI bus
pci0 at xbpci0 bus 0
qlw0 at pci0 dev 0 function 0 "QLogic ISP1020" rev 0x05: irq 0, xbow irq 14
qlw0: nvram corrupt
qlw0: firmware rev 4.66.0, attrs 0x0
scsibus0 at qlw0: 16 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <HP 73.4G, ST373455LC, HPC8> xxxxxxxxxxxxxxxxxxxx
sd0: 70007MB, 512 bytes/sector, 143374738 sectors
sd1 at scsibus0 targ 2 lun 0: <COMPAQ, BD07288277, HPB0> xxxxxxxxxxxxxxxxxxxx
sd1: 69464MB, 512 bytes/sector, 142264000 sectors
qlw1 at pci0 dev 1 function 0 "QLogic ISP1020" rev 0x05: irq 1, xbow irq 13
qlw1: nvram corrupt
qlw1: firmware rev 4.66.0, attrs 0x0
scsibus1 at qlw1: 16 targets, initiator 0
ioc0 at pci0 dev 2 function 0 "SGI IOC3" rev 0x01
onewire1 at ioc0
owmac0 at onewire1 "1kb EPROM" sn xxxxxxxxxxxx
owmac0: Ethernet Address xx:xx:xx:xx:xx:xx
owserial1 at onewire1 "16kb EPROM" sn xxxxxxxxxxxx
owserial1: "FP1" p/n 030-0891-003, serial xxxxxx
owserial2 at onewire1 "16kb EPROM" sn xxxxxxxxxxxx
owserial2: "PWR.SPPLY.ER" p/n 060-0035-002, serial xxxxxxxxxx
ioc0: ethernet irq 2, xbow irq 12
ioc0: superio irq 4, xbow irq 11
com0 at ioc0 base 0x20178: ns16550a, 16 byte fifo
com0: console
com1 at ioc0 base 0x20170: ns16550a, 16 byte fifo
iockbc0 at ioc0
iec0 at ioc0: 128KB SSRAM, address xx:xx:xx:xx:xx:xx
icsphy0 at iec0 phy 1: ICS1890 10/100 PHY, rev. 3
lpt at ioc0 not configured
dsrtc0 at ioc0: DS1687
"SGI Rad1" rev 0xc0 at pci0 dev 3 function 0 not configured
power0 at mainbus0
/dev/ksyms: Symbol table not valid.
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
boot device: iec0
nfs_boot: using interface iec0, with revarp & bootparams
nfs_boot: client_addr=172.16.2.51
nfs_boot: server_addr=172.16.0.1 hostname=octane
root on 172.16.0.2:/srv/nfs/octane/root
WARNING: clock gained 99 days
WARNING: CHECK AND RESET THE DATE!
swap on 172.16.0.2:/srv/nfs/octane/swap
Automatic boot in progress: starting file system checks.
pfctl: DIOCADDRULE: Operation not supported by device
pf enabled
starting network
pfctl: DIOCADDRULE: Operation not supported by device
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
swapctl: adding 172.16.0.2:/srv/nfs/openbsd/7.2/octeon/hosts/octane2/swap as swap device at priority 0
kvm_mkdb: can't open /dev/ksyms
savecore: /bsd: kvm_read: version misread
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd smtpd sndiod.
starting local daemons: cron.
Sun Mar 19 21:44:02 CET 2023

OpenBSD/sgi (octane.machine-hall.org) (console)

login: root
Last login: Mon Feb  6 13:08:06 on console
OpenBSD 7.3 (GENERIC-IP30.MP) #0: Sun Mar 19 21:29:24 CET 2023

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code.  With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

You have mail.
octane# machine
octeon 
octane# sysctl hw
hw.machine=sgi
hw.model=IP30
hw.ncpu=2
hw.byteorder=4321
hw.pagesize=16384
hw.disknames=sd0:cff4231e147e67d8,sd1:87a703b75b1e1601
hw.diskcount=2
hw.cpuspeed=299
hw.vendor=SGI
hw.product=Octane
hw.physmem=2147483648
hw.usermem=2147450880
hw.ncpufound=2
hw.allowpowerdown=1
hw.ncpuonline=2
hw.power=1
octane#

Unfortunately it doesn't work with my OpenBSD/sgi 7.0 FS, so I booted an octeon 7.2 FS instead. But this also means I can't try out this kernel by compiling the 7.3 kernels for the other machines, something I usually do to test new kernels.

Well, we can't have everything at once.

Find the new OpenBSD/sgi 7.3 branch on GitHub.

johnnym
03-19-2023, 09:02 PM
#38
RE: OpenBSD/sgi
OpenBSD/sgi 7.3

I have by now created all kernels for OpenBSD/sgi 7.3 (incl. for R8000 Indigo² (IP26)) and tested all kernels I have machines for (please see the corresponding release page on GitHub for details).

An intro branch that gives an overview has also been available for a while.

Every machine was tested by successfully booting with an OpenBSD/octeon 7.3 FS snapshot - OpenBSD 7.3 hasn't been released yet! - and running a few benchmarks (7za, openssl) using MP operation where possible. The boot logs are linked from the above mentioned release page, as are the new kernels.

Unfortunately this release also comes with issues, this time for two machines:
  • Indy (IP22)
  • R10000 Indigo² (IP28)

I created an issue over at GitHub to follow the process of finding the reason for and hopefully solving this issue.

So it looks like it's time to bring the sgi-never-retired branch forward and do some bisecting starting with the Indy (IP22) kernel.

(This post was last modified: 03-31-2023, 05:49 PM by johnnym.)
johnnym
03-31-2023, 05:47 PM
#39
RE: OpenBSD/sgi
There might be some confusion about what systems OpenBSD/sgi runs on. So let me clarify that by citing "official" information from the intro(4/sgi) manpage of OpenBSD/sgi 6.9:

Quote:[...]
HARDWARE

The following systems are supported:

Hardware Family Kernel Model
IP20     IP20   IP22   Indigo (R4k)
IP22     IP22   IP22   Indigo2, Challenge M (R4k)
IP24     IP22   IP22   Indy*, Challenge S
IP26     IP22   IP26   POWER Indigo2 (R8000)
IP27     IP27   IP27   Origin 2x00, Onyx 2
IP28     IP22   IP28   POWER Indigo2 (R10000)*
IP29     IP27   IP27   Origin 200
IP30     IP30   IP30   Octane*, Octane 2* (Speedracer)
IP31     IP27   IP27   Origin 200*/2x00, Onyx 2 (250+ MHz)
IP32     IP32   IP32   O2*, O2+ (Moosehead)
IP34     IP35   IP27   Fuel (Asterix)
IP35     IP35   IP27   Origin 3x00, Onyx 3x000, Onyx 3
IP39     IP35   IP27   Onyx 4
IP45     IP35   IP27   Origin 300, Onyx 300
IP53     IP35   IP27   Origin 350, Onyx 350, Tezro
IP59     IP35   IP27   Origin 350, Onyx 350, Tezro (1GHz)
[...]

I can confirm that the systems marked with a * work in principle - these are the systems I have at hand and have tested so far - for OpenBSD/sgi up to 7.3, with the exception of the Indy and the R10000 Indigo², for which I'm trying to track down the cause of the problems. For the Indy I could already clarify that the issue detected for OpenBSD/sgi 7.3 is not new but present in all versions since 6.9 (and maybe even earlier), see https://github.com/the-machine-hall/open...1494425202 for details.

I provide kernels for all of the above listed systems on GitHub and those can be easily tested by netbooting them from the PROM on the respective system.

johnnym
04-06-2023, 07:44 PM
#40
RE: OpenBSD/sgi
Thanks for the clarification. The last time I looked into OpenBSD for SGI was on a whim in 2015, and back then the support for anything that was Origin 300 or Chimera based (the Fuel and Origin 300 are more closely based on a different architecture than the later Chimera systems) was not particularly good.

I'm the system admin of this site. Private security technician, licensed locksmith, hack of a c developer and vintage computer enthusiast. 

https://contrib.irixnet.org/raion/ -- contributions and pieces that I'm working on currently. 

https://codeberg.org/SolusRaion -- Code repos I control

Technical problems should be sent my way.
Raion
Chief IRIX Officer

Trade Count: (9)
Posts: 4,241
Threads: 533
Joined: Nov 2017
Location: Eastern Virginia
Website Find Reply
04-06-2023, 11:44 PM

