IRIX Network Forums
Crazy benchmark series

Crazy benchmark series - Shiunbird - 04-06-2021

Hello everyone,

I've always been frustrated that it's hard to find cross-platform benchmark tools that have stood the test of time.
So, since I've been learning to program and learning OpenGL, I decided to make that my challenge.

My first test is pure C: it divides the largest 32-bit prime integer by every odd number.
It is deliberately slow and inefficient. Here are my results, all single-threaded:

MacBook Pro (employer-issued):
4.35 seconds
Intel Core i7-7567 3.5 GHz

Mac Pro:
8.95 seconds
Xeon X5657 3.06 GHz

Odroid-XU4:
16.56 seconds
Samsung Exynos5422 ARM® Cortex™-A15 Quad 2.0GHz/Cortex™-A7 Quad 1.4GHz

Raspberry Pi 3 Model B Rev 1.2:
22.54 seconds
BCM2835

Mac G4:
31.25 seconds
G4 867

Raspberry Pi Zero:
87.21 seconds
1GHz ARMv6 (BCM2835)

HP 9000 785 C8000:
125.04 seconds (GCC)
108 seconds (HP's aCC)
PA8900 1 GHz

WinCE StrongARM SA1110 ARMv4 206 MHz:
240 seconds

WinCE SH4 SH7750 128 MHz:
870 seconds


I'm very surprised by how poorly the PA8900 performs. I'll try HP's compiler once I get HP-UX up.

What other tests would you suggest I write?
I'm interested in assessing the advantages of large L2 caches and the performance penalty of going out to RAM, as well as other tests that could measure the strengths of our UNIX workstations compared to what else was on the market.

I'm also halfway through writing my OpenGL benchmark. It generates different kinds of 2D and 3D scenes and pushes textures, spheres and 2D polygons around. I'm aiming for OpenGL 1.3 and 3.1, in C, linked with -lGL, -lX11, -lm and -lGLU, so you can build it and try it.
It already runs quite well; I just have a broken FPS counter that I'm trying to fix.


RE: Crazy benchmark series - Raion - 04-06-2021

The PA-8900 may also have better luck on multithreaded loads. But yeah, my suggestion is to try HP's compiler. I may wanna run this on an Octane just 'cause.

In the past I've actually run SPEC benchmarks (which I can't legally publish) on Octanes, RPis and a Tezro.

My original RPi Model B was about 3 times slower than a 300MHz Octane on integer performance. On FP the RPi had SIMD, so it did beat the Octane, but that's not necessarily a major advantage, as SGI hardware was designed not to run that kind of work on the CPU and instead to offload it to helper chips like the O2's coprocessor. The Tezro was about 4 times faster than the Octane on integer and about 2.3 times faster on FP. The RPi could still beat the Tezro on FP, though.

When I repeated the test with an RPi 2 and 3, each was incrementally better, but I don't think any ARM processor, other than maybe those in the latest iPhones, is actually as fast as people claim. In my experience ARM's performance curve is shallow, and the gains are modest at best. POWER is where the powah is at.


RE: Crazy benchmark series - Trippynet - 04-06-2021

I'd be intrigued to see how such a test fares on a couple of SGI machines, and maybe on a phone or two - just to compare :)


RE: Crazy benchmark series - Shiunbird - 04-06-2021

I'd love to get my hands on a Talos as my main do-everything workstation, but it's faaaar beyond my budget now.

My goal is to write something simple that can be compiled with few-to-no modifications on the hardware we own around here and, as I learn more and get everyone's input, to add optimisations for each architecture so that each one gives its best shot.

If there's enough interest, it can be our in-house tool and we can debunk any myths that may still be left.

Question:
I don't yet own an SGI. Is this a good reference for me?
https://irix7.com/techpubs/007-2392-003.pdf


RE: Crazy benchmark series - Raion - 04-06-2021

(04-06-2021, 06:08 PM)Shiunbird Wrote:  Question:
I don't yet own a SGI. Is this a good reference for me?
https://irix7.com/techpubs/007-2392-003.pdf

I'll certainly test it on IRIX machines :)

Yes, that reference is good. MIPSpro is a real stickler for standard C, so make sure you don't use __attribute__ or any other nonstandard C. String literals can be an issue too.


RE: Crazy benchmark series - Shiunbird - 04-29-2021

So I built my benchmark on HP-UX using the included cc, and it was only about 4 seconds faster:
120.9 seconds vs 125.04.

The bundled cc is incredibly crippled and doesn't support ANSI C. I had to rewrite a ton of stuff to get even the simplest things to run.

I also got the OpenGL libraries up fine from TCOE, thanks for all the help!
Any tips on where I could find aCC? Is it included on the HP-UX media?
It's not part of the default install and isn't in TCOE either (which surprised me a bit).

Also... I used GLUT in most of my code, but it doesn't come out of the box on HP-UX, even though it is available.
What's its status on IRIX? Should I consider converting the code now, while it's still early and easier to change?
Or is GLUT a common sight on all the platforms we have around here?


RE: Crazy benchmark series - Raion - 04-29-2021

GLUT is peanuts to compile


RE: Crazy benchmark series - Shiunbird - 04-29-2021

Super - I'm glad to hear.

I found out that HP-UX comes with GLUT as well.
I still need to sort out my libpath configuration, but I seem to have all the libraries that I need.

The bundled cc being non-ANSI is a bit painful, but I hope I'll manage to find aCC somewhere.

I'm playing with it. There are tons of demos, and they owe nothing to the SGI ones. There's even a "distributed driving simulator" with a client-server configuration, but it has to be built and I'm struggling with it a bit.


RE: Crazy benchmark series - Raion - 05-08-2021

OK, Shiunbird sent me his code. It requires GLUT; I used freeglut 2.8.1. Still working out bugs with him.


RE: Crazy benchmark series - Shiunbird - 05-21-2021

I found a very good solution for memory bandwidth tests, and it works pretty well (plus, it's from 1995).
It builds out of the box on my C8000 and Mac Pro; I just had to add #include <float.h>.

https://www.cs.virginia.edu/stream/FTP/Code/Versions/Old/1996-08-18/stream_d.c

Here are my results:
2009 Mac Pro (triple-channel configuration):

Timing calibration ; time = 17825.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function    Rate (MB/s)   RMS time (usec)   Min time (usec)   Max time (usec)
Assignment:    5328.005          3073.312          3003.000          3241.000
Scaling   :    5740.940          2880.874          2787.000          3097.000
Summing   :    6984.866          3579.463          3436.000          3728.000
SAXPYing  :    6804.650          3650.492          3527.000          3991.000

My C8000:

Timing calibration ; time = 41090.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function    Rate (MB/s)   RMS time (usec)   Min time (usec)   Max time (usec)
Assignment:    2905.921          5634.583          5506.000          6514.000
Scaling   :    2911.208          5508.610          5496.000          5533.000
Summing   :    3172.924          7635.269          7564.000          8166.000
SAXPYing  :    3136.435          7667.305          7652.000          7679.000

Internet research suggests that the Mac Pro should do about 6211 MB/s single-threaded in an unspecified memory test, so I think we're good to go.