SIMH benchmark numbers…

I have to admit, I’m really surprised. In the past Visual C++ had been a clear winner every time I’d checked performance vs Gcc. And the tide seems to have really turned under Windows 7 x64. While not a massive lead, the winner after all these iterations of my simh benchmark was Gcc 4.5.2 for x86_64.

Just in the same fashion here, it seems that on some platforms -O1 is faster then -O2, and you really won’t find out until you run some comparisons.

gcc version 3.4.5 (mingw-vista special r3)

gcc O0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 23
This machine benchmarks at 21739 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 20
This machine benchmarks at 25000 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second

gcc O1
15.33333333333333
Dhrystone(1.1) time for 500000 passes = 16
This machine benchmarks at 31250 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 15
This machine benchmarks at 33333 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 15
This machine benchmarks at 33333 dhrystones/second

gcc 02
12.33333333333333
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

gcc version 4.5.2 20101002 (prerelease) [svn/rev.164902 – mingw-w64/oz] (GCC)

gcc O0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 22
This machine benchmarks at 22727 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second

gcc O1
13.33333333333333
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

gcc O2
12
Dhrystone(1.1) time for 500000 passes = 11
This machine benchmarks at 45454 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64

VC /Od /Bi0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 22
This machine benchmarks at 22727 dhrystones/second

VC /O2 /Ob2 /Oi
13
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

VC /O2
13.66666666666667
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second

Vc /Og /Ox
13
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80×86
VC /Og /Ox
13.33333333333333
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

And this is on my laptop, an Intel Core i7 Q 720 running at 1.6Ghz with 4GB of ram & Windows 7 Home Premium.

Windows 95 & DLL’s…

I hope nobody ever needs to know this, but I found out the hard way, that dll’s built with Visual C++ 6.0 wont’ load correctly on Windows 95… But luckily Visual C++ 4.2 is still available on MSDN, and the speex files built with only 40 or so changes.

I think I’m finally done with the whole winamp thing.

Phew!

MinGW-w64

Well after yesterdays x86_64 excitement, I figured I’d take a look in the windows area to see about the 32bit vs 64bit performance on Windows…

I know this isn’t the ‘best’ test as I’m using VMWare Fusion on OS X and running Windows XP x64 sp2 (the 2007 edition, not the 2003 one).

If anyone wants to know how to get a 64bit gcc on windows, this is the best formula I’ve come up with so far.

First download a build of mingw-w64-bin_x86_64 from sourceforge… Right now I’m using a ‘Personal Build’ from sezero_20101003, because… it’s recent, and I would imagine a personal build has some hope of actually working.

I downloaded that, and extracted to the root of the C drive. Then I renamed the mingw64 to mingw.

Next up, I downloaded and installed MSYS 1.0.11, and installed that. I selected the default options, and of course specified my mingw is installed in

C:/mingw

After that, I install the MSYS DTK 1.0, again with default options.

The final part is some kind of editor, I like VIM but it’s… involved to download as the default package ‘vim-7.2-1-msys-1.0.11-bin.tar.lzma’ is in a 7zip compatible archive that needs a bit of tweaking to get a tar file out of. I can provide it here in gzip format, that you can simply extract within the msys command prompt in the /usr directory.

Now with all that done, you should be in business!

$ gcc -v
Using built-in specs.
Target: x86_64-w64-mingw32
Configured with: ../gcc44-svn/configure –host=x86_64-w64-mingw32 –target=x86_6
4-w64-mingw32 –disable-multilib –enable-checking=release –prefix=/mingw64 –w
ith-sysroot=/mingw64 –enable-languages=c,c++,fortran,objc,obj-c++ –enable-libg
omp –with-gmp=/mingw64 –with-mpfr=/mingw64 –disable-nls –disable-win32-regis
try
Thread model: win32
gcc version 4.4.5 20101001 (release) [svn/rev.164871 – mingw-w64/oz] (GCC)

 

But will it run Dungeon?

What is also cool, is that this build of mingw includes gfortran, which is a Fortran 95 compiler with various 2003 & 2008 enhancements. So for the heck of it, I’ve rebuilt the makefile from dungeon-2.5.6 and tweaked the machdep.f to at least call the ITIME function to get the current time. The resulting archive runs pretty well!

Windows XP x64 - dungeon

Yes, it runs! And without a *32 meaning this is a 64bit binary!

 

Onward with SIMH

So going back to SIMH as my benchmark, here is the vax780 with -O0/-O0

Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 20
This machine benchmarks at 25000 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second

Which comparing it to the native x86_64 build is pretty good considering I’m running this in a VM (VMware Fusion!). Now the same test with -O1/-O1

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Which is pretty good! Now for the finally with -O2/-O1

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Which is interesting in that there is no appreciable difference in the -O2/-O1 vs the -O1/-O1 build. Although I kind of expect different results on a native machine. If anyone else cares to test, I’m going to make available the whole project here. This includes the source and the pre-built binaries.

Unzip it on a win64/win32 machine and it should be somewhat straightforward to build / run. You can alter the makefile and change the primary CC flags from O0 to O1 or O2 if you so wish… just run make and it’ll generate a vax780.exe . Then in the test directory you can bench your exe like this:

$ make
gcc -O0 -c -o VAX/vax_cpu.o VAX/vax_cpu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cpu1.o VAX/vax_cpu1.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_fpa.o VAX/vax_fpa.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
VAX/vax_fpa.c: In function ‘op_cmpfd’:
VAX/vax_fpa.c:210: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:210: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:211: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:211: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘op_cmpg’:
VAX/vax_fpa.c:233: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:233: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:234: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:234: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘vax_fadd’:
VAX/vax_fpa.c:371: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:373: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:386: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘unpackd’:
VAX/vax_fpa.c:525: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:525: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘unpackg’:
VAX/vax_fpa.c:540: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:540: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘norm’:
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:549: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:549: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:557: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘rpackfd’:
VAX/vax_fpa.c:574: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:575: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘rpackg’:
VAX/vax_fpa.c:597: warning: integer constant is too large for ‘long’ type
gcc -O0 -c -o VAX/vax_cis.o VAX/vax_cis.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_octa.o VAX/vax_octa.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cmode.o VAX/vax_cmode.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_mmu.o VAX/vax_mmu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_sys.o VAX/vax_sys.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_syscm.o VAX/vax_syscm.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_stddev.o VAX/vax780_stddev.c -I. -DVM_VAX -DVAX_780 -DU
SE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_sbi.o VAX/vax780_sbi.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_mem.o VAX/vax780_mem.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_uba.o VAX/vax780_uba.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_mba.o VAX/vax780_mba.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_fload.o VAX/vax780_fload.c -I. -DVM_VAX -DVAX_780 -DUSE
_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_syslist.o VAX/vax780_syslist.c -I. -DVM_VAX -DVAX_780 –
DUSE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rl.o PDP11/pdp11_rl.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rq.o PDP11/pdp11_rq.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_ts.o PDP11/pdp11_ts.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_dz.o PDP11/pdp11_dz.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_lp.o PDP11/pdp11_lp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_tq.o PDP11/pdp11_tq.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_xu.o PDP11/pdp11_xu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_ry.o PDP11/pdp11_ry.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_cr.o PDP11/pdp11_cr.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rp.o PDP11/pdp11_rp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_tu.o PDP11/pdp11_tu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_hk.o PDP11/pdp11_hk.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_io_lib.o PDP11/pdp11_io_lib.c -I. -DVM_VAX -DVAX_780 –
DUSE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o scp.o scp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADDR64 -I VAX
-I PDP11
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:476: warning: integer constant is too large for ‘long’ type
scp.c:476: warning: integer constant is too large for ‘long’ type
scp.c:477: warning: integer constant is too large for ‘long’ type
scp.c:477: warning: integer constant is too large for ‘long’ type
scp.c:478: warning: integer constant is too large for ‘long’ type
scp.c:478: warning: integer constant is too large for ‘long’ type
scp.c:479: warning: integer constant is too large for ‘long’ type
scp.c:479: warning: integer constant is too large for ‘long’ type
gcc -O0 -c -o sim_console.o sim_console.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_fio.o sim_fio.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADDR6
4 -I VAX -I PDP11
gcc -O0 -c -o sim_timer.o sim_timer.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_A
DDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_sock.o sim_sock.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o sim_tmxr.o sim_tmxr.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o sim_ether.o sim_ether.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_A
DDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_tape.o sim_tape.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cpu2.o VAX/vax_cpu2.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O1 -c -o VAX/vax_cpu2.o VAX/vax_cpu2.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -o vax780 VAX/vax_cpu.o VAX/vax_cpu1.o VAX/vax_fpa.o VAX/vax_cis.o VAX/vax_
octa.o VAX/vax_cmode.o VAX/vax_mmu.o VAX/vax_sys.o VAX/vax_syscm.o VAX/vax780_st
ddev.o VAX/vax780_sbi.o VAX/vax780_mem.o VAX/vax780_uba.o VAX/vax780_mba.o VAX/v
ax780_fload.o VAX/vax780_syslist.o PDP11/pdp11_rl.o PDP11/pdp11_rq.o PDP11/pdp11
_ts.o PDP11/pdp11_dz.o PDP11/pdp11_lp.o PDP11/pdp11_tq.o PDP11/pdp11_xu.o PDP11/
pdp11_ry.o PDP11/pdp11_cr.o PDP11/pdp11_rp.o PDP11/pdp11_tu.o PDP11/pdp11_hk.o P
DP11/pdp11_io_lib.o scp.o sim_console.o sim_fio.o sim_timer.o sim_sock.o sim_tmx
r.o sim_ether.o sim_tape.o VAX/vax_cpu2.o -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11 -lwinmm -lwsock32

Administrator@JASON-4AC1B1EA0 /usr/src/simh
$ cd test/

Administrator@JASON-4AC1B1EA0 /usr/src/simh/test
$ ../vax780.exe bsd42.ini

VAX780 simulator V3.8-1
loading ra(0,0)boot
Boot
: ra(0,0)vmunix
199488+56036+51360 start 0x11a0
4.2 BSD UNIX #9: Wed Nov 2 16:00:29 PST 1983
real mem = 8384512
avail mem = 7073792
using 102 buffers containing 835584 bytes of memory
mcr0 at tr1
mcr1 at tr2
uba0 at tr3
hk0 at uba0 csr 177440 vec 210, ipl 15
rk0 at hk0 slave 0
rk1 at hk0 slave 1
uda0 at uba0 csr 172150 vec 774, ipl 15
ra0 at uda0 slave 0
ra1 at uda0 slave 1
zs0 at uba0 csr 172520 vec 224, ipl 15
ts0 at zs0 slave 0
dz0 at uba0 csr 160100 vec 300, ipl 15
dz1 at uba0 csr 160110 vec 310, ipl 15
dz2 at uba0 csr 160120 vec 320, ipl 15
dz3 at uba0 csr 160130 vec 330, ipl 15
root on ra0
WARNING: should run interleaved swap with >= 2Mb
Automatic reboot in progress…
Tue Nov 8 03:44:30 PST 1983
Can’t open checklist file: /etc/fstab
Automatic reboot failed… help!
erase ^?, kill ^U, intr ^C
# ./d2;./d2;./d2
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
# sync
# sync
# sync
#
Simulation stopped, PC: 80001629 (BNEQ 80001630)
sim> q
Goodbye

Administrator@JASON-4AC1B1EA0 /usr/src/simh/test
$

The d0,d1,d2 are the dhyrstone benchmark compiled with -O0, -O1, and -O2 respectively. This gives you a chance to observe various optimizations in the GCC 2.7.2.2 for the VAX.

moving hosting companies…..

I’m moving some of my hosting stuff, so there will probably be some bumps in the move… But in the meantime, I think the ‘big’ thing I have, ‘vpsland/install‘ should now be running up again..

Even while I was writing this, I saw someone download f2c!!!

It’s a scary thing to think of my work being used.. Esp regarding Fortran.

Also I’ll have to post some stuff on the x86_64 gcc for Windows… Too bad it won’t run Qemu, but there is 64bit Gfortran!

gcc on the x86_64

Today I thought I’d isolate the portion of simh that crashes 4.X BSD, when SIMH is built with GCC. I managed to find that the “op_ldpctx” and “op_mtpr” procedures of vax_cpu1.c become unstable when built with -O2 flags.

So I extracted the two procedures so I could then built the remainder of SIMH with GCC’s O2 flags. Before going too far, I first went to verify that the results would be consistent with the i386/32bit binaries…

However the results were.. interesting. By default SIMH will build with -O0 flags for the VAX giving the following dhrystone benchmark in 4.3 UWisc BSD:

Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second

24 seconds, not so hot. Then the same thing with -O1 flags…

Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second

18 seconds! pretty good, thats a 33% increase! Now for my hybrid -O2/-O1

Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second

A 5% increase over the -O1 … Nothing too big, but still an INCREASE.

Now for the real fun, I re-ran these bulid factors with the x86_64 compiler.

The -O0/-O0 combination came in at 19 seconds!!

Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second

This is pretty amazing considering everything we do should make it faster… The next thing I tried was the -O1/-O1 combination and go the following:

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

A 12.5 second average! This blew the doors off of the -O2/-O1 on the i386! Naturally I was hoping for at least a 5% increase going to the -O2/-O1 flags on the x86_64, however I got this…

Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

And I ran it numerous times, but the bottom line is that the O2 flags actually produces a SLOWER SIMH on OS X 10.6.4 then -O1 flags… The whole exercise is a tad perplexing to me, to say the least, but unless you bench it, you’ll never know if the compiler is getting confused somewhere, and going to throw a loop. I should also point out that SIMH will not run with full -O2 flags as there is something in the code that breaks under optimization so this very well may not hold true 100% of the time, but at the same point things still have to be tested…

Smith Corona teletype conversion

While converting your typewriter into a teletype may not be all the rage, nor may it be all that … ‘new’ of a project, this one was sent in to me by Stefan (thanks!).

http://upnotnorth.net/2010/10/29/a-new-way-to-interact-with-fiction/

Not only is it in Toronto ( Site 3 coLaboratory ), but yes……

It runs ZORK!

I suppose at some point, somone will have to find the smallest machine with an ethernet & USB port, some rs232/usb cables and a few real terminals for a real ‘micro’ vax…. Or at least, that’s what I envision… 🙂

Notes on fat binaries & OS X

I suppose this also applied to NeXTSTEP/OPENSTEP but I didn’t cross build that much between the m68k and i386…

So let’s start with a 3 way compile…

$ gcc -arch i386 -arch x86_64 -arch ppc7400 dhrystone.c -o dhrystone
$ file dhrystone
dhrystone: Mach-O universal binary with 3 architectures
dhrystone (for architecture i386): Mach-O executable i386
dhrystone (for architecture x86_64): Mach-O 64-bit executable x86_64
dhrystone (for architecture ppc7400): Mach-O executable ppc
$

This builds the single file for all 3 machine types of OS X..

For anyone that cares, this is my quad core box..

$ ./dhrystone
Dhrystone(1.1) time for 500000000 passes = 83
This machine benchmarks at 6024096 dhrystones/second

Ok, now let’s pull out the PowerPC executable, and run that….

$ lipo -extract ppc7400 dhrystone -output dpc
$ file dpc
dpc: Mach-O universal binary with 1 architecture
dpc (for architecture ppc7400): Mach-O executable ppc
$
$ ./dpc
Dhrystone(1.1) time for 500000000 passes = 214
This machine benchmarks at 2336448 dhrystones/second
$

Which isn’t too bad, seeing the emulator runs at 1/3rd speed of the native exe.. It’s no wonder that IBM bought transitive, and shut them down. This kind of technology would make it far too easy for everyone to move away from expensive CPUs…!

Now let’s extract the 32bit i386 exe.

$ lipo -extract i386 dhrystone -output di3

Nothing to really see here.

And for the final part, let’s combine the extracted PPC & i386 executables.

$ lipo di3 dpc -create -output dip
$ file dip
dip: Mach-O universal binary with 2 architectures
dip (for architecture i386): Mach-O executable i386
dip (for architecture ppc7400): Mach-O executable ppc

And there we have it.. Using this I guess I can try to find versions of Qemu that will hopefully cross build on my machine that I can stitch together so that some platforms (PPC) have *SOMETHING* to run at least…..

Or maybe it’ll help someone at least make a stub ‘we are sorry, nothing to see here’ vs an exe error.