SIMH on the Itanium…

So I figured it’d be as good time as any to see how various compilers (mostly Microsoft) stack up on the Itanium. So I built SIMH, and loaded up 4.2 BSD & the old dhrystone benchmark.

And before we get to the numbers, I’m using a 900Mhz Itanium 2, clearly the bottom of the barrel. I ended up loading Windows 2003 server, as XP for the Itanium can’t even install Internet Explorer 7. If you thought a world of Internet Explorer 6 was fun, it’s hell when it is your only browser.

In addition, Visual C++ 2005 was never released on the Itanium, they made it as far as Beta 2, before discontinuing native support. However I like it’s debugger so I’m using the Beta 2 version.

Microsoft (R) C/C++ Optimizing Compiler Version 13.10.2240.8 for IA-64 (from the Feb 2003 platform SDK)

Dhrystone(1.1) time for 500000 passes = 77
This machine benchmarks at 6493 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 76
This machine benchmarks at 6578 dhrystones/second

Microsoft (R) C/C++ Optimizing Compiler Version 14.00.50215.44 for Itanium (Visual Studio 2005 beta)

Dhrystone(1.1) time for 500000 passes = 64
This machine benchmarks at 7812 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 65
This machine benchmarks at 7692 dhrystones/second

Intel(R) C++ IA-64 Compiler for applications running on IA-64, Version 10.1 Build 20080112 Package ID: w_cc_p_10.1.014

Dhrystone(1.1) time for 500000 passes = 53
This machine benchmarks at 9433 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 53
This machine benchmarks at 9433 dhrystones/second

And for some sense of what the emulation is like…

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 10.20.6166 for 80×86 (visual C++ 4.2)

Dhrystone(1.1) time for 500000 passes = 92
This machine benchmarks at 5434 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 91
This machine benchmarks at 5494 dhrystones/second

Not to bad, the Intel compiler is 2x the speed of an i386 executable, while it’s easily 18% + faster then the Microsoft Compiler. I built everything with /Ox flags (Which the Intel compiler honers!).. Executable sizes varied as much as the performance.

1,672,704 Intel C vax780.exe
986,624 Vax780feb2003.exe
495,616 Vax780visualc4.exe
1,232,896 vax780visualstudio2005b2.exe

And of course the larger the executable the faster it ran. No wonder EPIC was driving people insane!

NetBSD 1.2 & the MicroVAX II

So I’ve been working on some instructions to install the first NetBSD I could find that would run on SIMH’s MicroVAX II… Think of it as a lineage continuation from the 4.3 BSD to the Net/2. Anyways the 1.2 install proved to be.. .difficult to say the least, however once it’s installed I found a weird snag, that NetBSD 1.2 doesn’t seem to have curses…. And I though curses was one of those things that made any BSD a BSD.

I haven’t put together an install just yet, I figure I’ll have to download ncurses over, and build with that but I just don’t have the time today, so I’m leaving things where they are for now. Although I did manage to get a bunch more other stuff built, which I’ve posted package tapes on my sourceforge page here. And if anyone is that motivated they can always follow the above instructions, and install their own NetBSD 1.2 VAX, although I’m not holding my breath.. lol

Also I noticed that NetBSD 1.2 doesn’t have any fortran, so I built f2c, and managed to get dungeon running. woo!

Yep, it runs Dungeon!

Oh well it’s all good fun from 1996.

Research Unix v7 on the PDP-11

Boot
: hp(0,0)unix
mem = 2020544
# RESTRICTED RIGHTS: USE, DUPLICATION, OR DISCLOSURE
IS SUBJECT TO RESTRICTIONS STATED IN YOUR CONTRACT WITH
WESTERN ELECTRIC COMPANY, INC.
WED DEC 31 19:02:48 EST 1969

login: root
Password:
You have mail.
#

Wee that was fun. I installed it from tape, so… it was a a bit more challenging. But luckily with all the SIMH & TUHS resources I was able to extract out the setup guide and step through it for the most part.

I’ve documented the steps here, and I even uploaded a tape image to sourceforge.

I’m trying to work out installing 1 BSD, as it was just a collection of userland stuff for Unix v7, however I’m having some issues with floatingpoint, but I have some hopes I can sort my way through it.

I have some hopes of being able to get a 1 BSD & 2 BSD machine up in SIMH… These are the versions that were not full operating system, just supplements to Unix v7 from what I can tell.

SIMH benchmark numbers…

I have to admit, I’m really surprised. In the past Visual C++ had been a clear winner every time I’d checked performance vs Gcc. And the tide seems to have really turned under Windows 7 x64. While not a massive lead, the winner after all these iterations of my simh benchmark was Gcc 4.5.2 for x86_64.

Just in the same fashion here, it seems that on some platforms -O1 is faster then -O2, and you really won’t find out until you run some comparisons.

gcc version 3.4.5 (mingw-vista special r3)

gcc O0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 23
This machine benchmarks at 21739 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 20
This machine benchmarks at 25000 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second

gcc O1
15.33333333333333
Dhrystone(1.1) time for 500000 passes = 16
This machine benchmarks at 31250 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 15
This machine benchmarks at 33333 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 15
This machine benchmarks at 33333 dhrystones/second

gcc 02
12.33333333333333
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

gcc version 4.5.2 20101002 (prerelease) [svn/rev.164902 – mingw-w64/oz] (GCC)

gcc O0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 22
This machine benchmarks at 22727 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second

gcc O1
13.33333333333333
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

gcc O2
12
Dhrystone(1.1) time for 500000 passes = 11
This machine benchmarks at 45454 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64

VC /Od /Bi0
21.33333333333333
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 21
This machine benchmarks at 23809 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 22
This machine benchmarks at 22727 dhrystones/second

VC /O2 /Ob2 /Oi
13
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

VC /O2
13.66666666666667
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second

Vc /Og /Ox
13
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80×86
VC /Og /Ox
13.33333333333333
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

And this is on my laptop, an Intel Core i7 Q 720 running at 1.6Ghz with 4GB of ram & Windows 7 Home Premium.

MinGW-w64

Well after yesterdays x86_64 excitement, I figured I’d take a look in the windows area to see about the 32bit vs 64bit performance on Windows…

I know this isn’t the ‘best’ test as I’m using VMWare Fusion on OS X and running Windows XP x64 sp2 (the 2007 edition, not the 2003 one).

If anyone wants to know how to get a 64bit gcc on windows, this is the best formula I’ve come up with so far.

First download a build of mingw-w64-bin_x86_64 from sourceforge… Right now I’m using a ‘Personal Build’ from sezero_20101003, because… it’s recent, and I would imagine a personal build has some hope of actually working.

I downloaded that, and extracted to the root of the C drive. Then I renamed the mingw64 to mingw.

Next up, I downloaded and installed MSYS 1.0.11, and installed that. I selected the default options, and of course specified my mingw is installed in

C:/mingw

After that, I install the MSYS DTK 1.0, again with default options.

The final part is some kind of editor, I like VIM but it’s… involved to download as the default package ‘vim-7.2-1-msys-1.0.11-bin.tar.lzma’ is in a 7zip compatible archive that needs a bit of tweaking to get a tar file out of. I can provide it here in gzip format, that you can simply extract within the msys command prompt in the /usr directory.

Now with all that done, you should be in business!

$ gcc -v
Using built-in specs.
Target: x86_64-w64-mingw32
Configured with: ../gcc44-svn/configure –host=x86_64-w64-mingw32 –target=x86_6
4-w64-mingw32 –disable-multilib –enable-checking=release –prefix=/mingw64 –w
ith-sysroot=/mingw64 –enable-languages=c,c++,fortran,objc,obj-c++ –enable-libg
omp –with-gmp=/mingw64 –with-mpfr=/mingw64 –disable-nls –disable-win32-regis
try
Thread model: win32
gcc version 4.4.5 20101001 (release) [svn/rev.164871 – mingw-w64/oz] (GCC)

 

But will it run Dungeon?

What is also cool, is that this build of mingw includes gfortran, which is a Fortran 95 compiler with various 2003 & 2008 enhancements. So for the heck of it, I’ve rebuilt the makefile from dungeon-2.5.6 and tweaked the machdep.f to at least call the ITIME function to get the current time. The resulting archive runs pretty well!

Windows XP x64 - dungeon

Yes, it runs! And without a *32 meaning this is a 64bit binary!

 

Onward with SIMH

So going back to SIMH as my benchmark, here is the vax780 with -O0/-O0

Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 20
This machine benchmarks at 25000 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second

Which comparing it to the native x86_64 build is pretty good considering I’m running this in a VM (VMware Fusion!). Now the same test with -O1/-O1

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Which is pretty good! Now for the finally with -O2/-O1

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

Which is interesting in that there is no appreciable difference in the -O2/-O1 vs the -O1/-O1 build. Although I kind of expect different results on a native machine. If anyone else cares to test, I’m going to make available the whole project here. This includes the source and the pre-built binaries.

Unzip it on a win64/win32 machine and it should be somewhat straightforward to build / run. You can alter the makefile and change the primary CC flags from O0 to O1 or O2 if you so wish… just run make and it’ll generate a vax780.exe . Then in the test directory you can bench your exe like this:

$ make
gcc -O0 -c -o VAX/vax_cpu.o VAX/vax_cpu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cpu1.o VAX/vax_cpu1.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_fpa.o VAX/vax_fpa.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
VAX/vax_fpa.c: In function ‘op_cmpfd’:
VAX/vax_fpa.c:210: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:210: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:211: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:211: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘op_cmpg’:
VAX/vax_fpa.c:233: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:233: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:234: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:234: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘vax_fadd’:
VAX/vax_fpa.c:371: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:373: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:386: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘unpackd’:
VAX/vax_fpa.c:525: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:525: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘unpackg’:
VAX/vax_fpa.c:540: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:540: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘norm’:
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:548: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:549: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:549: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:557: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘rpackfd’:
VAX/vax_fpa.c:574: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c:575: warning: integer constant is too large for ‘long’ type
VAX/vax_fpa.c: In function ‘rpackg’:
VAX/vax_fpa.c:597: warning: integer constant is too large for ‘long’ type
gcc -O0 -c -o VAX/vax_cis.o VAX/vax_cis.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_octa.o VAX/vax_octa.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cmode.o VAX/vax_cmode.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_mmu.o VAX/vax_mmu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_sys.o VAX/vax_sys.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_syscm.o VAX/vax_syscm.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_stddev.o VAX/vax780_stddev.c -I. -DVM_VAX -DVAX_780 -DU
SE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_sbi.o VAX/vax780_sbi.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_mem.o VAX/vax780_mem.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_uba.o VAX/vax780_uba.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_mba.o VAX/vax780_mba.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_fload.o VAX/vax780_fload.c -I. -DVM_VAX -DVAX_780 -DUSE
_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax780_syslist.o VAX/vax780_syslist.c -I. -DVM_VAX -DVAX_780 –
DUSE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rl.o PDP11/pdp11_rl.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rq.o PDP11/pdp11_rq.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_ts.o PDP11/pdp11_ts.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_dz.o PDP11/pdp11_dz.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_lp.o PDP11/pdp11_lp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_tq.o PDP11/pdp11_tq.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_xu.o PDP11/pdp11_xu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_ry.o PDP11/pdp11_ry.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_cr.o PDP11/pdp11_cr.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_rp.o PDP11/pdp11_rp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_tu.o PDP11/pdp11_tu.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_hk.o PDP11/pdp11_hk.c -I. -DVM_VAX -DVAX_780 -DUSE_INT
64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o PDP11/pdp11_io_lib.o PDP11/pdp11_io_lib.c -I. -DVM_VAX -DVAX_780 –
DUSE_INT64 -DUSE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o scp.o scp.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADDR64 -I VAX
-I PDP11
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:470: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:471: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:472: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:473: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:474: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:475: warning: integer constant is too large for ‘long’ type
scp.c:476: warning: integer constant is too large for ‘long’ type
scp.c:476: warning: integer constant is too large for ‘long’ type
scp.c:477: warning: integer constant is too large for ‘long’ type
scp.c:477: warning: integer constant is too large for ‘long’ type
scp.c:478: warning: integer constant is too large for ‘long’ type
scp.c:478: warning: integer constant is too large for ‘long’ type
scp.c:479: warning: integer constant is too large for ‘long’ type
scp.c:479: warning: integer constant is too large for ‘long’ type
gcc -O0 -c -o sim_console.o sim_console.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_fio.o sim_fio.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADDR6
4 -I VAX -I PDP11
gcc -O0 -c -o sim_timer.o sim_timer.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_A
DDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_sock.o sim_sock.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o sim_tmxr.o sim_tmxr.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o sim_ether.o sim_ether.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_A
DDR64 -I VAX -I PDP11
gcc -O0 -c -o sim_tape.o sim_tape.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DUSE_ADD
R64 -I VAX -I PDP11
gcc -O0 -c -o VAX/vax_cpu2.o VAX/vax_cpu2.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64 –
DUSE_ADDR64 -I VAX -I PDP11
gcc -O1 -c -o VAX/vax_cpu2.o VAX/vax_cpu2.c -I. -DVM_VAX -DVAX_780 -DUSE_INT64
-DUSE_ADDR64 -I VAX -I PDP11
gcc -o vax780 VAX/vax_cpu.o VAX/vax_cpu1.o VAX/vax_fpa.o VAX/vax_cis.o VAX/vax_
octa.o VAX/vax_cmode.o VAX/vax_mmu.o VAX/vax_sys.o VAX/vax_syscm.o VAX/vax780_st
ddev.o VAX/vax780_sbi.o VAX/vax780_mem.o VAX/vax780_uba.o VAX/vax780_mba.o VAX/v
ax780_fload.o VAX/vax780_syslist.o PDP11/pdp11_rl.o PDP11/pdp11_rq.o PDP11/pdp11
_ts.o PDP11/pdp11_dz.o PDP11/pdp11_lp.o PDP11/pdp11_tq.o PDP11/pdp11_xu.o PDP11/
pdp11_ry.o PDP11/pdp11_cr.o PDP11/pdp11_rp.o PDP11/pdp11_tu.o PDP11/pdp11_hk.o P
DP11/pdp11_io_lib.o scp.o sim_console.o sim_fio.o sim_timer.o sim_sock.o sim_tmx
r.o sim_ether.o sim_tape.o VAX/vax_cpu2.o -I. -DVM_VAX -DVAX_780 -DUSE_INT64 -DU
SE_ADDR64 -I VAX -I PDP11 -lwinmm -lwsock32

Administrator@JASON-4AC1B1EA0 /usr/src/simh
$ cd test/

Administrator@JASON-4AC1B1EA0 /usr/src/simh/test
$ ../vax780.exe bsd42.ini

VAX780 simulator V3.8-1
loading ra(0,0)boot
Boot
: ra(0,0)vmunix
199488+56036+51360 start 0x11a0
4.2 BSD UNIX #9: Wed Nov 2 16:00:29 PST 1983
real mem = 8384512
avail mem = 7073792
using 102 buffers containing 835584 bytes of memory
mcr0 at tr1
mcr1 at tr2
uba0 at tr3
hk0 at uba0 csr 177440 vec 210, ipl 15
rk0 at hk0 slave 0
rk1 at hk0 slave 1
uda0 at uba0 csr 172150 vec 774, ipl 15
ra0 at uda0 slave 0
ra1 at uda0 slave 1
zs0 at uba0 csr 172520 vec 224, ipl 15
ts0 at zs0 slave 0
dz0 at uba0 csr 160100 vec 300, ipl 15
dz1 at uba0 csr 160110 vec 310, ipl 15
dz2 at uba0 csr 160120 vec 320, ipl 15
dz3 at uba0 csr 160130 vec 330, ipl 15
root on ra0
WARNING: should run interleaved swap with >= 2Mb
Automatic reboot in progress…
Tue Nov 8 03:44:30 PST 1983
Can’t open checklist file: /etc/fstab
Automatic reboot failed… help!
erase ^?, kill ^U, intr ^C
# ./d2;./d2;./d2
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
# sync
# sync
# sync
#
Simulation stopped, PC: 80001629 (BNEQ 80001630)
sim> q
Goodbye

Administrator@JASON-4AC1B1EA0 /usr/src/simh/test
$

The d0,d1,d2 are the dhyrstone benchmark compiled with -O0, -O1, and -O2 respectively. This gives you a chance to observe various optimizations in the GCC 2.7.2.2 for the VAX.

gcc on the x86_64

Today I thought I’d isolate the portion of simh that crashes 4.X BSD, when SIMH is built with GCC. I managed to find that the “op_ldpctx” and “op_mtpr” procedures of vax_cpu1.c become unstable when built with -O2 flags.

So I extracted the two procedures so I could then built the remainder of SIMH with GCC’s O2 flags. Before going too far, I first went to verify that the results would be consistent with the i386/32bit binaries…

However the results were.. interesting. By default SIMH will build with -O0 flags for the VAX giving the following dhrystone benchmark in 4.3 UWisc BSD:

Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 24
This machine benchmarks at 20833 dhrystones/second

24 seconds, not so hot. Then the same thing with -O1 flags…

Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second

18 seconds! pretty good, thats a 33% increase! Now for my hybrid -O2/-O1

Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second

A 5% increase over the -O1 … Nothing too big, but still an INCREASE.

Now for the real fun, I re-ran these bulid factors with the x86_64 compiler.

The -O0/-O0 combination came in at 19 seconds!!

Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second

This is pretty amazing considering everything we do should make it faster… The next thing I tried was the -O1/-O1 combination and go the following:

Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second

A 12.5 second average! This blew the doors off of the -O2/-O1 on the i386! Naturally I was hoping for at least a 5% increase going to the -O2/-O1 flags on the x86_64, however I got this…

Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second

And I ran it numerous times, but the bottom line is that the O2 flags actually produces a SLOWER SIMH on OS X 10.6.4 then -O1 flags… The whole exercise is a tad perplexing to me, to say the least, but unless you bench it, you’ll never know if the compiler is getting confused somewhere, and going to throw a loop. I should also point out that SIMH will not run with full -O2 flags as there is something in the code that breaks under optimization so this very well may not hold true 100% of the time, but at the same point things still have to be tested…