Pinball on 64-bit Alpha AXP Windows NT

This is a guest post from Yufeng Gao

One of the most popular OS built-in games is no doubt Pinball, known by its full name 3D Pinball for Windows – Space Cadet. It started out as Full Tilt! Pinball, developed by Cinematronics and published by Maxis. It offered 3 tables, and one of them, Space Cadet, was licensed to Microsoft to be included in Microsoft Plus! 95 and, later, built into the Windows operating system.

Windows XP was the last version of Windows to include Pinball, and Raymond Chen explained why it didn’t make it to Windows Vista on his blog. The reason was it had a collision detector bug when it was compiled for 64-bit Windows, which caused the ball to pass through various objects – falling off the screen through the plunger instead of being launched, for instance. The bug rendered the game unplayable, and Raymond and his colleague were unable to find a fix in a reasonable amount of time, so he removed it. At least that’s the story we were told, for about a decade.

In 2021, NCommander launched a series of investigations to challenge that, testing Pinball on various 64-bit (IA-64 and AMD64) builds of Windows XP and pre-release Vista. He found that the 64-bit versions of Pinball were all highly playable, with only very minor glitches, and speculated that the reason for its removal was that the UI did not fit into the Windows Vista design.

Not long after NCommander published his video, Raymond followed up with a post that filled in some gaps in the story and shed more light on the bug. He said it was the 64-bit Alpha AXP version of Pinball that had the extremely bad collision detection bug. This claim had been unverifiable for the past 5 years, for the following reasons:

  • No 64-bit Windows was ever released for the Alpha AXP – Compaq killed Windows NT support before NT was ported to 64-bit
  • One 64-bit Alpha AXP NT build was leaked in 2023, but the included Pinball does not work, as it segfaults immediately upon running

I’ve had an interest in the DEC Alpha for quite some time now, mainly out of my love for DEC architectures and my love for UNIX. VAX is the direct successor of PDP-11, and Alpha is the direct successor of VAX. Earlier, some Alpha emulation breakthroughs dropped, and I was pinged by a few friends that NT 4.0 could now run on a fork of the ES40 emulator, as well as on QEMU. I never thought Alpha NT would ever run under emulation, because unlike the familiar Tru64, Linux and the BSDs, NT uses its own custom PALcode and depends on ARC (Advanced RISC Computing) instead of SRM. Of course, people noted that the emulators couldn’t run the holy grail of Alpha NT – Windows (XP?) build 2210, because its kernel would panic with a memory management error in QEMU, or wouldn’t detect the keyboard and bug out in ES40. A few trips to hell in the symbol-less NT kernel and a few MMU emulation fixes later, I was able to patch up both QEMU and ES40 to boot that only surviving 64-bit build of Alpha NT.

After torturing my brain debugging a symbol-less NT kernel without a kernel debugger, I thought I’d give fixing Pinball a go, to make things worthwhile. One of the benefits of debugging a userland process is that, while there’s still no debugger, there is Dr. Watson, which takes core dumps and performs simple post-mortems. Something is better than nothing, as people would say.

Running Pinball gives the classic crash symptom immediately, with no graphics drawn:

Dr. Watson concludes that it died of a segfault:

It gave a nice dump of registers at the time of the fault:

State Dump for Thread Id 0x124

  v0=01002930 00000000   t0=00000000 00360000   t1=00000000 00000001
  t2=00000000 00360000   t3=00000000 00000000   t4=00000000 00000000
  t5=00000000 0000011c   t6=000003ff fff8f868   t7=00000000 00303030
  s0=000003ff fff8fac0   s1=01002930 00000000   s2=000003ff fff8fad8
  s3=00000000 00000000   s4=00000000 0106f2a8   s5=00000000 01000000
  fp=00000000 00000010   a0=01002930 00000000   a1=00000000 00000000
  a2=000003ff fff8fad8   a3=00000000 30010000   a4=00000000 69e17610
  a5=00000000 69e0a360   t8=000003ff fff8f868   t9=00000000 00000000
 t10=00000000 00300000  t11=00000000 00000002   ra=00000000 69e9d5c0
 t12=00000000 6a264710   at=ffffffff fffffe10   gp=00000000 00000000
  sp=000003ff fff8fa50 zero=00000000 00000000 fpcr=08000000 00000000
SoftFpcr=00000000 00000000  fir=6a264710
 psr=00000003
mode=1 ie=1 irql=0 

Some disassembly around the faulting instruction:

function: Otsstrlen
FAULT ->00000000'6a264710: 2f700000 ldq_u t12,0(a0)
        00000000'6a264714: 239fffff lda at,-1(zero)
        00000000'6a264718: 4b90065c mskql at,a0,at
        00000000'6a26471c: 4600f000 and a0,#7,v0
        00000000'6a264720: 477c041b bis t12,at,t12
        00000000'6a264724: 43fb01fb cmpbge zero,t12,t12
        00000000'6a264728: 43e00520 subq zero,v0,v0
        00000000'6a26472c: f7600005 bne t12,00000000'6a264744 Otsstrlen+00000034
        00000000'6a264730: 2f700008 ldq_u t12,8(a0)
        00000000'6a264734: 42011410 addq a0,#8,a0
        00000000'6a264738: 40011400 addq v0,#8,v0
        00000000'6a26473c: 43fb01fb cmpbge zero,t12,t12

And a very useful stack backtrace:

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1  Param#2  Param#3  Param#4  Function Name
000003FFFFF8FA50 0000000069E9D5BC 0100293000000000 0000000000000000 000003FFFFF8FAD8 0000000030010000 !Otsstrlen 
000003FFFFF8FA50 0000000069E9DB64 0100293000000000 0000000000000000 000003FFFFF8FAD8 0000000030010000 !PostThreadMessageA 
000003FFFFF8FA90 0000000069E9C000 0100293000000000 0000000000000000 000003FFFFF8FAD8 0000000030010000 !SetClassLongA 
000003FFFFF8FB80 000000000100F914 000003FFFFF8FC58 0000000000000000 000003FFFFF8FAD8 0000000030010000 !RegisterClassA 
000003FFFFF8FBF0 0000000001012A1C 0000000001000000 0000000001002DC8 000003FFFFF8FAD8 0000000030010000 !<nosymbols> 
000003FFFFF8FCA0 0000000001064B0C 0000000001000000 0000000001002DC8 000003FFFFF8FAD8 0000000030010000 !<nosymbols> 
000003FFFFF8FED0 0000000068948C50 0000000001000000 0000000001002DC8 000003FFFFF8FAD8 0000000030010000 !<nosymbols> 
000003FFFFF8FFC0 0000000000000000 0000000001064800 0000000001002DC8 000003FFFFF8FAD8 0000000030010000 !BaseProcessStart 

Ok, so it died inside RegisterClassA, a critical Win32 API function. That API function couldn’t have been the culprit, because if it were bugged, no GUI Win32 program would run at all. This means the only possible source of the error is its sole argument – a pointer to a WNDCLASSA struct. Needless to say, the pointer itself was valid, otherwise the API would’ve detected the invalid argument, or the segfault would’ve happened a lot sooner.

From the stack trace, the return address of RegisterClassA was 0x100F914, inside the function splash_screen. A quick disassembly of the instructions preceding that address shows a WNDCLASSA structure being built with the following layout:

00000000 u32 style         = 0
00000004 u64 lpfnWndProc   = splash_message_handler (0x100FE40)
0000000C u32 cbClsExtra    = 0
00000010 u32 cbWndExtra    = 8
00000014 u64 hInstance     = *0x106AE30
0000001C u64 hIcon         = NULL
00000024 u64 hCursor       = LoadCursorA(NULL, IDC_ARROW)
0000002C u64 hbrBackground = NULL
00000034 u64 lpszMenuName  = "" (0x1002710)
0000003C u64 lpszClassName = "3DPB_SPLASH_CLASS" (0x1002930)

Right off the bat, I noticed something wrong – the field alignment. It is a general requirement that fields be aligned to their size, as in 8-bit fields should be byte-aligned, 16-bit fields should be 16-bit (2-byte) aligned, 32-bit fields should be 32-bit (4-byte) aligned, and 64-bit fields should be 64-bit (8-byte) aligned. If you look at the offsets of the fields above, the 32-bit ones are indeed 4-byte aligned, but the 64-bit ones are not. At the start, we have a 32-bit style field followed by a 64-bit lpfnWndProc, and to satisfy the alignment requirements, a 4-byte padding should be inserted between style and lpfnWndProc to ensure that lpfnWndProc starts on an 8-byte boundary. RegisterClassA was expecting this padding, but Pinball lacked it, so it read data from the wrong offset and crashed.

To fix this, I simply bumped the offset of each field after style up by 4 bytes.

 00000000 u32 style         = 0
-00000004 u64 lpfnWndProc   = splash_message_handler (0x100FE40)
+00000008 u64 lpfnWndProc   = splash_message_handler (0x100FE40)
-0000000C u32 cbClsExtra    = 0
+00000010 u32 cbClsExtra    = 0
-00000010 u32 cbWndExtra    = 8
+00000014 u32 cbWndExtra    = 8
-00000014 u64 hInstance     = *0x106AE30
+00000018 u64 hInstance     = *0x106AE30
-0000001C u64 hIcon         = NULL
+00000020 u64 hIcon         = NULL
-00000024 u64 hCursor       = LoadCursorA(NULL, IDC_ARROW)
+00000028 u64 hCursor       = LoadCursorA(NULL, IDC_ARROW)
-0000002C u64 hbrBackground = NULL
+00000030 u64 hbrBackground = NULL
-00000034 u64 lpszMenuName  = "" (0x1002710)
+00000038 u64 lpszMenuName  = "" (0x1002710)
-0000003C u64 lpszClassName = "3DPB_SPLASH_CLASS" (0x1002930)
+00000040 u64 lpszClassName = "3DPB_SPLASH_CLASS" (0x1002930)

But that was not sufficient – Pinball calls RegisterClassA in 4 different places – Sound_Initsplash_screenWinMain and WaveMixStartup. I’d already patched the one in splash_screen, so I started going through the rest one by one.

The ones in Sound_Init and WinMain were identical to the one in splash_screen, but for some strange reason the one in WaveMixStartup already had the correct alignment:

00000000 u32 style         = 0
00000004 u32 <unused>      = <undefined>
00000008 u64 lpfnWndProc   = WndProc (0x105CFA0)
00000010 u32 cbClsExtra    = 0
00000014 u32 cbWndExtra    = 0
00000018 u64 hInstance     = *0x106B818
00000020 u64 hIcon         = NULL
00000028 u64 hCursor       = LoadCursorA(NULL, IDC_ARROW)
00000030 u64 hbrBackground = GetStockObject(LTGRAY_BRUSH)
00000038 u64 lpszMenuName  = NULL
00000040 u64 lpszClassName = "WavMix32" (0x10050B0)

I can’t think of why the same struct would be aligned differently within the same binary, unless they came from different objects compiled with different flags or something.

Anyway, with the WNDCLASSA struct alignment fixed in 3 of the 4 places, I ran Pinball again. This time it created the fullscreen window and attempted to draw the splash screen before dying of another segfault:

Crash log shows that the segfault happened deep in the Win32 audio system, while calling auxSetVolume:

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1  Param#2  Param#3  Param#4  Function Name
000003FFFFF8E9C0 0000000050306E84 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8E9E0 00000000503034B8 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8EA10 0000000050304B60 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8EA90 0000000050305B8C 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8EB20 000000000001AF78 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8EB60 0000000000025AFC 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !auxSetVolume 
000003FFFFF8EBD0 0000000000025F98 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !mixerSetControlDetails 
000003FFFFF8EC80 0000000000027214 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !mixerSetControlDetails 
000003FFFFF8ED30 0000000000030644 00000000FFE5D420 0000000000000000 000003FFFFE5D420 0000000000000000 !mciSendCommandW 
[...]
000003FFFFF8FC60 0000000001012AB4 0000000000000000 000003FFFFE5DE00 000003FFFFE5D420 0000000000000000 !CreateWindowExA 
000003FFFFF8FCA0 0000000001064B0C 0000000000000000 000003FFFFE5DE00 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8FED0 0000000068948C50 0000000000000000 000003FFFFE5DE00 000003FFFFE5D420 0000000000000000 !<nosymbols> 
000003FFFFF8FFC0 0000000000000000 0000000001064800 000003FFFFE5DE00 000003FFFFE5D420 0000000000000000 !BaseProcessStart

The fault happened while trying to dereference a pointer in the register a0 (r16):

function: <nosymbols>
        00000000'50305658: 00000000 halt
        00000000'5030565c: 00000000 halt
        00000000'50305660: 23deffe0 lda sp,-20(sp)
        00000000'50305664: b53e0000 stq s0,0(sp)
        00000000'50305668: b55e0008 stq s1,8(sp)
        00000000'5030566c: b57e0010 stq s2,10(sp)
        00000000'50305670: b75e0018 stq ra,18(sp)
        00000000'50305674: 47f00409 bis zero,a0,s0
        00000000'50305678: 47f1040a bis zero,a1,s1
        00000000'5030567c: 47ff040b bis zero,zero,s2
FAULT ->00000000'50305680: a2100128 ldl a0,128(a0)
        00000000'50305684: 20500001 lda t1,1(a0)
        00000000'50305688: e440001b beq t1,00000000'503056f8 00000000'503056f8
        00000000'5030568c: d35ff6f8 bsr ra,00000000'50303270 00000000'50303270
        00000000'50305690: e4000019 beq v0,00000000'503056f8 00000000'503056f8
        00000000'50305694: 47e00411 bis zero,v0,a1
        00000000'50305698: 454b0801 xor s1,s2,t0
        00000000'5030569c: e4200005 beq t0,00000000'503056b4 00000000'503056b4
        00000000'503056a0: a2090128 ldl a0,128(s0)
        00000000'503056a4: d35ff722 bsr ra,00000000'50303330 00000000'50303330
        00000000'503056a8: 4160300b addl s2,#1,s2
        00000000'503056ac: 47e00411 bis zero,v0,a1

The register dump shows the value of a0 at the time of the crash:

State Dump for Thread Id 0x150

  v0=000003ff ffe5de00   t0=00000000 00000000   t1=00000000 00000058
  t2=00000000 50306150   t3=00000000 0000015e   t4=00000000 00000001
  t5=00000000 00000001   t6=00000000 50300000   t7=00000000 00ed39fb
  s0=00000000 ffe5d420   s1=00000000 00000000   s2=00000000 00000000
  s3=00000000 00000000   s4=00000000 00000001   s5=00000000 50305a10
  fp=00000000 000123b8   a0=00000000 ffe5d420   a1=00000000 00000000
  a2=000003ff ffe5d420   a3=00000000 00000000   a4=00000000 00000000
  a5=00000000 cc5a4dbc   t8=00000001 00000000   t9=00000000 00000612
 t10=d1b71758 e219652c  t11=00000000 00000612   ra=00000000 50306e88
 t12=00000000 00000000   at=00000000 00010000   gp=00000000 00000000
  sp=000003ff fff8e9c0 zero=00000000 00000000 fpcr=89000000 00000000
SoftFpcr=00000000 00000000  fir=50305680
 psr=00000003
mode=1 ie=1 irql=0 

Indeed, it was an invalid pointer! As you can see, it’s identical to the pointer in a2, but with the entire top 32 bits zeroed. It must’ve been truncated by a bug somewhere, either in the audio subsystem or in Pinball itself.

I spent some time and pinned down the DLL responsible for the fault – mciseq.dll, and did some tracing. The truncation of a0 happened when a2 was moved into a0:

50306150    ZAPNOT  a2,#15,a0

ZAPNOT is an interesting instruction – it takes a source register, a bitmask and a destination register, and it “zaps” (zeros) the bytes whose corresponding bit in the bitmask is 0. In this case, the bitmask is 15, which is 00001111 in binary. From this we can work out that the ZAPNOT instruction at 0x50306150 zeros the upper 4 bytes of a2, when it is copied into a0. This perfectly explains why, at the time of the fault, a0 contained a truncated version of the pointer in a2.

Of course, 0x50306150 was not the only place where it truncated 64-bit pointers, I found 6 truncations of the exact same type in mciseq.dll. I have not the slightest clue why it decided to truncate pointers. If I had to guess, maybe they had pointer → integer → pointer casts for whatever reason, and that integer type was 32-bit. With all 6 truncations patched out, we have some Pinball for ourselves:

Here’s proof that Pinball is indeed running on a 64-bit build of Alpha NT:

To make Pinball work on your NT build 2210 install, replace %ProgramFiles%\Windows NT\Pinball\pinball.exe and %windir%\system32\mciseq.dll with the following:

You could also patch the installation files and burn them to a new CD if you want Pinball to work out of the box on fresh installs – simply copy these files to the AXP64 directory of the install disc:

The Bug

Now I’m going to disappoint you with the fact that I did not find the collision detector bug Raymond talked about. With the struct alignment and pointer truncation issues patched, the game now works flawlessly. Ok, I’m not sure if it’s actually flawless, but I never saw any glitches in the few games I played. At the very least, the ball does not fall through the plunger, can be launched and bounces around the table just fine.

Below are the 2 reasons I could think of for not seeing the collision detector bug:

  • The bug was introduced after build 2210. Build 2210 has Pinball installed by default and predates Windows XP by almost a year and a half, so it almost certainly predates Raymond removing it. In this build, Pinball doesn’t even run by default, so there’s no way they could’ve tested it and seen the bug. They probably only started testing Pinball later, after they fixed the struct alignment and pointer truncation issues.
  • The bug only manifests in free/release builds, not in checked/debug builds. Maybe it only shows up when the code is compiled with the more aggressive optimisation used by release builds – something that happens quite often when code has undefined behaviour or the compiler has bugs. This is less plausible, however, as I’m sure Raymond would’ve used debug builds when he attempted to debug it, and discovered any differences between debug and retail builds.

Of course, only Raymond himself could shed more light on this topic. It was fun (read: painful) debugging Pinball, as well as the NT kernel to fix the emulators – too much fun (read: pain) that I will never do it again.


Some trivia about me and pinball:

I spent a fair chunk of my kindergarten and pre-school days playing the various games my dad installed on our Windows XP home computer, however, there was one game that I never quite figured out how to play – Pinball. It came bundled with the OS, and the splash screen scared me every time I tried to open it up.

The flipper looked like a pistol to my 3-year-old self, and the overall darkness of the splash screen just injected fear into me. I would open the game, close my eyes, count to 20, then open my eyes again, to skip past the splash screen.

The first time I actually played a full game of pinball was on the first day of this year, when a friend of mine took me to an arcade. After playing pinball in real life, the Pinball game finally started to make sense.

Yet another GCC 1.40 *SOME ASSEBMLY REQUIRED

phoon

Oh sure I’ve done this ages ago, getting GCC 1.40 to compile with old Microsoft C compilers, and then target Win32, it’s not that ‘special’. But I thought I’d try to get them to build with MASM so I could just distribute this with an assembler. Spelling out the joke of some assembly required.

Although I wasn’t going to target/host OS/2 I was ideally going straight to Win32, the MASM 6.11 assembler couldn’t assemble the MSVC 1.0 / MSC/386 8.0 compiler’s assembly output, I needed to use the MASM 7 from Visual C++ 2003; namely:

Microsoft (R) Macro Assembler Version 7.10.3077 Copyright (C) Microsoft Corporation. All rights reserved.

MASM 6.11 was having issues with pushing OFFSET’s ie:

push OFFSET _obstack

when they were defined as:

COMM _obstack:BYTE:024H

Chat GPT to the rescue knowing that later MASM’s will just handle it just fine. And it was right! I know AI gets a bad rep, but surprisingly (or not when you think about what it’s been trained on), it’s got some great insight to some old things like seemingly common software tools, and old environments.

I didn’t bother trying to use Microsoft C/386 6.0 & MASM386 5.1 to see if it’ll handle CC1, as that seems to be a bit extreme. and I wanted this to run on semi modern Win32 stuff. More so that there isn’t a 64bit SMP aware OS/2 with a modern web browser. Kind of sad to be honese, but it’s 2026, and here we are.

I as always stick to the Xenix GAS port that outputs 386 OMF objects that earlier linker’s can happily auto-convert to coff and use on Win32. One day I feel I should ask why they were cross compiling NT/i386 from OS/2 1.21 instead of using Xenix?! Must have been some fundamental NTOS/2 thing I suppose.

I guess a refresher for anyone comming in out of the cold here’s a really poorly done block diagram of what goes on when a traditional (GCC) compiler runs. Explaniation is here: so it turns out GCC could have been available on Windows NT the entire time.

GCC program flow

Long story there was that the Xenix GAS emits an ancient 386 OMF format that for unknown reaons the older Microsoft Linkers happily accept and auto convert into COFF, the file format of the future (Future being 1988). I guess for better. or worse we never got NT/ELF. Oh and speaking of further weird, the IBM version of their LINK386 doesn’t like the Xenix 386 OMF. Bummer.

One thing I found out is that the MASM v7 doesn’t output COFF by default, rather it’s 386 OMF! you need to add the /coff flag to force it to be more Win32 friendly. Kind of unexpected behaviour.

I tried to make this simple as, clone the repo and run ‘build.cmd’ it’ll link up GCC and then build the test programs, and clean up after itself.

https://github.com/neozeed/gcc140-masm

I’d tried to emit assembly for the Xenix GAS, but for some reason it’s struggling with floating point. I’m not sure, I tried using chat gpt to debug but it get’s confused on how this whole bizzare tool chain is working. I guess I can’t blame it.

Sorry it’s been a while, been feeling ‘life’ lately. I had some i7 project as a kicker for a retro Windows 10 build thing to do but watchign the RAM crissis unfold and well life… I just got feeling like it’s so irrelevant who’d care. That and it’s insane watching $1.11 worth of DDR3 RAM now selling for $30++ …. and more and more chip manufacturers are exiting. So it felt like maybe go back and do more with less. Even a low end machine can assemble this in seconds!

Thunking for fun & a lack of profit

So, with a renewed interest in OS/2 betas, I’d been getting stuff into the direction of doing some full screen video. I’d copied and pasted stuff before and gotten QuakeWorld running, and I was looking forward to this challenge. The whole thing hinges on the VIO calls in OS/2 like VioScrLock, VioGetPhysBuf, VioScrUnLock etc etc. I found a nifty sample program Q59837 which shows how to map into the MDA card’s text RAM and clear it.

It’s a 16bit program, but first I got it to run on EMX with just a few minor changes, like removing far pointers. Great. But I wanted to build it with my cl386 experiments and that went off the edge. First there are some very slick macros, and Microsoft C just can’t deal with them. Fine I’ll use GCC. Then I had to get emximpl working so I could build an import library for VIO calls. I exported the assembly from GCC, and mangled it enough to where I could link it with the old Microsoft linker, and things were looking good! I could clear the video buffer on OS/2 2.00 GA.

Now why was it working? What is a THUNK? Well it turns out in the early OS/2 2.0 development, they were going to cut loose all the funky text mode video, keyboard & mouse support and go all in on the graphical Presentation Manager.

Presentation Manager from OS/2 6.605

Instead, they were going to leave that old stuff in the past, and 16bit only for keeping some backwards compatibility. And the only way a 32bit program can use those old 16bit API’s for video/keyboard/mouse (etc) is to call from 32bit mode into 16bit mode, then copy that data out of 16bit mode into 32bit mode. This round trip is called thunking, and well this sets up where it all goes wrong.

Then I tried one of the earlier PM looking betas 6.605, and quickly it crashed!

SYS2070:

Well this was weird. Obviously, I wanted to display help

Explanation:

This ended up being a long winded way of saying that there is missing calls from DOSCALL1.DLL. Looking through all the EMX thunking code, I came to the low level assembly, that actually implemented the thunking.

EXTRN   DosFlatToSel:PROC
EXTRN   DosSelToFlat:PROC

After looking at the doscalls import library, sure enough they just don’t exist. I did the most unspeakable thing and looked at the online help for guidance:

No VIO

So it turns out that in the early beta phase, there was no support for any of the 16bit IO from 32bit mode. There was no thunking at all. You were actually expected to use Presentation Manager.

YUCK

For anyone crazy enough to care, I uploaded this onto github Q59837-mono

It did work on the GA however so I guess I’m still on track there.

Targeting OS/2 with Visual Studio 2003

Sydney’s idea of what Visual C++ for OS/2 should look like

No, it’s not a typo.

This is a long-winded post, but the short version is that I found a working combination to get the C compiler from Visual Studio 2003 targeting OS/2.

Once I’d learned how C compilers are a collection of programs working in concert, I’d always wanted to force Microsoft C to work in that fashion, however it is born to be a compiler that integrates everything but linking. There has been a “/Fa” or output assembly option, but I’ve never gotten it to do anything useful. I’m not that much into assembly but it seemed insurmountable.

But for some reason this time things were different.

This time I used:

Microsoft (R) Macro Assembler Version 6.11

After the great divorce and the rise of Windows NT, Microsoft had shifted from the OMF format to COFF. However somewhere buried in their old tools it still supports it, namely MASM. For example, if I try to run LINK386 (the OS/2 Linker) against output from Visual C++ 2003 I get his

abandon.obj :  fatal error L1101: invalid object module
 Object file offset: 1  Record type: 4c

However if I output to assembly and then have MASM assemble that, and try the linker, I’m bombarded with errors like this:

warp.obj(warp.asm) :  error L2025: __real@4059000000000000 : symbol defined more than once
warp.obj(warp.asm) :  error L2029: '__ftol2' : unresolved external

If I was smart I’d have given up, there is pages and pages of this stuff. But I’m not smart, so instead I decided to something different, and use SED, the stream editor, and try to patch out the errors.

The ftol2 call is for newer CPU’s and any OS/2 library won’t have it. But instead of binary editing symbols we can replace the ftol2 with ftol with this simple line:

 sed -e 's/_ftol2/_ftol/g'

For some reason Visual C++ likes to make all it’s reals “public” meaning there can only be one, but yet there is so many. Why not comment them all out?

sed -e 's/PUBLIC\t__real@/;PUBLIC\t__real@/g'

And there are various other annoying things, but again they can be all patched out. Just as the older Windows 1991 Pre-release compilers also have weird syntax that MASM doesn’t understand.

astro.asm(59): error A2138: invalid data initializer

which goes into how Microsoft C used to initialize floating point constants:

CONST   SEGMENT  DWORD USE32 PUBLIC 'CONST'
$T20000         DQ      0a0ce51293ee845c8r    ; 1.157407407407407E-05
$T20001         DQ      0bffffd3441429ec5r    ; 2440587.499999667
CONST      ENDS

while I found that the C compiler in Xenix 386 initialises them like this:

$T20000         DQ      0a0ce51293ee845c8h    ; 1.157407407407407E-05
$T20001         DQ      0bffffd3441429ec5h    ; 2440587.499999667

This one was a little hard for me as I’m not a sed expert, but I did figure out how to mark the section, and then to replace it

sed -e "s/DQ\t[0-9a-f]r/&XMMMMMMX/g" $.a1 |  sed -e "s/rXMMMMMMX/H/g"

And so on. At the moment my ‘mangle’ script is now this:

.c.obj:
        $(CC) $(INC) $(OPT) $(DEBUG) /c /Fa$*.a $*.c
        wsl sed -e 's/FLAT://g' $*.a > $*.a1
        wsl sed -e "s/DQ\t[0-9a-f]*r/&XMMMMMMX/g" $*.a1 \
        | wsl sed -e "s/rXMMMMMMX/H/g" \
        | wsl sed -e 's/call \t\[/call DWORD PTR\[/g' \
        | wsl sed -e 's/PUBLIC\t__real@/;PUBLIC\t__real@/g' \
        | wsl sed -e 's/_ftol2/_ftol/g' > $*.asm
        ml /c $*.asm
        del $*.a $*.a1 $*.asm

This allows me to plug it into a Makefile, so I only have to edit it in one place.

Not surprisingly, this allows the LINK from Visual C++ 1.0 to link the MASM generated object files and get a native Win32 executable. Even from the oldest compiler I have from the Microsoft OS/2 2.00 Beta 2 SDK from 1989!

But now that we have the C compilers being able to output to something we can edit and force into a Win32, there is a few more things and suddenly:

        C:\cl386-research\bin\13.10.6030\cl386 /u /w /G3 /O  /c /Faphoon.a phoon.c
C:\cl386-research\bin\13.10.6030\CL386.EXE: warning: invoking C:\cl386-research\bin\13.10.6030\CL.EXE
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.6030 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.

phoon.c
        wsl sed -e 's/FLAT://g' phoon.a > phoon.a1
        wsl sed -e "s/DQ\t[0-9a-f]*r/&XMMMMMMX/g" phoon.a1  | wsl sed -e "s/rXMMMMMMX/H/g"  | wsl sed -e 's/call \t\[/call DWORD PTR\[/g'  | wsl sed -e 's/PUBLIC\t__real@/;PUBLIC\t__real@/g'  | wsl sed -e 's/_ftol2/_ftol/g' > phoon.asm
        ml /c phoon.asm
Microsoft (R) Macro Assembler Version 6.11
Copyright (C) Microsoft Corp 1981-1993.  All rights reserved.

 Assembling: phoon.asm
        del phoon.a phoon.a1 phoon.asm
        msdos286 run286 C:\cl386-research\bin\ddk12\LINK386.EXE @phoon.lnk

Operating System/2 Linear Executable Linker
Version 2.01.012 Nov 02 1993
Copyright (C) IBM Corporation 1988-1993.
Copyright (C) Microsoft Corp. 1988-1993.
 All rights reserved.

Object Modules [.obj]: astro.obj date_p.obj phoon.obj
Run File [astro.exe]: phoon2 /NOE /NOI /NOD:OLDNAMES
List File [nul.map]: nul.map
Libraries [.lib]: ..\..\lib2\libc.lib +
Libraries [.lib]: ..\..\lib2\os2386.lib
Definitions File [nul.def]: nul.def;
LINK386 :  warning L4071: application type not specified; assuming WINDOWCOMPAT

I know it’s a bit of a word salad, but the key thing here is that using Visual C++ 2003’s compiler (version 13.10.6030), and outputting to assembly that we can edit, we can then use MASM to build objects that surprisingly LINK386 version 2.01.012 will link with. I suspect this has to do with device drivers, and probably the majority of the OS/2 operating system.

Anways, we’ve done the incredible, using the same object files, we made both a Win32 application, and an OS/2 application!

phoon-13.10.6030.exe: PE32 executable (console) Intel 80386, for MS Windows
phoon2.exe: MS-DOS executable, LX for OS/2 (console) i80386
Phoon compiled by Visual C++ 2003 on OS/2 2.00

Incidentally Happy CNY!

Obviously, this is VERY cool stuff.

I know the next question is do we have to rely on a 16bit linker? How about Watcom?

C:\cl386-research\proj\trek>wlink @trek.wlk
WATCOM Linker Version 10.0
Copyright by WATCOM International Corp. 1985, 1994. All rights reserved.
WATCOM is a trademark of WATCOM International Corp.
loading object files
searching libraries
Warning(1008): cannot open LIBC.lib : No such file or directory
Warning(1008): cannot open OLDNAMES.lib : No such file or directory
creating an OS/2 32-bit executable

Ignore the warnings and YES we can Link from something much newer & 32bit! In this example I linked the old TREK game, also built with Visual C++ 2003. The response file looks lke:

SYS os2v2
NAME trek2w
FILE abandon.obj,attack.obj,autover.obj,capture.obj,cgetc.obj,checkcond.obj
FILE check_out.obj,compkl.obj,computer.obj,damage.obj,damaged.obj,dcrept.obj
FILE destruct.obj,dock.obj,dumpgame.obj,dumpme.obj,dumpssradio.obj,events.obj
FILE externs.obj,getcodi.obj,getpar.obj,help.obj,impulse.obj,initquad.obj
FILE kill.obj,klmove.obj,lose.obj,lrscan.obj,main.OBJ,move.obj
FILE nova.obj,nullsleep.obj,out.obj,phaser.obj,play.obj,ram.obj
FILE ranf.obj,rest.obj,schedule.obj,score.obj,setup.obj,setwarp.obj
FILE shield.obj,snova.obj,srscan.obj,systemname.obj,torped.obj,utility.obj
FILE visual.obj,warp.obj,win.obj
LIBR ..\..\lib2\LIBC.LIB
LIBR ..\..\lib2\OS2386.LIB

It’s probably needing additional stack space, maybe some other stuff, or resources, maybe how to flag it’s windowing compatible.

TREK built by Visual C++ 2003, and Linked using Watcom C/C++ 10.0

How do I get started, if I dare?! First download and unpack cl386-research-v2. Ideally on the root of your C: drive, because why not?

run the ‘env’ command to set your environment up. Its pretty complicated but in the proj directly there is currently:

*NOTE that I do use SED scripts, I have it set to use Linux in the WSL package.
I tried some Win32 sed but it didn’t work. So you need WSL or a working sed!

you can then go into each directory and run

make os2

and it’ll compile populate a floppy and launch the emulator

Its all good fun.

Read the Makefiles to configure a compiler, how to run it, and if you need to mangle the assembly. The 32bit new stuff needs to be mangled, the older stuff almost always works with just compile.

# Version 6.00.054      1989
# https://archive.org/details/os-2-cd-rom_202401
PLATFORM = ddksrc

In this case it’ll select the platform from the ‘ddksdk’ release. The next is if the compiler is OS/2 based or native win32. Basically 73g / windows 95 & below are native Win32.

In the above example we comment out the dos extended cross

# dos exteded cross
CC =  $(EMU) $(DOSX) $(CL386ROOT)\$(PLATFORM)\cl386
# native CC
# CC =  $(CL386ROOT)\$(PLATFORM)\cl386

Next is the mangle strategy. In this case it’s an ancient OS/2 (like) compile so
try un commenting the ‘just compile’ line

# must include ONLY ONE strategey..
# for OS/2 it must have been assembled my MASM 6.11

include ..\-justcompile.mak
#include ..\-mangleassembly.mak
#include ..\-plainassembly.mak

save the makefile, and run

nmake os2

You can just close the emulator as after each run it’ll unpack a hard disk image, so nothing will be lost. or saved. It’s just for testing. You may need to periodically clean the floppy drive, as that is the only way to transfer stuff in and out of the VM.

What versions of CL386 have I found? Well, it’s quite a few, although I know I’m missing quite a few.

== c386 ============================
Microsoft C 5 386 Compiler

Microsoft C 5.2 286/386 Compiler -- Driver

@(#)C Compiler Apr 19 1990 11:48:30

Copyright (c) Microsoft Corp
1984-1989. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 1.00.075
Quick C Compiler Version 2.00.000
1.00.075

== ddk12 ============================

C 6.00 (Alpha) Aug 24 1990 19:12:31

Copyright (c) Microsoft Corp
1984-1989. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.054
Quick C Compiler Version 2.00.000
6.00.054

== ddk20 ============================

C 6.00 (Alpha) Aug 16 1990 23:04:06

Copyright (c) Microsoft Corp
1984-1989. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.054
Quick C Compiler Version 2.00.000
6.00.054

== ddksrc ============================

C 6.00 (Alpha) Aug 24 1990 19:21:49

Copyright (c) Microsoft Corp
1984-1989. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.054
Quick C Compiler Version 2.00.000
6.00.054

== nt-sep ============================

@(#)C Compiler 6.00 Feb 06 1991 17:15:19
@(#)C Compiler 6.00 May 13 1991 23:54:12
@(#)C Compiler 6.00 Jun 03 1991 15:16:22


Copyright (c) Microsoft Corp
1984-1991. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.077
Quick C Compiler Version 2.00.000
6.00.077

== nt-oct ============================

@(#)C Compiler 6.00 Jun 03 1991 15:16:22
@(#)C Compiler 6.00 Jun 13 1991 22:07:23
@(#)C Compiler 6.00 Oct 10 1991 00:42:24

Copyright (c) Microsoft Corp
1984-1991. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.080
Quick C Compiler Version 2.00.000
6.00.080

== nt-dec ============================
@(#)C Compiler 6.00 Jun 03 1991 15:16:22
@(#)C Compiler 6.00 Jun 13 1991 22:07:23
@(#)C Compiler 6.00 Oct 10 1991 00:42:24

Copyright (c) Microsoft Corp
1984-1991. All rights reserved.
  (press <return> to continue)
Microsoft 386 C Compiler. Version 6.00.081
Quick C Compiler Version 2.00.000
6.00.081

== 73g  ============================
1984-1993. All rights reserved.
Copyright (c) Microsoft Corp
8.00.3200
32-bit C/C++ Optimizing Compiler Version
Microsoft (R)


== msvc32s ============================

Microsoft 8.00.0000 - Copyright (C) 1986-1993 Microsoft Corp.
Microsoft 8.00.0000 - Copyright (C) 1986-1993 Microsoft Corp.
@(#) Microsoft C/C++ 32 bits x86 Compiler Version 8.00.XXXX

8.00.000

== 13.10.6030 ============================

Microsoft (R) C/C++ Compiler Version 13.10.6030
From my install of Visual Studio 2003 Enterprise

As you can see many of these earlier OS/2 compilers report the same versions but are in fact different builds on the inside. I suspect Microsoft had to support one version, and an Alpha version of version 6 is as good as it got. I would have imagined there were internal 32bit versions of 6 or 7, but I haven’t seen them.

Compiling and running TREK

Hopefully this gives some idea of how I tried to made a probably too modular build system to try all kinds of different compilers. I might have to see if it’s possible to run the tools from the 1992 versions of Windows NT in this setup, perhaps they are interesting as well.

One thing in my porting GCC to OS/2 experience is that the usability of the C compilers from 1991 were dramatically better than what Microsoft had given IBM at the time of the divorce. No doubt the upcoming NTOS/2 project was placing a bigger demand on the tools team.

If anyone has any access to other ‘cl386’ compilers, or early OS/2 2.00 stuff, please let me know! I’d love to do build/tests and see if my idea of distributing objects ‘just works’!

Totally unfair comparison of Microsoft C

Because I hate myself, I tried to get the Microsoft OS/2 Beta 2 SDK’s C compiler building simple stuff for text mode NT. Because, why not?!

Since the object files won’t link, we have to go in with assembly. And that of course doesn’t directly assemble, but it just needs a little hand holding:

Microsoft (R) Program Maintenance Utility   Version 1.40
Copyright (c) Microsoft Corp 1988-93. All rights reserved.

        cl386 /Ih /Ox /Zi /c /Fadhyrst.a dhyrst.c
Microsoft (R) Microsoft 386 C Compiler. Version 1.00.075
Copyright (c) Microsoft Corp 1984-1989. All rights reserved.

dhyrst.c
        wsl sed -e 's/FLAT://g' dhyrst.a > dhyrst.a1
        wsl sed -e "s/DQ\t[0-9a-f]*r/&XMMMMMMX/g" dhyrst.a1  | wsl sed -e "s/rXMMMMMMX/H/g" > dhyrst.asm
        ml /c dhyrst.asm
Microsoft (R) Macro Assembler Version 6.11
Copyright (C) Microsoft Corp 1981-1993.  All rights reserved.

 Assembling: dhyrst.asm
        del dhyrst.a dhyrst.a1 dhyrst.asm
        link -debug:full -out:dhyrst.exe dhyrst.obj libc.lib
Microsoft (R) 32-Bit Executable Linker Version 1.00
Copyright (C) Microsoft Corp 1992-93. All rights reserved.

I use sed to remove the FLAT: directives which makes everything upset. Also there is some weird confusion on how to pad float constants and encode them.

CONST   SEGMENT  DWORD USE32 PUBLIC 'CONST'
$T20001         DQ      0040f51800r    ;        86400.00000000000
CONST      ENDS

MASM 6.11 is very update with this. I just padded it with more zeros, but it just hung. I suspect DQ isn’t the right size? I’m not 386 MASM junkie. I’m at least getting the assembler to shut-up but it doesn’t work right. I’ll have to look more into it.

Xenix 386 also includes an earlier version of Microsoft C / 386, and it formats the float like this:

CONST   SEGMENT  DWORD USE32 PUBLIC 'CONST'
$T20000         DQ      0040f51800H    ;        86400.00000000000
CONST      ENDS

So I had thought maybe if I replace the ‘r’ with a ‘H’ that might be enough? The only annoying thing about the Xenix compiler is that it was K&R so I spent a few minutes porting phoon to K&R, dumped the assembly and came up with this sed string to find the pattern, mark it, and replace it (Im not that good at this stuff)

wsl sed -e "s/DQ\t[0-9a-f]r/&XMMMMMMX/g" $.a1 \
| wsl sed -e "s/rXMMMMMMX/H/g" > $*.asm

While it compiles with no issues, and runs, it just hangs. I tried the transplanted Xenix assembly and it just hangs as well. Clearly there is something to do with how to use floats.

I then looked at whetstone, and after building it noticed this is the output compiling with Visual C++ 8.0

      0       0       0  1.0000e+000 -1.0000e+000 -1.0000e+000 -1.0000e+000
  12000   14000   12000 -1.3190e-001 -1.8218e-001 -4.3145e-001 -4.8173e-001
  14000   12000   12000  2.2103e-002 -2.7271e-002 -3.7914e-002 -8.7290e-002
 345000       1       1  1.0000e+000 -1.0000e+000 -1.0000e+000 -1.0000e+000
 210000       1       2  6.0000e+000  6.0000e+000 -3.7914e-002 -8.7290e-002
  32000       1       2  5.0000e-001  5.0000e-001  5.0000e-001  5.0000e-001
 899000       1       2  1.0000e+000  1.0000e+000  9.9994e-001  9.9994e-001
 616000       1       2  3.0000e+000  2.0000e+000  3.0000e+000 -8.7290e-002
      0       2       3  1.0000e+000 -1.0000e+000 -1.0000e+000 -1.0000e+000
  93000       2       3  7.5000e-001  7.5000e-001  7.5000e-001  7.5000e-001

However this is the output from C/386:

      0       0       0  5.2998e-315  1.5910e-314  1.5910e-314  1.5910e-314
  12000   14000   12000  0.0000e+000  0.0000e+000  0.0000e+000  0.0000e+000
  14000   12000   12000  0.0000e+000  0.0000e+000  0.0000e+000  0.0000e+000
 345000       1       1  5.2998e-315  1.5910e-314  1.5910e-314  1.5910e-314
 210000       1       2  6.0000e+000  6.0000e+000  0.0000e+000  0.0000e+000
  32000       1       2  5.2946e-315  5.2946e-315  5.2946e-315  5.2946e-315
 899000       1       2  5.2998e-315  5.2998e-315  0.0000e+000  0.0000e+000
 616000       1       2  5.3076e-315  5.3050e-315  5.3076e-315  0.0000e+000
      0       2       3  5.2998e-315  1.5910e-314  1.5910e-314  1.5910e-314
  93000       2       3  5.2972e-315  5.2972e-315  5.2972e-315  5.2972e-315

Great they look nothing alike. So something it totally broken. I guess the real question is, does it even work on OS/2?

Since I should post the NMAKE Makefile so I can remember how it can do custom steps so I can edit the intermediary files. Isn’t C fun?!

INC = /Ih
OPT = /Ox
DEBUG = /Zi
CC = cl386

OBJ = dhyrst.obj

.c.obj:
	$(CC) $(INC) $(OPT) $(DEBUG) /c /Fa$*.a $*.c
	wsl sed -e 's/FLAT://g' $*.a > $*.a1
	wsl sed -e "s/DQ\t[0-9a-f]*r/&XMMMMMMX/g" $*.a1 \
	| wsl sed -e "s/rXMMMMMMX/H/g" > $*.asm
	ml /c $*.asm
	del $*.a $*.a1 $*.asm

dhyrst.exe: $(OBJ)
        link -debug:full -out:dhyrst.exe $(OBJ) libc.lib

clean:
        del $(OBJ)
        del dhyrst.exe
        del *.asm *.a *.a1

As you can see, I’m using /Ox or maximum speed! So how does it compare?

Dhrystone(1.1) time for 180000000 passes = 20
This machine benchmarks at 9000000 dhrystones/second

And for the heck of it, how does Visual C++ 1.0’s performance compare?

Dhrystone(1.1) time for 180000000 passes = 7
This machine benchmarks at 25714285 dhrystones/second

That’s right the 1989 compiler is 35% the speed of the 1993 compiler. wow. Also it turns out that MASM 6.11 actually can (mostly) assemble the output of this ancient compiler. It’s nice when something kind of work. I can also add that the Infocom ’87 interpreter works as well.

YAY!

Patching Touhou 6 (Embodiment of Scarlet Devil) to run on a 3dfx Voodoo 2

This is a guest post from spaztron64

One thing that’s been bugging me for years at this point is the ability to run Touhou 6 on my PC-9821 V166. For a good few years I’ve been stuck with nothing more than a Matrox Mystique graphics card in that thing, which can’t create a D3D6 HAL context for rendering the game’s 3D elements. In 2021 I snatched a 12MB 3dfx Voodoo 2, in hopes of being able to play more 3D games on that machine. There were two major problems….

1) The USBHID.SYS driver for PC-98 Windows 9x conflicts haaaard with Voodoo drivers. Moving the cursor around corrupts memory and makes the system unstable or kills the driver in mere seconds of use

2) None of the Touhou games support secondary Direct3D devices

For those not in the know in regards to the second issue, DirectX allows you to use multiple DDraw and D3D capable GPUs on one system. By default it’ll set the video card outputting a signal on the primary monitor as the primary DirectX device, the secondary output as secondary, and so on. Most people only used one monitor on their Win9x PC back in the day, hooked up to their 2D capable card. The Voodoo 1 and 2 aren’t meant to act as 2D video cards, yet they had to support D3D initialization somehow, so they presented themselves as non-primary DirectX devices, usually secondary, in hopes that game developers would allow the end user to select their 3D accelerator of choice.

This was standard practice at the tail end of the 1990s, but it was falling out of use at the turn of the millenium with the demise of 3dfx and the general lack of need for multiple graphics cards in one system for 3D gaming. This presented a problem, as games that technically could be played on a Voodoo 2… didn’t, as they could never be told to use it through normal means. Hacky solutions existed, like 3dfx’s unfinished, buggy attempt at a Voodoo 2 driver for Windows 2000 that allowed it to behave like a primary display adapter for general 2D and 3D use, but it’s notoriously unstable and isn’t possible to use on 9x. I’ve used this method before to play Touhou 6, Max Payne 1, GZDoom, GTA 3 and Vice City on the Voodoo 2 through Windows XP with mixed results.

Once I got an NEC bus mouse for use on my PC-98, I could finally use the Voodoo 2 on it without constant crashing. This got me interested in trying to get Touhou 6 to work on it, which lead me to a path of pure pain.

For starters, Touhou 6 is one of those games that only use primary DirectX devices, like the unsupported Mystique, so I had to somehow coax it into initializing the secondary device instead. My first approach to handling this was through direct binary patching. I didn’t know where to look for the init routines, so I asked 32th System for some heads up, and he pointed me to a rough location in process memory where the appropriate CreateDevice calls reside:

I then searched for the appropriate opcodes in the game binary, and patched all 6A 00 (push 0, A.K.A D3DADAPTER_DEFAULT) opcodes to be 6A 01 (push 1), forcing the game to init the secondary D3D device.

While this initially did in fact work, the approach ultimately sucked for two reasons.

1) Static binary patching only works for that specific binary, and doesn’t carry across different versions.

2) This requires manually patching every CreateDevice call, of which there are many in Touhou 6

It is at this point that I started sharing my progress with friends. jbit was quick to hop in and say “Why the fuck are you doing it this way? Just make a d3d8.dll wrapper DLL”. This was absolutely the smarter approach, I just didn’t know how to do it since I don’t know jack about DirectX programming. Fortunately, he handed me a little VS project he worked on called d3dcutter that, among other things, wrapped the CreateDevice function, which I promptly modified to always push 1 instead of 0 to the stack for the device selection parameter.

This solved the two patching problems, and I had something to show for starters:

Now, I’m sure you can tell that the performance is absolutely atrocious. This came as no surprise to me, as while the Voodoo can absolutely render the game at a full 60FPS most of the time, the dinky little Pentium MMX 166 struggles hard at doing triangle setup for the backgrounds every 16 milliseconds. Remember, 3dfx cards had no Hardware T&L, so the game has to fall back to Software T&L. I think the following wireframe screenshot will help illustrate the amount of work the CPU has to do every frame:

“Well, I’m in luck!”, one might think as they remember that older Touhou games support framerate division by 1/2 (30FPS) and 1/3 (20FPS) as options in custom.exe… There’s just one problem.

The game just… fails to initialize the D3D HAL on the Voodoo 2 in the frame divided modes. Why? Beats me, I still haven’t figured it out and likely never will. Just more ZUNcode bullshit I suppose.

I then figured out that framerate division is handled by a variable (that can be set even lower than 1/3, by the way) which could be set at runtime, even when the normal 60FPS mode is used. I suspected that the game uses a different initialization path for the two modes, so I once again tracked down opcodes that expect the variable to be set to 0 in process memory, this time with Cheat Engine, and patched them in the binary. Well guess fucking what, the game fails to initialize even when the regular init routines are modified to expect 30FPS or 20FPS frame division to be set.

This approach simply wasn’t going to work. so I went with trying to set the variable at runtime. Unfortunately, I had to go back to version specific patching once more for this, since there’s no way to wrap this functionality through DLL means. Additionally, while this wasn’t hard to do with the game running in windowed mode with Cheat Engine on the side on a modern system, it was basically impossible on a Voodoo 2 equipped machine, as the game ran in fullscreen and it wasn’t possible to restore the window after an Alt+Tab.

My final solution was to generate a trainer in Cheat Engine for version 1.02d of the game, as it’s the last one with a working logic speed limiter, that would forcibly set the frame divider variable at runtime with a hotkey:

This finally allowed me to play the game in 1/3 framerate mode on the Voodoo 2.
This allows the game to run at full logic speed most of the time, as the CPU now has 40 miliseconds per frame for triangle setup, however there’s something wrong with how the card handles buffer swaps in this mode of operation, leading to a very back-and-forth stuttery image that’s very unpleasant to look at.

Can we do better? Well yes! The game uses so-called STD scripts for certain stage-specific data setup, but also handling camera movement and geometry generation. Using Touhou Toolkit, I was able to unpack the appropriate DAT file, decompile all the STD scripts, remove all geometry commands, and recompile them for in-game use. As there are no more backgrounds to draw, there’s a trail effect left behind every frame, but thankfully custom.exe has an option to forcibly clear the back buffer every frame.

The end result? A nearly tripled framerate in 60FPS mode, recovered just by not drawing any 3D backgrounds. The game still lags when lots of bullets are on screen, but this doesn’t really come as a surprise.

Building MS-DOS 2.11

I thought I’d slap together some github thing with MS-DOS 2.11 that’s been made buildable thanks to a whole host of other smart people. The default stuff out there expects you to build it under MS-DOS using the long obsoleted ‘append’ utility which can add directories to a search path. Instead I created a bunch of makefiles that take advantage of MS-DOS Player, and let you build from Windows.

dos211: just the MS-DOS 2.11 sources, I re-aranged stuff and made it (slightly) easier to rebuild on Windows. (github.com)

building should be somewhat straightforward, assuming you have the ms-dos player in your path. JUST MAKE SURE YOU UNZIP as TEXT mode. If you are getting a million errors you probably have them in github’s favourite unix mode.

D:\temp\dos211-main\bios>..\tools\make
msdos ..\tools\masm ibmbio.asm ibmbio.obj NUL NUL
The Microsoft MACRO Assembler , Version 1.25
 Copyright (C) Microsoft Corp 1981,82,83


Warning Severe
Errors  Errors
0       0
msdos ..\tools\masm sysimes.asm sysimes.obj NUL NUL
The Microsoft MACRO Assembler , Version 1.25
 Copyright (C) Microsoft Corp 1981,82,83


Warning Severe
Errors  Errors
0       0
msdos ..\tools\masm sysinit.asm sysinit.obj NUL NUL
The Microsoft MACRO Assembler , Version 1.25
 Copyright (C) Microsoft Corp 1981,82,83

DOSSYM in Pass 2

Warning Severe
Errors  Errors
0       0
msdos ..\tools\LINK IBMBIO+SYSINIT+SYSIMES;

   Microsoft Object Linker V2.00
(C) Copyright 1982 by Microsoft Inc.

Warning: No STACK segment

There was 1 error detected.
msdos ..\tools\exe2bin.exe IBMBIO IBMBIO.COM < 70.TXT
Fix-ups needed - base segment (hex): 70
del -f ibmbio.obj    sysimes.obj   sysinit.obj ibmbio.exe

D:\temp\dos211-main\bios>

As an example building the bios by running make. For the impatiend you can download dos211.zip, which includes a bootable 360kb disk image, and a 32Mb vmdk!

32016 stand alone planetfall!

InfoTaskForce’87 running on a simple NS32016 emulator

What is it?

It sure may not look like much but it was an adventure getting here.

First, what is it? Well it’s the very simple NS32016 from here, with a few minor changes. I expanded the RAM from 256kb to a whopping 8MB. Then I added simple character I/O allowing me to print messages to the screen. Next looking at the toolchain page, I used my old Linux to Windows GCC 4 cross compiler to build the appropriate Canadian cross compiler to the NS3216.

Building the tools

A while back, I had built a cross compiler from Linux to Windows using GCC 4.1 as the basis as it was the last version that didn’t have massive external dependencies. NS32016 support was dropped some time in the late 3.x or early 4.x GCC so it means we need to go old anyways. I arbitrary picked GCC 2.8.1 for this build, while using the recommended Binutils 2.27

I cheated and just downloaded my existing linux-minw32.7z cross compiler as I didn’t feel like rebuilding everything again, although it is all in the Building a MIPS Compiler for Windows on a Linux VM! article. I also used an old Linux to Linux i586 32bit compiler (back from the OSKit build!) although you can use your hosts as well.

configuring Binutils is pretty simple like this:

./configure --prefix=/cross --target=ns32k-pc532-netbsd --host=i686-mingw32 --build=i586-linux

You can try omitting the –build portion, Debian GNU Linux 10 seemed okay with Gcc 8 as the default system compiler.

configuring GCC 2.8.1 was pretty similar:

./configure --target=ns32k-pc532-netbsd --prefix=/cross --disable-libssp --build=i586-linux --host=i686-mingw32

GCC 2.8.1 doesn’t quite know what we are doing so there is some flags we need to run off in auto-config.h namely

  • #define HAVE_BCMP 1
  • #define HAVE_BCOPY 1
  • #define HAVE_BZERO 1
  • #define HAVE_INDEX 1
  • #define HAVE_KILL 1
  • #define HAVE_RINDEX 1
  • #define HAVE_SYS_RESOURCE_H 1
  • #define HAVE_SYS_TIMES_H 1

You can just comment them out, or remove those lines all together.

When it came to building GCC, I did run into issues with GCC 7/8 trying to build GCC 2.8.1. I found it much easier to either have that Linux 4.1 compiler, or if you have access to Wine or WSL you can just run the Win32 binaries for the gen phases.

./configure --prefix=/cross --target=ns32k-pc532-netbsd --host=i686-mingw32
make CC=i686-mingw32-gcc xgcc cccp cc1 cc1obj

If you can run your own Win32 exe’s on Linux it’ll run just fine using the Linux to Windows GCC 4 cross. Otherwise you will need to either patch GCC or make your own GCC 4 hosted Linux to Linux cross compiler like this:

make CC=i686-mingw32-gcc HOST_CC=i586-linux-gcc xgcc cccp cc1 cc1obj

Hopefully that worked enough, and now you have your cross compiler. Now it’s time to build libgcc1.a

cp cccp cpp.exe
cp cc1 cc1.exe
cp xgcc xgcc.exe
cp ../binutils-2.27/gas/.libs/as-new.exe as.exe
cp ../binutils-2.27/binutils/.libs/ar.exe ar
cp ../binutils-2.27/binutils/.libs/ranlib.exe ranlib
make libgcc1.a TARGET_TOOLPREFIX="./" OLDCC=./xgcc.exe

Again you really want to be able to run the resulting programs on Linux but I guess you could script it out. Naturally if you wanted to just use Linux, it’d be easier to make that cross compiler directly, although I’m not sure how much of GCC 2.8.1 I want to fight, or just get GCC 4 running on Linux and use that to port.

crt0, somewhere for C to start

As mentioned a crt0.s is missing but there was enough inspiration to come up with this:

#NO_APP
gcc_compiled.:
.text
        .align 1
.globl _start
_start:
        enter [],0
#APP
#       setting the stack 256k under 8MB
        lprd sp,0x7c0000
        jsr _main
#NO_APP
L1:
        exit []
#       setting the stack 256k under 8MB
        lprd sp,0x7c0000
        bpt
        .align 1

#does nothing
.globl ___main
___main:
        ret 0

.globl _exit
_exit:
        bpt
        ret 0

I used a bit of the C example, and added some hooks that GCC was expecting namely a __main call that is made from main before it does anything (a place to init memory perhaps?), a place to catch an explicit exit call, along with setting the stack of course.

Patching InfoTaskForce without malloc / disk access

It’s not going to win any awards, but it was really great to get it to run a simple program written in GCC. Looking for something more fun, I took the old InfoTaskForce interpreter from ’87, and dug up my modification to run on cisco routers, and cooked up this version, that adds enough of printf from Linux, a bogus malloc that just allocates from a fixed memory array (otherwise you have to actually know about your platform), and a fun trick with later binutils where you can import a binary file directly as an object!

Neat!

Since I don’t have any file I/O being able to have the game data in RAM is crucial. I tried to tweak it so you could build the same working thing on Windows (maybe others?).

So for anyone who wants to look at the standalone adventure Win32 hosted tools are here, although the emulator should be somewhat portable.

Quick and dirty i8080 emulator in BASH!

Based on the Toledo 8080 processor emulator, it’s Peter Naszvadi’s i8080.sh!

You will need bash version 4 for this to run. So for those of you in the 1990’s you are out of luck.

And to make this fun, the 2kb basic ‘ROM’ is bootstrapped into RAM for immediate execution!

$ ./8080.sh
Quick and dirty i8080 emulator by NASZVADI Peter, 2019.
Press Ctrl+C to quit anytime!
Initializing upper memory to NOPs
Initializing register values
Starting CPU emulation


OK
>10 PRINT "HIHI"
>RUN
HIHI

OK
>

So we’ve seen emulation in javascript of all things, but now you can run 8080 instructions from bash of all things!

And for anyone even more crazy there is always BDS C: which was opened up back in 2002, a K&R for the 8080 on CP/M!

Messing with the Monitor

The 68000 Microprocessor (5th Edition) Hardcover Nov 25 2003
by James L. Antonakos (Author)

So I was trapped in the Library for a bit, and spied this book. It’s not every often in 2019 you are going to find books about the 68000, as I’m sure any good library will have removed stuff like this, and have it pulped ages ago. But the amount of current technical books in English here is pretty damned slim to none, so I was all to happy to pickup this book for a week.

The poor thing has been checked out 4 times in the last 15 years. I guess the kids don’t know what they are missing.

Anyways what was interesting in this book is that it has a CD-ROM, and on there is some lesson code from the book, along with an assembler that outputs to S-records of all things, and a small emulator that is meant to be compiled under MS-DOS. It was trivial to isolate the passing of DOS interrupts from Unix/MinGW and get the simulator running on something modern.

In Chapter 11 there is a brief walkthrough on building a board, which sounds like fun although I’m sure in 2019 finding parts will be.. challenging, along with a simple monitor program.

The built in assembler can happily assemble the monitor, but it’s geared for talking to the obsolete hardware as specified in the book. I just made a few small changes to instead have it’s console IO hook to the simulator’s TRAPs and I had the monitor running!

I then took the echo test program, and modified it to run at a higher location in memory, along with exiting via the RTS instruction, so that it will exit when you press Q back to the monitor. Then for the heck of it I further extended the monitor so you can Quit it, and return to the simulator.

Is this useful? I’m pretty sure the answer is absolutely not.

The CD-ROM is tiny, I thought it would be packed with goodies, but it’s 250kb compressed.

The68000MicroprocessorFifthEdition.zip

If anything people using this book will probably have lost the CD-ROM and want the programs.

  • ISBN-10: 0130195618

And my horrible changes here.