Elijah Miller’s NEC v30 on a Pi hat

v30 on a board

While talking about home brew 8080 and 8086 systems on Discord an ebay search brought me to Elijah’s store page where this small little curiosity was up for sale. It’s literally just a NEC v30 on a Raspberry Pi hat, for a mere $15 USD! Interestingly enough the v30 can operate at 3.3v meaning no special hardware is required to interface to the GPIO bus on a Pi. This reminds me so much of the CP/M cartridge for the Commodore 64, and the price being so right I quickly ordered one and eagerly awaited to 2 weeks shipping to Asia.

While I have Pi 4’s that I run Windows 10 on to drive some displays & power point, I wanted to use the slightly faster Pi400 for this. The Pi400 has a compatible GPIO expansion port so just like a cartridge it’s a simple matter of slotting the card, powering up and building the software. While there is an included binary, it’s a 32bit one, and I’m running Manjaro on the Pi400 for a similar look/feel as the PineBook Pro. Anyways the dependences are SDL2, and an odly named ‘wiringPi’ library that allows C programs to interface to the GPIO.

You can download the emulator over on homebrew8088, specifically the Raspberry Pi Second Project. The last ‘ver 2’ download has the project configured for a v30 which is an 8086 analogue, unlike the v20 which is an 8088. When physically interfacing to the processor things like this really matter!

With the emulator built it was pretty simple to fire it up, and boot into MS-DOS:

first boot!

I have to admit I was a little startled at first as I really had no idea if this was going to work at all. I’d spoken to an engineer friend and he was saying plugging a CPU directly into the GPIO bus, and toggling connections to actually emulate the board was both crazy and that without any electrical buffers it’d most likely either fry the processor and maybe the Pi as well. I suspect this being low voltage may be sparing both, although I have no EE so I’m not going to pretend to know.

Loading up Norton SI confirms what Elijah had posted on Ebay is that it runs very slowly about 1/3rd the speed of an XT. Now I may not know anything about hardware but this seemed at least something a profiler could at least tell me what is going on, and if someone like me helicoptering in on the shoulder of giants could see something.

gcc -I/usr/include/SDL2 -pg -O2 *.cpp -o pi -lSDL2 -lwiringPi -lpthread -lstdc++

This will build a profiled version of the emulator that’ll let us know which functions are being called both the number of times, and how much time to do so. Not knowing anything but having profiled other emulators, the usual pattern is that you spend most time fetching and possibly translating memory; Both in feeding instructions and pushing/popping data from stack and pointers. Waiting is usually for initialisation and for IO.

Once you’ve run your profiled executable, it’ll dump a binary file gmon.out which you can then use gprof to format to a text file like this:

gprof pi gmon.out > report.txt

And then looking at the report you can see where the top time, along with top calls are. Some things just take a while to complete and other well they get called far too often.

Each sample counts as 0.01 seconds. % cumulative self self total
time seconds seconds calls s/call s/call name
39.91 0.71 0.71 286883 0.00 0.00 Print_Char_9x16(SDL_Render er*, int, int, unsigned char)
16.30 1.00 0.29 1 0.29 1.02 Start_System_Bus(int)
12.37 1.22 0.22 1100374 0.00 0.00 Data_Bus_Direction_8086_OUT()
7.87 1.36 0.14 5954106 0.00 0.00 CLK()

As expected Start_System_Bus takes 1 second, followed by 1,100,374 calls to set the Data_Bus_Direction_8086_OUT (no doubt the Pi needs to alternate between reading and writing to the CPU), followed by 5,954,106 ticks of the CLK function. Of course the real culprit is Print_Char_9x16 which was called 286,883 times, and is responsible for nearly 40% of the tuntime!

Obviously for a simple MS-DOS boot the screen should not be calling any print char anywhere near this many times. Clearly something is amiss. Not knowing anything I added a simple counter to block at the top of the Print_Char_9x16 function to let it only execute 1:1000 times, and I got this:

Obviously it’s not right, which means that the culprit really isn’t Print_Char_9x16 but rather what is calling it. It was a simple change to each of the Mode functions to only render a fraction of the time, and I changed it to a define to let me fire it more often. This is a simple diff, assuming WordPress doesn’t screw it up. It’s not pretty but it gets the job done.

$ diff -ruN ver2/vga.cpp ver2-j/vga.cpp 
--- ver2/vga.cpp	2020-07-29 10:36:51.000000000 +0800
+++ ver2-j/vga.cpp	2021-06-04 01:51:33.546124473 +0800
@@ -1,5 +1,9 @@
 #include "vga.h"
 
+static int do9x16 = 0;
+#define VIDU 5000
+
+
 void Print_Char_18x16(SDL_Renderer *Renderer, int x, int y, unsigned char Ascii_value)
 {
 	for (int i = 0; i < 9; i++)
@@ -23,6 +27,12 @@
 
 void Mode_0_40x25(SDL_Renderer *Renderer, char* Video_Memory, char* Cursor_Position)
 {
+do9x16++;
+if(do9x16>VIDU)
+        {do9x16=0;}
+else
+        {return;}
+
 	int index = 0; 
 	for (int j = 0; j < 25; j++)
 	{
@@ -36,6 +46,7 @@
 	Print_Char_18x16(Renderer, (Cursor_Position[0] * 18), (Cursor_Position[1] * 16), 0xDB);
 	SDL_RenderPresent(Renderer);	
 }
+
 void Print_Char_9x16(SDL_Renderer *Renderer, int x, int y, unsigned char Ascii_value)
 {
 	for (int i = 0; i < 9; i++)
@@ -57,6 +68,12 @@
 }
 void Mode_2_80x25(SDL_Renderer *Renderer, char* Video_Memory, char* Cursor_Position)
 {
+do9x16++;
+if(do9x16>VIDU)
+        {do9x16=0;}
+else
+        {return;}
+
 	int index = 0; 
 	for (int j = 0; j < 25; j++)
 	{
@@ -102,6 +119,12 @@
 
 void Graphics_Mode_320_200_Palette_0(SDL_Renderer *Renderer, char* Video_Memory)
 {
+do9x16++;
+if(do9x16>VIDU)
+        {do9x16=0;}
+else
+        {return;}
+
 	SDL_RenderClear(Renderer);
 			int index = 0; 				
 			for (int j = 0; j < 100; j++)
@@ -156,6 +179,12 @@
 }
 void Graphics_Mode_320_200_Palette_1(SDL_Renderer *Renderer, char* Video_Memory)
 {
+do9x16++;
+if(do9x16>VIDU)
+        {do9x16=0;}
+else
+        {return;}
+
 	SDL_RenderClear(Renderer);
 			int index = 0; 
 			for (int j = 0; j < 100; j++)

While it feels more responsive on the console, it’s still incredibly slow. SI was returning the same speed which means that although we aren’t hitting the screen anywhere near as often it’s still doing far too much. Is it really a GPIO bus limitation? Again I have no idea. But the next function of course is the clock.

First I tried dividing the usleep in half thinking that maybe it’s not getting called enough. And running SI revealed that I’d gone from a 0.3 to a 0.1! Obviously this is not the desired effect! So instead of a divide I multiplied it by four:

diff -ruN ver2/timer.cpp ver2-j/timer.cpp 
--- ver2/timer.cpp	2020-08-12 00:32:13.000000000 +0800
+++ ver2-j/timer.cpp	2021-06-04 02:06:25.505904407 +0800
@@ -7,7 +7,7 @@
 {
    while(Stop_Flag != true)
    {
-      usleep(54926); 
+      usleep(54926*4); 
       IRQ0();
    }
 }

Now re-running SI I get this:

Norton SI with clock multiplied by four

Now it’s scoring a 1.5! Obviously these are all ‘magic numbers’ and tied to the Pi400 and more importantly I haven’t studied the code at all, I’m not trying to disparage or anything, if anything it’s just a quick example why profiling your code can be so important! At the same time trying to run games is so incredibly slow I don’t even know if my changes had any actual impact to speed as emulation of benchmarks can be such a finickie thing.

My goto game, Battletech 3025 Crescent Hawks Inception loads to the first splash but then seems to hang. I could be impatient or there could be further issues but I’m just some impatient tourist with a C compiler…

With my changes and re-running the profiler I now see this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 95.41    129.23   129.23 22696621     5.69     5.69  Read_Memory_Array(unsigned long long, char*, int)
  2.90    133.15     3.92                             Start_System_Bus(int)
  0.88    134.34     1.19 64369074     0.02     0.02  CLK()
  0.30    134.74     0.40                             keyboard()
  0.16    134.96     0.22   412873     0.53     0.53  Print_Char_9x16(SDL_Render
er*, int, int, unsigned char)
  0.08    135.07     0.11 11273939     0.01     0.01  Data_Bus_Direction_8086_OUT()

Which is now what I expect with the bulk of the emulation now calling Read_Memory, with the Clock following that and of course our tamed screen renderer (although its still called far too much!) with the Data_Bus_Direction being further down the list. No doubt some double buffering and checking what changed in between calls would go a LONG way to optimise it, just as would actually studying the source code.

The one cool thing about this is that if I wanted to write a PC emulator this way gives me the confidence that the CPU is not only 100% cycle accurate, but it’s 100% bug for bug accurate since we are using a physical processor.

And again for $15 USD + Shipping I cannot recommend this enough!

Virtualization Challenge IV – QNX 1.2

(This is a guest post by Antoni Sawicki aka Tenox)

This is a Virtualization Challenge. A competition to virtualize an OS inside emulator/hypervisor. (Previously 1 / 2 / 3)

This time the object of the competition is QNX version 1.2. A demo disk is covered here. This is the set of floppy disks:

As you can see the boot disk is copy-protected. As such I have imaged these disks using both KryoFlux and SuperCard Pro. The magnetic flux stream images are available here. For verification I have converted the raw stream of the demo disk in to a sector image using HFE tool. The converted disk boots and works correctly in an emulator. The demo disk can also help with analyzing the boot process since it’s known to work.

The contest is to virtualize the OS, install it and provide a fully working hard disk image with the OS installed. Any emulator of your choice or method is acceptable as long as anyone can download and run it. The prize is $100 via PayPal and of course the fame! 🙂 The winner will be whoever comments the article first with a verifiable working solution.

A bonus $50 prize will be awarded if you can patch the boot floppy disk so that it can be installed as if the copy protection was never there.

Good luck!!!

UPDATE: The competition has ben won: QNX 1.2 Virtualized

UPDATE 2 : QNX 1.2 challenge Act II – HDD Boot

UPDATE 3: Reverse-engineering QNX 1.2 to boot from HDD

QNX 1.1 Demo Disk

(This is a guest post by Antoni Sawicki aka Tenox)

Fresh from the oven, or rather Kryoflux dump – a QNX version 1.1 Demo Disk:

QNX 1.1 Demo Disk

I managed to boot it on 86Box:

QNX 1.1 booted on 86Box Emulator

For the readers with more curiosity and time at their hands please could you try it on different emulators and comment what works and what doesn’t.

For the less curious this how the demo actually looks like once you log in as demo user:

QNX 1.1 Demo Menu

As the authors demand to make as many copies of this disk as possible here it is. Please download and spread!

I also managed to dump the rest of QNX 1.2 including boot disk, utils and even c compiler. Unfortunately the boot disk is copy protected:

I have raw stream dump made with Kryoflux as well as regular disk images. If you are interested in circumventing checking the copy protection so the system could be run in an emulator let me know in a comment. Perhaps time for another Virtualization Challenge?

Previously:

Virtualizing QNX 2

QNX Windows – First Look

QNX 2.21 Arrived Today

PCem v15 released!

The new dynamic recompiler appears to be much more faster, although if you want maximum performance, make sure to set your video card to the fastest possible performance.

I was doing my typical DooM thing, and the performance was abysmal. But I did have an 8bit VGA card selected, so what would I really expect? Interestingly enough in ‘low resolution’ mode it performed quite well, but setting it to the artificial ‘fastest PCI/VLB’ speed it was performing just great.

PCem v15 released. Changes from v14 :

  • New machines added – Zenith Data SupersPort, Bull Micral 45, Tulip AT Compact, Amstrad PPC512/640, Packard Bell PB410A, ASUS P/I-P55TVP4, ASUS P/I-P55T2P4, Epox P55-VA, FIC VA-503+
  • New graphics cards added – Image Manager 1024, Sigma Designs Color 400, Trigem Korean VGA
  • Added emulation of AMD K6 family and IDT Winchip 2
  • New CPU recompiler. This provides several optimisations, and the new design allows for greater portability and more scope for optimisation in the future
  • Experimental ARM and ARM64 host support
  • Read-only cassette emulation for IBM PC and PCjr
  • Numerous bug fixes

Thanks to dns2kv2, Greatpsycho, Greg V, John Elliott, Koutakun, leilei, Martin_Riarte, rene, Tale and Tux for contributions towards this release.

As always PCem can be downloaded here:

Booting a PC over serial port via ROM Basic!

Galaxian via Basic over the serial port

I was sent this link while out for vacation: https://github.com/retrohun/blog/tree/master/dt/bootingfromcom1

So this is great for machines that included the seemingly useless ‘casette basic’ as you could maybe shove over something to config the machine, maybe ‘rom dos’ directly into ram to fdisk/format without using disks… Interesting stuff to say the least!

As requested, PCem v11 with networking

via SLiRP

via SLiRP

injecting networking was no more difficult than it was in version 10.  It’s only a few changes to pc.c, if you look at the USENETWORKING define you’ll see them.  The best notes are on the forum.

I haven’t changed or improved anything it still requires manual configuration.

Downloads are available on my site as pcem_v11_networking.7z.  You’ll have to defeat the password protection, as always.  I included the source, it ought to be trivial to rebuild.

*For anyone using an old version the ‘nvr’ directory is missing, so PC-em is unable to create new non volatile ram save files, meaning you always loose your BIOS settings.  Sorry I missed that one.

PCem v11 released

I haven’t had time to follow it, but great news!

PCem v11 released. Changes from v10.1 :

  • New machines added – Tandy 1000HX, Tandy 1000SL/2, Award 286 clone, IBM PS/1 model 2121
  • New graphics card – Hercules InColor
  • 3DFX recompiler – 2-4x speedup over previous emulation
  • Added Cyrix 6×86 emulation
  • Some optimisations to dynamic recompiler – typically around 10-15% improvement over v10, more when MMX used
  • Fixed broken 8088/8086 timing
  • Fixes to Mach64 and ViRGE 2D blitters
  • XT machines can now have less than 640kb RAM
  • Added IBM PS/1 audio card emulation
  • Added Adlib Gold surround module emulation
  • Fixes to PCjr/Tandy PSG emulation
  • GUS now in stereo
  • Numerous FDC changes – more drive types, FIFO emulation, better support of XDF images, better FDI support
  • CD-ROM changes – CD-ROM IDE channel now configurable, improved disc change handling, better volume control support
  • Now directly supports .ISO format for CD-ROM emulation
  • Fixed crash when using Direct3D output on Intel HD graphics
  • Various other fixes

Thanks to Battler, SA1988, leilei, Greatpsycho, John Elliott, RichardG867, ecksemmess and cooprocks123e for contributions towards this release.

Downloads are available for Windows & Linux.

Porting Quake II to MS-DOS pt4

Bringing it all home for release day.

Bringing it all home for release day.

Since the last update we got some help in a few fields that have really fleshed out this ‘experimental’ port into a full fledged port.  First RayerR helped us with the fun of getting us onto the latest deployment of DJGPP, 2.05 (rc1).  It’s always nice to be in the latest available release.  Next in a passing comment, Ruslan Starodubov had mentioned that he had gotten a much older build of our QDOS to support the Intel HDA sound chipset via the  MPXPlay sound library.  I wrote to the author of MPXPlay, Pádár Attila asking for us to distribute his source in our project, and he granted permission.

So at this point things were looking good.  The only ‘feature’ that modern OS’s really held over us was the ability to dynamically load and unload game modules on the fly.  I had tried to use DLM, but it stripped the DPMI functionality out of the MS-DOS Extender making the port really useless.  I tried to build the newer DXE3 support but had no luck.  I suspect now my native tool chain was interfering with the build process. But Maraakate managed to get it to not only build, but to run!

Adding in DX3 support was relatively painless.  I first looked at DJGPP’s FAQ and downloaded the example code.  In the example code there was small helper functions to make unions and check the symbols.  If they didn’t exist a printf was spit out to alert you about it.  To resolve the issue you simply just add DXE_EXPORT to the other list of missing exports.

Compiling the game code was easy, again referring to the example I saw that basically they compiled it the same, but at link time you use DX3GEN and -U flag to ignore unresolved symbols.

The biggest head scratcher was the Sys_GetGameAPI failing to find GetGameAPI from the DX3.  After some piddling around I noticed that it listed GetGameAPI as _GetGameAPI inside the DX3 itself.  I added the underscore and it worked!

Other things that were relatively to easy to import was R1Q2’s HTTP downloading code.  Compiling CURL was kind of tricky because of the linking order, but thankfully neozeed figured it out quickly.

All of Yamagi’s Quake 2 updated game DLLs were all diff’d by hand using BeyondCompare to make sure I didn’t clash using some newer functions that weren’t available in DJGPP.  I also merged their Zaero code with their baseq2 code by comparing Zaeros code to the Quake 2 SDK, marking every thing that was changed.  The result is a really stable Zaero game code.  If you haven’t played Zaero check it out.  I think it’s a lot better than Rogue, but Xatrix is probably my favourite (even over stock Q2).

Other cool things I’m glad to get into the code was the GameSpy Browser.  It took me quite a bit of work to get it where it is, but it’s really nice to just be able to ping to a master server (a custom GameSpy emulator I wrote specifically for Q2DOS.  Source is not finalized yet, but will be available soon for those curious), pick a server and go!  All in DOS!

So here we are at the end of the journey.  Or at least safe enough for a 1.0 release.

To recap, we have:

* VGA
* SVGA (LFB modes only)
* Mouse
* Keyboard
* SoundBlaster and Gravis UltraSound Family
* CD-ROM music
* OGG music
* Networking (You need a packet driver)
* Loading/unloading game DLLs in DX3 format.
* Intel HDA support -hda
* Mouse wheel support with -mwheel

And I should add, that it works GREAT on my MSI Z87 motherboard.

You can download Quake II for MS-DOS on bitbucket.  And as always the source is available here.

Don’t forget you can always make bootable USB stick with DOS, or CD-ROMs.

Continued in Part 5!

PCem

PCem v9

PCem v9

From the main page:

PCem v9 released. Changes from v8.1 :

  • New machines – IBM PCjr
  • New graphics cards – Diamond Stealth 3D 2000 (S3 ViRGE/325), S3 ViRGE/DX
  • New sound cards – Innovation SSI-2001 (using ReSID-FP)
  • CPU fixes – Windows NT now works, OS/2 2.0+ works better
  • Fixed issue with port 3DA when in blanking, DOS 6.2/V now works
  • Re-written PIT emulation
  • IRQs 8-15 now handled correctly, Civilization no longer hangs
  • Fixed vertical axis on Amstrad mouse
  • Serial fixes – fixes mouse issues on Win 3.x and OS/2
  • New Windows keyboard code – should work better with international keyboards
  • Changes to keyboard emulation – should fix stuck keys
  • Some CD-ROM fixes
  • Joystick emulation
  • Preliminary Linux port

Thanks to HalfMinute, SA1988 and Battler for contributions towards this release.

Very excellent!

8086tiny de-obfuscated!

I came across this site, which is from the author of the winning IOCCC entry, 8086tiny!

It’s ballooned from just under 4kb to 28kb, but still incredibly tiny!

For anyone interested it’s features:

  • Highly accurate, complete 8086 CPU emulation (including undocumented features and behavior)
  • Support for all standard PC peripherals: keyboard, 3.5″ floppy drive, hard drive, video (Hercules/CGA graphics and text mode, including direct video memory access), real-time clock, timers, PC speaker
  • Disk images are compatible with standard Windows/Mac/UNIX mount tools for simple interoperability
  • Complete source code provided (including custom BIOS)

It’s worth checking out for some old time PC/XT nostalgia!