Modernising the National Geographic CD-ROM collection

The fancy collector’s box set

I wanted to get some research into some early space flight, and look into that magical transition when everything went from Cowboys & Indians to moonmen & the race to space. Granted Toy Story covers cultural touchstone pretty well, reading period pieces is fun too. I had a few CD-ROM’s containing the 1980’s National Geographic CD-ROM’s when they were sold up in decade sets, but what always escaped me was the fancy collectors box with the whole thing.

$169!!

I always figured this was going to be one of those weird collectors’ items that probably was under produced, over sold, and lost to the winds of time. Looking on eBay for a 1950’s and 1960’s set .. and thinking about the 1970’s as well, and it was going to get close to £40. Ouch. So for the heck of it, I look for the fancy box set.

wow

I was surprised for as much as I was going to end up paying for one or two sets, I could get the entire thing. In the legendary fancy wooden box. I got mine, shipped for under £20. Much wow!

Okay what is the catch?

Wait a minute, these National Geographic CDs are obsolete!Christopher Elliott

I always was running mine on MacOS using Cockatrice III with the monitor resolution set to the absolutely absurd resolution of 1152×870 provided by the 1991 21″ monitor. I couldn’t imagine why these CD’s wouldn’t work. And of course the first step was to rip the CD-ROM’s. There was 31 National Geographic, and an additional clip art disc in my box. I fired up my XP machine to have it’s hard disk give up and die on me. Luckily since I had removed the mechanical disk from the iMac G5, I had a spare SATA disk handy. I didn’t feel like fighting the XP installer, and I’m impatient, so I made a netboot.xyz bootable flash drive, and installed Debian 10 over the internet. Very nice. Now I could get down to ripping CD’s. Luckily the drive from the DVD-RAM drive disaster reads at 48x, so it took me under 4 hours to rip them all. In case you are wondering:

32 File(s) 17,894,987,776 bytes

The flash drive I used is 32Gb. I got it in a 3 pack from Tesco. I think I paid £10 for it. That means UTZOO + NatGEO all fit one of the drives. I wonder if they’ll ever offer high resolution scans on a USB drive in a fancy wooden box?

For the hell of it, I used 7zip to decompress all the ISO’s and that’s when I noticed that although the files were spread over discs there was a clean decade break between volumes from the 1900’s, and that they had a logic to them.

On the NGS_1956_1959 CD-ROM there is a hierarchy something like this:

E:\IMAGES\256I

The 2 is for the 20th century, and 56 is the year. I is the month; in this case I is September. Since we live in the future, and rendering jpeg’s is quicker than real-time, 256IC01A.JPG can be shown to be the cover. The next pattern is for the adverts, 256IA02A.JPG – 256IA30A.JPG are the into adverts. Now, we get to the interior content 256I0287.JPG – 256I0426.JPG. Next is the closing/trailer advertisements 256IZ01Z.JPG – 256IZ13Z.JPG. And Finally the back of the issue, 256IB14Z.JPG is the rear of the magazine.

So we now have some understanding of the format. Putting this into order could be done with something simple like this:

find ./ -name '2[0-9][0-9]IC[0-9]*.JPG' > interior
find ./ -name '2[0-9][0-9]IA[0-9]*A.JPG' >> interior
find ./ -name '2[0-9][0-9]I[0-9]*.JPG' >> interior
find ./ -name '2[0-9][0-9]IZ[0-9]*Z.JPG' >> interior
find ./ -name '2[0-9][0-9]IB[0-9]*Z.JPG' >> interior

Doing so makes a nice list file of what images should go in which order. I could probably use ffmpeg and painstakingly check the images for ‘pullouts/double wide’ ones, and have it stitch the rest together as a two pager ( ffmpeg -i left.jpg -i right.jpg -filter_complex hstack combined.jpg ), but that still sounds like a lot of work. Also the images are very low quality, It’s a shame they didn’t use black & white on the text, and scan the images separately, but that’d require something like PDF, and no doubt a LOT of time. Although Kodak did sponsor the set, the developer, Mindscape didn’t go with fancy PDF technology of the late 1990s, instead it’s just the blurry jpeg scans we have today.

NG July 1981

That’s when I found out about Tesseract. Running it against this one paragraph reveals:

Voyager is watching two small moons
that-seem to be playing tag as they race
around Saturn in almost the same orbit. The
trailing moon is traveling faster than the
leader, and should catch up with the leader
in January 1982 (pages 20-21). The two pre-
sumably have been playing this game for
billions of years, Through what sleight of
physics do they avoid colliding?

National Geographic, July 1981

I have to admit, that’s pretty good! And how amazing that I have a LOT of files to scan.

188553 File(s) 11,061,111,690 bytes

That is a LOT of files. Okay that’s nice, but can Tesseract read the list that I generated per issue? YES. The only thing that I’d love to see Tesseract do is create PDF’s with the scanned text embedded. OH WAIT, IT ALREADY DOES THAT!

How on earth did I not know this?

I put together a few scripts, and I was able to separate out all the images into years & months, then I created the needed list files with all the images in the correct order. It’s not a fast process I think it may take me a week or so to do this.

My CPU hates me

So far, I’m up to 1903, so I’ll update with some rough idea of when this finished.

So, the applications needed are old and obsolete Win16 or Classic MacOS needed machines, with an optical drive. Yes they still work (with emulation) on modern machines, although you still need to read the physical discs. Thankfully the images are easily mapped into the right order, and you can map them as your own. Neat!