<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>obsolete data &#8211; Virtually Fun</title>
	<atom:link href="https://virtuallyfun.com/category/obsolete-data/feed/" rel="self" type="application/rss+xml" />
	<link>https://virtuallyfun.com</link>
	<description>Fun with Virtualization</description>
	<lastBuildDate>Wed, 07 Feb 2024 14:34:47 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>Modernising the National Geographic CD-ROM collection</title>
		<link>https://virtuallyfun.com/2024/02/07/modernising-the-national-geographic-cd-rom-collection/</link>
					<comments>https://virtuallyfun.com/2024/02/07/modernising-the-national-geographic-cd-rom-collection/#comments</comments>
		
		<dc:creator><![CDATA[neozeed]]></dc:creator>
		<pubDate>Wed, 07 Feb 2024 14:34:47 +0000</pubDate>
				<category><![CDATA[cdroms]]></category>
		<category><![CDATA[obsolete data]]></category>
		<guid isPermaLink="false">https://virtuallyfun.com/?p=13899</guid>

					<description><![CDATA[I wanted to do some research into early space flight, and look into that magical transition when everything went from Cowboys &#38; Indians to moonmen &#38; the race to space. Granted, Toy Story covers that cultural touchstone pretty well, but reading &#8230; <a href="https://virtuallyfun.com/2024/02/07/modernising-the-national-geographic-cd-rom-collection/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><a href="https://archive.org/details/ngs-1888-1997" target="_blank" rel="noreferrer noopener"><img fetchpriority="high" decoding="async" width="1024" height="768" src="/wp-content/uploads/2024/02/IMG_0892-1024x768.jpeg" alt="" class="wp-image-13898" srcset="https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-1024x768.jpeg 1024w, https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-300x225.jpeg 300w, https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-768x576.jpeg 768w, https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-1536x1152.jpeg 1536w, https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-2048x1536.jpeg 2048w, https://virtuallyfun.com/wp-content/uploads/2024/02/IMG_0892-400x300.jpeg 400w" sizes="(max-width: 1024px) 100vw, 1024px" /></a><figcaption class="wp-element-caption">The fancy collector&#8217;s box set</figcaption></figure>



<p class="wp-block-paragraph">I wanted to do some research into early space flight, and look into that magical transition when everything went from Cowboys &amp; Indians to moonmen &amp; the race to space.  Granted, Toy Story covers that cultural touchstone pretty well, but reading period pieces is fun too.  I had a few of the 1980s National Geographic CD-ROMs from when they were sold in decade sets, but what always escaped me was <a href="https://web.archive.org/web/19990508065427/http://www.nationalgeographic.com/cdrom/complete/" target="_blank" rel="noreferrer noopener">the fancy collector&#8217;s box</a> with the whole thing.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="586" height="333" src="/wp-content/uploads/2024/02/national-geographic-box-set-web-site-cut-down.png" alt="" class="wp-image-13901" srcset="https://virtuallyfun.com/wp-content/uploads/2024/02/national-geographic-box-set-web-site-cut-down.png 586w, https://virtuallyfun.com/wp-content/uploads/2024/02/national-geographic-box-set-web-site-cut-down-300x170.png 300w, https://virtuallyfun.com/wp-content/uploads/2024/02/national-geographic-box-set-web-site-cut-down-500x284.png 500w" sizes="(max-width: 586px) 100vw, 586px" /><figcaption class="wp-element-caption">$169!!</figcaption></figure>



<p class="wp-block-paragraph">I always figured this was going to be one of those weird collectors&#8217; items that were probably underproduced, oversold, and lost to the winds of time.  Looking on eBay for the 1950s and 1960s sets, and thinking about the 1970s as well, it was going to get close to £40.  Ouch.  So for the heck of it, I looked for the fancy box set.</p>



<figure class="wp-block-image size-large"><a href="/wp-content/uploads/2024/02/cheap-ngs-boxes.png"><img decoding="async" width="1024" height="559" src="/wp-content/uploads/2024/02/cheap-ngs-boxes-1024x559.png" alt="" class="wp-image-13904" srcset="https://virtuallyfun.com/wp-content/uploads/2024/02/cheap-ngs-boxes-1024x559.png 1024w, https://virtuallyfun.com/wp-content/uploads/2024/02/cheap-ngs-boxes-300x164.png 300w, https://virtuallyfun.com/wp-content/uploads/2024/02/cheap-ngs-boxes-768x419.png 768w, https://virtuallyfun.com/wp-content/uploads/2024/02/cheap-ngs-boxes-500x273.png 500w, https://virtuallyfun.com/wp-content/uploads/2024/02/cheap-ngs-boxes.png 1201w" sizes="(max-width: 1024px) 100vw, 1024px" /></a><figcaption class="wp-element-caption">wow</figcaption></figure>



<p class="wp-block-paragraph">I was surprised: for as much as I was going to end up paying for one or two sets, I could get the entire thing, in the legendary fancy wooden box.  I got mine shipped for under £20. Much wow!</p>



<p class="wp-block-paragraph">Okay, what is the catch?</p>



<p class="wp-block-paragraph"><strong><a href="https://www.elliott.org/problem-solved/wait-minute-national-geographic-cds-obsolete/" target="_blank" rel="noreferrer noopener">Wait a minute, these National Geographic CDs are obsolete!</a></strong> &#8211; <a href="https://www.elliott.org/author/elliott/">Christopher Elliott</a></p>



<p class="wp-block-paragraph">I was always running mine on MacOS using <a href="https://sourceforge.net/projects/cockatrice/" target="_blank" rel="noreferrer noopener">Cockatrice III</a> with the monitor set to the absolutely absurd resolution of <a href="https://support.apple.com/kb/SP423?viewlocale=en_US&amp;locale=en_US" target="_blank" rel="noreferrer noopener">1152&#215;870 provided by the 1991 21&#8243;</a> monitor.  I couldn&#8217;t imagine why these CDs wouldn&#8217;t work.  And of course the first step was to rip the CD-ROMs.  There were 31 National Geographic discs and an additional clip art disc in my box.  I fired up my XP machine, only to have its hard disk give up and die on me. Luckily, since I had removed <a href="https://virtuallyfun.com/2023/11/10/another-g5-another-ssd-nightmare/" target="_blank" rel="noreferrer noopener">the mechanical disk from the iMac G5</a>, I had a spare SATA disk handy. I didn&#8217;t feel like fighting <a href="https://virtuallyfun.com/2023/07/15/installing-windows-xp-on-a-lenovo-s20/" target="_blank" rel="noreferrer noopener">the XP installer</a>, and I&#8217;m impatient, so I made a <a href="https://netboot.xyz/" target="_blank" rel="noreferrer noopener">netboot.xyz</a> bootable flash drive, and installed Debian 10 over the internet.  Very nice.  Now I could get down to ripping CDs.  Luckily the drive from <a href="https://virtuallyfun.com/2023/10/26/dvd-ram-more-like-dvd-wrong/" target="_blank" rel="noreferrer noopener">the DVD-RAM drive disaster</a> reads at 48x, so it took me under 4 hours to rip them all.  In case you are wondering:</p>



<pre class="wp-block-code"><code>32 File(s) 17,894,987,776 bytes</code></pre>



<p class="wp-block-paragraph">The flash drive I used is 32GB.  I got it in a 3 pack from Tesco.  I think I paid £10 for the pack.  That means UTZOO + NatGEO all fit on one of the drives.  I wonder if they&#8217;ll ever offer high resolution scans on a USB drive in a fancy wooden box?</p>



<p class="wp-block-paragraph">For the hell of it, I used 7zip to extract all the ISOs, and that&#8217;s when I noticed that although the files were spread over several discs, there was a clean decade break between volumes from the 1900s, and that the file names had a logic to them.</p>



<p class="wp-block-paragraph">On the NGS_1956_1959 CD-ROM there is a hierarchy something like this:</p>



<p class="wp-block-paragraph">E:\IMAGES\256I</p>



<p class="wp-block-paragraph">The 2 is for the 20th century, and 56 is the year.  I is the month; in this case I is September.  Since we live in the future, and rendering JPEGs is quicker than real-time, 256IC01A.JPG can be shown to be the cover.  The next pattern is for the adverts: 256IA02A.JPG &#8211; 256IA30A.JPG are the intro adverts.  Now we get to the interior content, 256I0287.JPG &#8211; 256I0426.JPG.  Next are the closing/trailer advertisements, 256IZ01Z.JPG &#8211; 256IZ13Z.JPG.  And finally, 256IB14Z.JPG is the rear of the magazine.</p>
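


<p class="wp-block-paragraph">As a rough sketch of that naming scheme, a small shell function can pull a file name apart.  Two things here are assumptions rather than anything confirmed above: that the month letters simply run A for January through L for December (inferred from I being September), and that a leading 1 would mean the 1800s.  The function name decode_name is made up for illustration.</p>



<pre class="wp-block-code"><code># Sketch: decode a scan name such as 256IC01A.JPG.
# Assumptions (not confirmed by the discs): month letters run A=January
# to L=December, and a leading 1 would mean the 1800s.
decode_name() {
    name=$1
    century=$(printf '%s' "$name" | cut -c1)
    yy=$(printf '%s' "$name" | cut -c2-3)
    letter=$(printf '%s' "$name" | cut -c4)
    kind=$(printf '%s' "$name" | cut -c5)

    # Count the letters before ours in A..L to get the month number.
    months=ABCDEFGHIJKL
    before=${months%%"$letter"*}
    month=$((${#before} + 1))

    # Strip a leading zero so a year like 08 is not read as octal.
    year=$((1700 + century * 100 + ${yy#0}))

    # The fifth character says which part of the issue this is.
    case $kind in
        C) section="front cover" ;;
        A) section="front adverts" ;;
        Z) section="back adverts" ;;
        B) section="back cover" ;;
        [0-9]) section="interior page" ;;
        *) section="unknown" ;;
    esac
    printf '%04d-%02d %s\n' "$year" "$month" "$section"
}

decode_name 256IC01A.JPG
decode_name 256I0287.JPG</code></pre>



<p class="wp-block-paragraph">Under those assumptions the two calls print 1956-09 front cover and 1956-09 interior page.</p>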



<p class="wp-block-paragraph">So we now have some understanding of the format.  Putting this into order could be done with something simple like this:</p>



<pre class="wp-block-code"><code># find does not promise any ordering, so sort each group before appending it
find ./ -name '2&#91;0-9]&#91;0-9]IC&#91;0-9]*.JPG' | sort > interior
find ./ -name '2&#91;0-9]&#91;0-9]IA&#91;0-9]*A.JPG' | sort >> interior
find ./ -name '2&#91;0-9]&#91;0-9]I&#91;0-9]*.JPG' | sort >> interior
find ./ -name '2&#91;0-9]&#91;0-9]IZ&#91;0-9]*Z.JPG' | sort >> interior
find ./ -name '2&#91;0-9]&#91;0-9]IB&#91;0-9]*Z.JPG' | sort >> interior</code></pre>



<p class="wp-block-paragraph">Doing so makes a nice list file of which images should go in which order.  I could probably use ffmpeg and painstakingly check the images for &#8216;pullouts/double wide&#8217; ones, and have it stitch the rest together as a two pager ( ffmpeg -i left.jpg -i right.jpg -filter_complex hstack combined.jpg ), but that still sounds like a lot of work.  Also, the images are very low quality.  It&#8217;s a shame they didn&#8217;t use black &amp; white on the text, and scan the images separately, but that&#8217;d require something like PDF, and no doubt a LOT of time.  Although Kodak did sponsor the set, the developer, Mindscape, didn&#8217;t go with the fancy PDF technology of the late 1990s; instead it&#8217;s just the blurry JPEG scans we have today.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="367" height="226" src="/wp-content/uploads/2024/02/voyager-paragraph.png" alt="" class="wp-image-13907" srcset="https://virtuallyfun.com/wp-content/uploads/2024/02/voyager-paragraph.png 367w, https://virtuallyfun.com/wp-content/uploads/2024/02/voyager-paragraph-300x185.png 300w" sizes="auto, (max-width: 367px) 100vw, 367px" /><figcaption class="wp-element-caption">NG July 1981</figcaption></figure>
</div>


<p class="wp-block-paragraph">That&#8217;s when I found out about <a href="https://tesseract-ocr.github.io/" target="_blank" rel="noreferrer noopener">Tesseract</a>.  Running it against this one paragraph reveals:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Voyager is watching two small moons<br>that-seem to be playing tag as they race<br>around Saturn in almost the same orbit. The<br>trailing moon is traveling faster than the<br>leader, and should catch up with the leader<br>in January 1982 (pages 20-21). The two pre-<br>sumably have been playing this game for<br>billions of years, Through what sleight of<br>physics do they avoid colliding?</p>
<cite>National Geographic, July 1981</cite></blockquote>



<p class="wp-block-paragraph">I have to admit, that&#8217;s pretty good!  And how amazing that I have a LOT of files to scan.</p>



<pre class="wp-block-code"><code>188553 File(s) 11,061,111,690 bytes</code></pre>



<p class="wp-block-paragraph">That is a <strong>LOT</strong> of files.  Okay, that&#8217;s nice, but can Tesseract read the list that I generated per issue?  YES.  The only thing that I&#8217;d love to see Tesseract do is create PDFs with the scanned text embedded.  <strong><span style="text-decoration: underline;">OH WAIT, IT ALREADY DOES THAT</span></strong>!</p>



<p class="wp-block-paragraph">How on earth did I not know this?</p>
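


<p class="wp-block-paragraph">For anyone else who missed it, here is a minimal sketch of how that can look.  Going by the Tesseract manual, the input may be a text file with one image path per line, and putting the pdf config at the end produces a PDF with the recognised text embedded.  The function name ocr_issue, the output base name and the choice of -l eng are placeholders and assumptions, not necessarily what was run for this post.</p>



<pre class="wp-block-code"><code># Sketch: turn one issue's ordered image list into a searchable PDF.
# ocr_issue and the names passed to it are placeholders for illustration.
ocr_issue() {
    list=$1      # text file with one image path per line, in reading order
    outbase=$2   # Tesseract appends .pdf to this base name
    # -l eng is an assumption: the magazine text is English.
    tesseract "$list" "$outbase" -l eng pdf
}

# Example: ocr_issue interior NGS-1956-09</code></pre>



<p class="wp-block-paragraph">With the list file from earlier, the example call should leave NGS-1956-09.pdf in the current directory, one page per image in the list.</p>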



<p class="wp-block-paragraph">I put together a few scripts, and I was able to separate out all the images into years &amp; months, then I created the needed list files with all the images in the correct order.  It&#8217;s not a fast process; I think it may take me a week or so to do this.</p>
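


<p class="wp-block-paragraph">The actual scripts aren&#8217;t shown here, but the two steps could be sketched roughly like this.  Everything in it is an assumption for illustration: the SRC and DEST directories, the function names, the idea that month letters run A to L, and that a leading 1 in the name means the 1800s.</p>



<pre class="wp-block-code"><code># Sketch: file every scan under YEAR/MONTH, then write an ordered list
# per issue. SRC, DEST and both function names are placeholders.
SRC=${SRC:-extracted}
DEST=${DEST:-issues}

sort_images() {
    find "$SRC" -type f -name '[12][0-9][0-9][A-L]*.JPG' |
    while IFS= read -r path; do
        name=$(basename "$path")
        century=$(printf '%s' "$name" | cut -c1)
        yy=$(printf '%s' "$name" | cut -c2-3)
        letter=$(printf '%s' "$name" | cut -c4)
        # Assumed mapping: A=January .. L=December.
        months=ABCDEFGHIJKL
        before=${months%%"$letter"*}
        month=$((${#before} + 1))
        # Strip a leading zero so 08 is not read as octal.
        year=$((1700 + century * 100 + ${yy#0}))
        dir=$(printf '%s/%04d/%02d' "$DEST" "$year" "$month")
        mkdir -p "$dir"
        cp "$path" "$dir/"
    done
}

build_list() {
    # Order: cover, front adverts, interior pages, back adverts, back cover.
    dir=$1
    for pat in '????C*.JPG' '????A*.JPG' '????[0-9]*.JPG' '????Z*.JPG' '????B*.JPG'; do
        find "$dir" -name "$pat" | sort
    done > "$dir/interior"
}</code></pre>



<p class="wp-block-paragraph">Running sort_images once and then build_list on each YEAR/MONTH directory leaves an interior list file beside the images, ready to hand to Tesseract.</p>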


<div class="wp-block-image">
<figure class="aligncenter size-full"><a href="/wp-content/uploads/2024/02/cpu-load-using-ocr.png" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" width="850" height="660" src="/wp-content/uploads/2024/02/cpu-load-using-ocr.png" alt="" class="wp-image-13909" srcset="https://virtuallyfun.com/wp-content/uploads/2024/02/cpu-load-using-ocr.png 850w, https://virtuallyfun.com/wp-content/uploads/2024/02/cpu-load-using-ocr-300x233.png 300w, https://virtuallyfun.com/wp-content/uploads/2024/02/cpu-load-using-ocr-768x596.png 768w, https://virtuallyfun.com/wp-content/uploads/2024/02/cpu-load-using-ocr-386x300.png 386w" sizes="auto, (max-width: 850px) 100vw, 850px" /></a><figcaption class="wp-element-caption">My CPU hates me</figcaption></figure>
</div>


<p class="wp-block-paragraph">So far, I&#8217;m up to 1903, so I&#8217;ll update with some rough idea of when this finishes.</p>



<p class="wp-block-paragraph">So, the bundled applications are old and obsolete, needing a Win16 or Classic MacOS machine with an optical drive.  Yes, they still work (with emulation) on modern machines, although you still need to read the physical discs.  Thankfully the images are easily mapped into the right order, and you can make them your own.  Neat!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://virtuallyfun.com/2024/02/07/modernising-the-national-geographic-cd-rom-collection/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
			</item>
	</channel>
</rss>
