In some weird twist, almost all the old NetBSD source code is gone. Again.

I don’t know what is up, but the loss was perplexing enough that even bitsavers is now saving NetBSD.

bitsavers doesn’t normally image files from active sites, but when archive.netbsd.org was down for an extended period of time and no mirrors were found, having at least the NetBSD iso files mirrored worldwide seemed like a prudent thing to do. This is a massive storage and bandwidth burden we’ve taken on, please be considerate towards us and our mirrors. For example, people trying to run 10’s of rsyncs in parallel will be banned.

aek 20231002

It takes me back to the work of trying to revive NetBSD 0.8, and that fun adventure, after which it all showed up. I saved it and moved on, but now it seems to be my turn to save the past again.

I know it’s a terrible URL, but here it is: My old NetBSD archive

Highlights include:

I was interested in running the first public VAX versions, and 1.1 didn’t run on VAX so it didn’t interest me. Sorry.

Otherwise, you’re welcome

Slacktivism + IA

Can You Chip In?

I guess it’s not surprising: the Internet Archive is under fire yet again, and needs help. Again.

For more than two and a half decades, we have collected, preserved, and shared our digital cultural artifacts. Thanks to the generosity of our patrons, the Internet Archive has grown from a small preservation project into a vast library that serves millions of people each year. Our work has impacted the lives of so many of our users who value free and open access to information.

From the beginning, it was important for the Internet Archive to be a nonprofit, because it was working for the people. Its motives had to be transparent; it had to last a long time. That’s why we don’t charge for access, sell user data, or run ads, even while we offer free resources to citizens everywhere. We rely on the generosity of individuals like you to pay for servers, staff, and preservation projects.

If you can’t imagine a future without the Internet Archive, please consider supporting our work. We promise to put your donation to good use as we continue to store over 99 petabytes of data, including 625 billion webpages, 38 million books and texts, and 14 million audio recordings.

If you find our site useful, please chip in! Your support will help us build the web we deserve.

Thank you for joining me.

Brewster Kahle
Founder & Digital Librarian

And how can I support them by doing not much? BING!

Turns out that BING / Edge(ium) has this point thing for using it, and a tip jar to get Microsoft to fund IA. Every 1,000 points you tip will be $1 in real life?

Maybe it helps, I don’t know; I’d like to think it does. I figure 10,000 points lets me feel like I’ve done something.

Yay slacktivism.

Oh, and follow me on archive.org as neozeede! I try to upload strange and interesting things as I find them. Or remember to find them.

Peter’s Sun3 Zoo (restored)

Sometimes there is a great seemingly timeless resource on the internet, and you pull from it from time to time, make giant compilations, but never really reach out to the creator, or just archive the entire thing.

Then the unspeakable happens and it just ups and disappears.

I never reached out to Peter Koch, even to thank him for preserving so much, or to apologise for not preserving his site. For some reason it felt like someone else would have done a better job. But then sometimes you find out you were that one person, and you didn’t do it, so it didn’t get done.

I don’t know the story, but it seems Peter did know that it was coming to an end.

May 01 2010 – Ending

Dear friends!

I have to give up my collection.

So if you’re interested in some pieces or know someone who might, please send me an e-mail.

Peter – Sun3 Zoo

So I’ll put out a call for help to the world at large: did anyone save anything more comprehensive than what was in archive.org, or what was in the ‘Titor Special’?

In the meantime, the site has a new owner, and it’s been restored.

Peter’s Sun3 Zoo – sun3zoo.de

If anyone has any stories or anything preserved, it’d be appreciated. There were at least a few parties, and party3, with party2 missing.

Thanks.

UK is over the edge: archive.org blocked at the telecom level

Well at first that looks weird. It pings and all so I jump to incognito mode, and…

My EE – Content unlock

Content Lock on EE helps to keep you and your children safe online by blocking 18-rated content.
We have three settings – Strict, Moderate and Off so you can choose exactly what level of security you’d like.
Please note: All new and existing accounts with Content Lock enabled have the “Moderate” setting applied by default. Content Lock is only activated when you’re using our network – not when you’re using WiFi.

And this is EE censoring archive.org. UNREAL!

Going through the SIM registration, and login….

You need a credit card to get it unlocked. Luckily my Hong Kong business card worked; as always, set the zip code to ‘0000’.

Thanks, overreaching corporations (at the behest of whom?), for blocking me from the past.

Pathetic.

Digital.com purges DIGITAL

While finally getting around to renaming aux to aux_ for my AltaVista-based search engine, I noticed that the product link, http://www.altavista.software.digital.com/search/index.htm, is suddenly not found.

Well isn’t that a shame.

Ironically, in a twist of fate, I found this article, “AltaVista Search Engine History Lesson For Internet Nerds”, with a nice overview of the amazing rise and tragic, neglectful decline of AltaVista. Then what struck me was this line:

Digital was the original owner of the domain that you’re reading now; www.digital.com

Wait!? What?!

Did digital.com just purge DIGITAL’s history?

Now I feel like an idiot for not having archived the archive. Always in motion is the past; it’s a shame that DEC’s pages had to be destroyed. History in digital form, especially Digital’s, is always in motion and subject to $CURRENT_YEAR.

Sad.

Excellent archive of Watcom C/C++ CD-ROMs on Archive.org

Watcom C/C++ CD-ROM collection!

I found this collection recently by accident, but it’s certainly worth sharing. I was a SUPER big fan of Watcom C/C++ 10.0 back in the day, as it includes not only so many targets but also so many host setups, making it a really great compiler for the day. It can target 16-bit MS-DOS, 32-bit extended DOS, 16-bit & 32-bit OS/2, Win16, Win32, a custom 32-bit Windows extender, 32-bit Novell NLMs, AutoCAD extensions, and no doubt many more I’m forgetting.

Head on over, and just search for Watcom:

https://archive.org/search.php?query=Watcom

Or for the heck of it:

This is great for things like trying to build Duke Nukem 3D, and other vintage-era stuff.

Personal AltaVista + UTZOO reloaded

Introduction

Long before websites, during the dark ages of the BBS, on the internet there was (well it’s still there!) a distributed messaging system called usenet. There were countless topics on just about everything, full of all kinds of incredible conversations. Before the walled gardens, and the ease of running individual bulletin boards, the internet had prided itself on having one big global distributed messaging system. It was a big system, and one thing that was always taken for granted was that it was too big to save, and that whatever you put out there would probably be erased, as all sites had a finite amount of very expensive disk space and would only keep recent articles.

But it turns out that at the University of Toronto, the zoology department had a tape budget, and were in fact archiving everything they could. In all they had amassed 141 tapes spanning from February 1981 (though these are not Usenet posts, just internal netnews University stuff) all the way up to about midnight of July 01, 1991!

While the archive was made available to a few people in 2001, it was made generally available in 2009, and then in 2011 on archive.org, where I downloaded a copy of it. There is some interesting backstory over on Dogcow land, as it took quite a bit of effort to get the data off the tapes and then slowly release it into the wild.

As mentioned on the archive.org site:

This is a collection of .TGZ files of very early USENET posted data provided by a number of driven and brave individuals, including David Wiseman, Henry Spencer, Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen.

OK, so a few months ago I had set up AltaVista personal desktop search along with the UTZOO usenet archive, for the purpose of using something more sophisticated than grep while maintaining that legacy/retro feel of using outdated technology. To recap, the first challenge is that the desktop search product is only meant to be used from the desktop of a Windows 98/NT 4.0 workstation. It uses a super ancient version of Java as the webserver, and they chose to bind it to 127.0.0.1:6688. So the first thing to get around that was to build a stunnel tunnel, allowing me to effectively connect to the webserver remotely. And since the server assumes it’s being accessed locally, I had to use Apache with mod_substitute and some simple regular expressions to massage the pages into something that would be usable from a non-local machine.

So with that word salad up, let’s have a brief picture!

Flow diagram

Stepping it up

On my ‘general’ hosting machine, I use haproxy to reverse proxy multiple sites out of a single address. This is a super simple solution that allows me to have all kinds of different backends using various hosting platforms, such as Apache 1.3 on Windows NT 3.1. So for this to work I just needed to create an altavista.superglobalmegacorp.com DNS record, and then add the following to the haproxy config:

frontend named-hosts
    bind 172.86.179.14:80
    acl is_altavista hdr_end(host) -i altavista.superglobalmegacorp.com
    use_backend altavista if is_altavista

backend altavista
    balance roundrobin
    option httpclose
    option forwardfor
    server debian8 10.0.0.18:80 check maxconn 10

So as you can see it’s really simple: it checks whether the host header ends with the string ‘altavista.superglobalmegacorp.com’ (ignoring case), and then sends the request to the backend, which has a single web server, in this case a lone Debian server, aptly named debian8, that throttles after 10 concurrent connections.
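To make the matching rule concrete, here is a minimal Python sketch of what that acl and use_backend pair decides. It is only an illustration of a case-insensitive suffix match, not how haproxy is implemented, and the function names are made up for the example.

```python
from typing import Optional

ALTAVISTA_SUFFIX = "altavista.superglobalmegacorp.com"


def is_altavista(host_header: str) -> bool:
    """Mimic `hdr_end(host) -i ...`: case-insensitive match on the END of the Host header."""
    return host_header.lower().endswith(ALTAVISTA_SUFFIX)


def pick_backend(host_header: str) -> Optional[str]:
    """Return the backend name haproxy would use, or None if the acl does not match."""
    return "altavista" if is_altavista(host_header) else None


print(pick_backend("AltaVista.SuperGlobalMegaCorp.com"))
print(pick_backend("utzoo.superglobalmegacorp.com"))
```

Because it is a suffix match, anything that merely ends in that string (say, a longer hostname with extra labels in front) is routed to the same backend, while a Host header with a trailing port number would not match.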

The next thing to do was generate a self-signed SSL cert, which wasn’t too hard. The stunnel installer has a profile ready to go, so it was only a matter of finding a version of OpenSSL that’ll run on NT 4. As this isn’t public encryption, I really don’t care about it using crap certs.

The Debian server is where all the regex magic is, along with the stunnel client that connects to the NT 4.0 Workstation.

client = yes
debug = 0
cert = /etc/stunnel/stunnel.pem

[altavista]
accept = 127.0.0.1:8080
connect = 10.0.0.19:8443

Likewise on NT stunnel will need a config like this:

cert = c:\stunnel\stunnel.pem

; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1

; Some debugging stuff useful for troubleshooting
debug = 0
output = c:\stunnel\stunnel.log.txt

[altavista]
accept = 8443
connect = 127.0.0.1:6688
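Since that is a lot of hops, here is a rough Python sketch that writes the whole chain down as data and checks that each hop forwards to the address the next one listens on. The addresses come from the configs in this post. The Hop class and the helper names are just for illustration, and I am assuming Apache's localhost:8080 is the same as the stunnel client's 127.0.0.1:8080, and that the NT stunnel's bare port 8443 is reached on 10.0.0.19.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    name: str
    listens_on: str
    forwards_to: str  # empty string for the final hop


HOPS: List[Hop] = [
    Hop("haproxy frontend", "172.86.179.14:80", "10.0.0.18:80"),
    Hop("Apache on debian8", "10.0.0.18:80", "127.0.0.1:8080"),
    Hop("stunnel client on debian8", "127.0.0.1:8080", "10.0.0.19:8443"),
    Hop("stunnel server on NT 4.0", "10.0.0.19:8443", "127.0.0.1:6688"),
    Hop("AltaVista personal search", "127.0.0.1:6688", ""),
]


def chain_is_consistent(hops: List[Hop]) -> bool:
    """True if every hop forwards to exactly where the next hop listens."""
    return all(a.forwards_to == b.listens_on for a, b in zip(hops, hops[1:]))


def describe(hops: List[Hop]) -> str:
    """Render the chain as a one-line flow description."""
    return " -> ".join(f"{h.name} ({h.listens_on})" for h in hops)


print(chain_is_consistent(HOPS))
print(describe(HOPS))
```

If one of the configs is edited and a port no longer lines up, chain_is_consistent returns False, which is about the level of sanity checking this setup deserves.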

With the Debian box able to talk to the AltaVista web server, it was now time to configure Apache. This is the most involved part, as the HTML formatting by AltaVista personal search is hard coded into the Java binary. However, thanks to mod_substitute we can modify the page on the fly! So the first thing is that I set up two virtual directories: the first one, /altavista, maps to the search engine, and then I added /usenet, which talks to IIS 4.0 on the Windows NT 4.0 workstation, which just allows read & browse access to the usenet files that will need to be indexed.

#This part connects through the stunnel connection to the AltaVista server
ProxyPass "/altavista" "http://localhost:8080"
ProxyPassReverse "/altavista" "http://localhost:8080"
#This connects to IIS 4.0 on the NT 4.0 machine
ProxyPass "/usenet" "http://10.0.0.19/usenet"
ProxyPassReverse "/usenet" "http://10.0.0.19/usenet"
ProxyRequests Off
RewriteEngine On

Because we mounted it on a subdirectory, we need to redirect the root to /altavista, so I simply add:

#Redirect the root to the /altavista path.
#
RedirectMatch 301 ^/$ /altavista

To get the images to work, along with fixing the 127.0.0.1 hardcoding, I copied the images from the NT workstation onto the Apache server, then added these substitutions:

#clean up urls
Substitute "s|Copyright 1997|Copyright 2017|n"
Substitute "s|127.0.0.1:6688|altavista.superglobalmegacorp.com/altavista|n"
Substitute "s|file:///c:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|/images/|n"
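The n flag means each pattern is treated as a fixed string rather than a regex, so these three rules are just plain search and replace on the response body. A small Python sketch of the same idea follows. The sample HTML, the ?pg=q query string and the logo.gif file name are invented for the example, and this only imitates the effect, not how Apache applies the rules.

```python
# (old, new) pairs mirroring the three fixed-string Substitute rules above.
SUBSTITUTIONS = [
    ("Copyright 1997", "Copyright 2017"),
    ("127.0.0.1:6688", "altavista.superglobalmegacorp.com/altavista"),
    (
        "file:///c:\\Program Files\\DIGITAL\\AltaVista Search\\My Computer\\images\\",
        "/images/",
    ),
]


def rewrite_page(html: str) -> str:
    """Apply each fixed-string replacement in order, like the 'n' flag does."""
    for old, new in SUBSTITUTIONS:
        html = html.replace(old, new)
    return html


# Hypothetical fragment of a page as the local-only server might emit it.
sample = (
    '<img src="file:///c:\\Program Files\\DIGITAL\\AltaVista Search'
    '\\My Computer\\images\\logo.gif">'
    '<a href="http://127.0.0.1:6688/?pg=q">Search</a> Copyright 1997'
)
print(rewrite_page(sample))
```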

And now the site is starting to work. The most involved regex changes the links from local text files into paths that point to the usenet shares. This changes the text for u:\usenet\a333\comp\33.txt into a workable URL.

Substitute “s|>u:\\\\usenet.([a-z]{1,}[0-9]{3,})\\\([0-9a-z\+\-]{1,})\\\([0-9]{1,})|—><a href=\”http://utzoo.superglobalmegacorp.com/usenet/$1/$2/$3.txt\”>[$2\] Click for article|

Naturally there are a LOT of these types of statements to match various depths and pattern types, as there are A news, B news and C news archives, plus scavenged bits.
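For anyone who finds the Apache escaping hard to read, here is the same idea as a Python sketch: turn a local path like u:\usenet\a333\comp\33.txt into a link on the utzoo mirror. This is my own simplified pattern, not the exact expression from the config. It handles only the one-directory-deep case from the example above, and the function name is made up.

```python
import re

BASE_URL = "http://utzoo.superglobalmegacorp.com/usenet"

# batch directory (letters then 3+ digits), one group directory, numeric article name
PATH_RE = re.compile(
    r"u:\\usenet\\([a-z]+[0-9]{3,})\\([0-9a-z+\-]+)\\([0-9]+)\.txt",
    re.IGNORECASE,
)


def linkify(html: str) -> str:
    """Replace local u:\\usenet\\... paths with links to the mirrored article."""

    def to_link(match: "re.Match[str]") -> str:
        batch, group, number = match.groups()
        url = f"{BASE_URL}/{batch}/{group}/{number}.txt"
        return f'<a href="{url}">[{group}] Click for article</a>'

    return PATH_RE.sub(to_link, html)


print(linkify(r"u:\usenet\a333\comp\33.txt"))
```

Deeper paths fall straight through unchanged, which is exactly why the real config needs one rule per depth and naming scheme.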

Additionally, I disabled a bunch of URLs that would either try to alter the way the engine works or allow the search location to change (which just gives you empty results), altered some of the branding, as digital.com doesn’t exist anymore, and made various tweaks. The finished config file for Apache is here.

Now with that in place, I can hit my personal AltaVista search. The next insane thing was to rename all the files from the UTZOO dump, adding a .txt extension, and then re-encode them in MS-DOS CR/LF format. I used ‘find -type f’ to find the files, with a simple exec to rename them with a .txt extension. Then it was only a matter of using ZIP to compress the archives, transferring them to Windows NT, and running UNZIP on them with the -a flag to convert them into CR/LF ASCII files on Windows. This took a tremendous amount of time, as there are about 2.1 million files in the archive.

With the files on Windows, I now had to run the indexer.
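If you would rather not round-trip through ZIP, the same end result (a .txt extension plus CR/LF line endings) can be sketched in Python. To be clear, this is a swapped-in alternative and not what I actually ran. The function names are made up, and it reads each file fully into memory, which is fine for news articles.

```python
import tempfile
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Convert line endings to CR/LF without doubling existing CR/LF pairs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_tree(root: Path) -> int:
    """Give every file under root a .txt extension and CR/LF endings; return the count."""
    # Snapshot the file list first so newly written .txt files are not revisited.
    files = [p for p in root.rglob("*") if p.is_file() and p.suffix != ".txt"]
    for src in files:
        dst = src.with_name(src.name + ".txt")
        dst.write_bytes(to_crlf(src.read_bytes()))
        src.unlink()
    return len(files)


def demo() -> bytes:
    """Build a one-article tree in a temp dir, convert it, and return the new contents."""
    with tempfile.TemporaryDirectory() as tmp:
        article = Path(tmp) / "a333" / "comp" / "33"
        article.parent.mkdir(parents=True)
        article.write_bytes(b"From: someone\nSubject: test\n")
        convert_tree(Path(tmp))
        return (article.parent / "33.txt").read_bytes()


print(demo())
```

Files that already end in .txt are skipped, so the conversion can be re-run after an interruption without mangling anything twice.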

Indexed in under 7 hours!

While I originally had an IIS 4.0 instance on the same NT 4.0 Workstation serving up the result files, I thought it might make more sense to just serve them from the UTZOO mirror server I have in the same colocation, so it’d be much faster. That way only the queries rely on servers in Hong Kong, instead of being 100% located in the United States.

So here we go, my search portal for all that ancient usenet goodness:

altavista.superglobalmegacorp.com

If you are hoping for the wealth of knowledge to be gained from people posting on usenet from 1981 to 1991, then this is your ticket. Keep in mind that usenet being usenet, there are discussions on everyone and everything, and like all other forums, before you know it it’ll end with calling people Hitler, and how the Amiga is the greatest computer ever (well it was!). A tip when searching by year is that people commonly wrote the year as 2 digits. However, when looking for numbers like, say, Battletech 3025, it will pull up files named 3025.txt. To prevent this, just add -3025.txt to exclude names like 3025.txt, or if you want to find out about the movie Bladerunner from 1982, try searching for bladerunner 82 -82.txt +review +movie. If you have any questions, there is of course the manual with a guide on how to search.

The story of AltaVista is somewhat interesting, but much like how Digital screwed up the Alpha market by trying to hoard high-end designs, they also didn’t set the search people free to focus on search. And the intranet stuff was crazy expensive: look at this ad from 1996, which translates to a minimum of $10,000 USD a year to run a single search engine! But as we all know, the distributed model of Google won search, and AltaVista never had a chance, as it was caught up in the Compaq/HP mess and then spun out, to be quickly absorbed by Yahoo.

Meanwhile it appears the original owners of altavista.com, AltaVista Technology, Inc. of California, are actually still in business. If anyone cares, I’ll put the installation files and some of the configs in this directory.
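If you end up doing a lot of these, the pattern is mechanical enough to script. Here is a tiny Python sketch that assembles a query in the style described above: plain words, a -N.txt exclusion for each number that would otherwise match article file names, and +word for required terms. The function and its parameter names are hypothetical; only the query syntax comes from the tip above.

```python
from typing import Iterable


def build_query(
    words: Iterable[str],
    exclude_filenames_for: Iterable[str] = (),
    required: Iterable[str] = (),
) -> str:
    """Build a search string: words, then -N.txt exclusions, then +required terms."""
    parts = list(words)
    parts += [f"-{number}.txt" for number in exclude_filenames_for]
    parts += [f"+{word}" for word in required]
    return " ".join(parts)


print(build_query(["battletech", "3025"], exclude_filenames_for=["3025"]))
print(build_query(["bladerunner", "82"], ["82"], ["review", "movie"]))
```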