French national Internet Referral Unit tries to shut down archive.org

Count on unelected technocrats trying to ruin awesome resources with their kangaroo-court-style operations in their little backwater nonsensical nations.

Good grief.

I should step up my uploads. I did add my NetWare 3.12 disk sets, Citrix Multiuser 2.0 and NeXTSTEP 3.3 CISC stuff.

Comcast Router and SNMP

(this is a guest post by Antoni Sawicki aka Tenox)

This is a lame duck, low effort post, and if you already know the answer it’s obvious. However, this question seems to be asked a lot on the intertubes. I hope it will help someone else, as there is no good, readily available answer out there.

Problem: I wanted to have SNMP on my Comcast/Xfinity router so I can monitor current bandwidth usage.

Research:

  • Possible to enable on vanilla router? – Nope
  • Do 3rd party, Comcast compatible routers do? – Nope
  • Can you SSH or hack in to the router to do it? – Nope
  • Can you load custom / hacked firmware to do it? – Nope*

* Nope, or very hard / unsupported.

So is it possible at all? Yes, but with a separate device. Comcast/Xfinity routers have a so-called “Bridge Mode”, which essentially turns them into a DOCSIS modem without router / firewall / wifi access point.

Solution: Turn on Bridge Mode in your vanilla Comcast router and buy a router / access point that runs WRT firmware. I got a Linksys AC3200 for $99 on Amazon. SSH to the router and run: opkg install snmpd

Done.
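Once snmpd is answering, "current bandwidth usage" comes down to polling an interface octet counter twice and dividing the difference by the time between polls. Here is a minimal sketch of just that arithmetic, assuming the standard 32-bit ifInOctets / ifOutOctets counters (which wrap around at 2^32) and at most one wrap between polls. The function name and the sample numbers are made up for illustration; how you fetch the counters is up to your monitoring tool.

```python
COUNTER32_MAX = 2 ** 32  # ifInOctets/ifOutOctets are 32-bit counters that wrap


def bits_per_second(prev_octets, curr_octets, interval_seconds):
    """Average throughput between two polls of an SNMP octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # The counter wrapped between polls (assumes it wrapped only once).
        delta += COUNTER32_MAX
    # Octets are bytes, so multiply by 8 to get bits.
    return delta * 8 / interval_seconds


# Two hypothetical readings taken 10 seconds apart.
print(bits_per_second(1000, 2000, 10))
```

On a fast link a 32-bit counter can wrap more than once between slow polls, so polling every few seconds, or using the 64-bit ifHCInOctets counter where the device offers it, is the safer choice.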

Using UnixWare 2 to Set Up a Web Server: A Case Study

I stumbled across this ancient page, and thought it was so dated that I had to share it.

https://support.novell.com/techcenter/articles/ana19950903.html

Bravo to Micro Focus for keeping it up.

Original UnixWare website over at Novell

It’s kind of cute that they ran it all on a couple of Zenith Z-Server LT P60E computers, which have 128MB of RAM and 5GB of hard disk space, and 2 T1’s.

Even more shocking to me is that their LAN was just 10Mbit.  For a Unix/networking OEM you’d think they would be on the edge with new tech like 100Mbit Ethernet, or more complicated/fast technology like ATM or FDDI.  Heck, even 16Mbit Token Ring.

Novell home page circa 1995

It’s amazing the tiny screens we had back then.  I can still remember the 800×600 debate, as people even in the later 90’s were pushing for megabyte pages, and ludicrously big real-estate.

As always it’s funny how dismissive they were of Linux:

Linux didn’t have good support and we were concerned about its ability to perform under heavy loads

And of course how they dismissed Windows NT:

Windows NT, on the other hand, handled the chores okay, but it lacks a wide developer base. As a result, few tools are available for working with Windows NT. 

Naturally the tell is that they didn’t load HTTPD directly on NetWare, as it was dead with the arrival of Windows NT.  And UnixWare and commercial Unix were also dead with the utter stagnation of SYSVR4.

UnixWare home page

And the product page for UnixWare had those awkward 50’s stock images, with too much red/pink, that were all too common for Novell back then.  It’s almost laughable that they considered being able to run on the i386 as being ‘portable’, but for whatever reason they never could port UnixWare to any other platforms.  When they sold off UnixWare to Caldera they failed to do anything with it, and famously turned to lawsuits to attempt to recoup their money from the botched port to the Itanium that was done with IBM’s ‘help’.

UnixWare was going to lead the charge in the post SYSV world, but its constantly being sold, pushed to do different things, and made to fit an increasingly smaller role just cemented the demise of SYSV.

And of course marginalized and almost forgotten, NeXTSTEP would go on to be the #1 commercial UNIX in the market place.

AOL Instant messaging to end service.

From tumblr!

And they killed it from Tumblr.

AIM tapped into new digital technologies and ignited a cultural shift, but the way in which we communicate with each other has profoundly changed. As a result we’ve made the decision that we will be discontinuing AIM effective December 15, 2017. We are more excited than ever to continue building the next generation of iconic brands and life-changing products for users around the world.

Kind of insane to pay way too much money for something, to just turn around and kill it.

All I know is that whatever they think they are going to do, it’ll never have the reach and recognition of AIM.  Maybe there are reverse engineered servers, like Escargot for MSN.

GopherVista

Continuing with our gopher madness, next up we have GopherVista.  And this is everything I had hoped it would be when I first learned about this project.  I had joked to another friend that it’d be cool to crawl and feed the indexer data in a manner that could basically bring AltaVista back to life.  And we laughed, and I had my utzoo search and that was that.

Except it wasn’t.

However, across the internet, Ben didn’t hear any naysaying about limitations or anything to get in the way, and went ahead and wrote a crawler in Go, kept the results in a sane name/db order for later sanitisation in and out of AltaVista, and after an aggressive gopher port scan of the internet, he created GopherVista, an index of the gopher-verse, running on Windows 98.

No, really, you read that right, GopherVista backends on Windows 98!

Read all about the creation of GopherVista over at Ben’s blog, blog.benjojo.co.uk.

GopherVista

Keep in mind that this is a search engine, not a proxy, so it’s best to use something like Internet Explorer 4, or an ancient Netscape that supports both HTTP 1.1 & Gopher.

I also have to say that something like this is far cooler, and better thought out, than my utzoo hack, and I’m just happy to have inspired him, but now I really want to re-think my setup, and of course index all the things….

Personal AltaVista + UTZOO reloaded

Introduction

Long before websites, during the dark ages of the BBS, on the internet there was (well it’s still there!) a distributed messaging system called usenet.  There were countless topics on just about everything, full of all kinds of incredible conversations.  Before the walled gardens, and the ease of running individual bulletin boards, the internet had prided itself on having one big global distributed messaging system.  It was a big system, and one thing that was always taken for granted was that it was too big to save, and that whatever you put out there would probably be erased, as all sites had a finite amount of very expensive disk space and would only keep recent articles.

But it turns out that in the University of Toronto, in the zoology department they had a tape budget, and were in fact archiving everything they could.  In all they had amassed 141 tapes spanning from  February 1981 (though these are not Usenet posts, just internal netnews University stuff) all the way up to about midnight of July 01, 1991!

While the archive was made available to a few people in 2001, it was made generally available in 2009, and then in 2011 on archive.org, where I downloaded a copy of it.  There is some interesting backstory over on Dogcow land, as it took quite a bit of effort to get the data off the tapes and then slowly release it out into the wild.

As mentioned on the archive.org site:

This is a collection of .TGZ files of very early USENET posted data provided by a number of driven and brave individuals, including David Wiseman, Henry Spencer, Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen.

OK, so a few months back, I had set up AltaVista personal desktop search along with the UTZOO usenet archive for the purpose of using something more sophisticated than grep, but maintaining that legacy/retro feel of using outdated technology.  To recap, the first challenge is that the desktop search product is only meant to be used from the desktop of a Windows 98/NT 4.0 workstation.  It uses a super ancient version of Java as the webserver, and they chose to bind it to 127.0.0.1:6688.  So the first thing to get around that was to build an stunnel tunnel allowing me to effectively connect to the webserver remotely.  And since the server assumes it’s local, I had to use Apache with mod_substitute to set up some simple regexes to massage the pages into something that would be usable from a non-local machine.

So with that word salad up, let’s have a brief picture!

Flow diagram

Stepping it up

On my ‘general’ hosting machine, I use haproxy to reverse proxy multiple sites out of a single address.  This is a super simple solution that allows me to have all kinds of different backends using various hosting platforms, such as Apache 1.3 on Windows NT 3.1.  So for this to work I just needed to create an altavista.superglobalmegacorp.com DNS record, and then add the following to the haproxy config:

frontend named-hosts
    bind 172.86.179.14:80
    acl is_altavista hdr_end(host) -i altavista.superglobalmegacorp.com
    use_backend altavista if is_altavista

backend altavista
    balance roundrobin
    option httpclose
    option forwardfor
    server debian8 10.0.0.18:80 check maxconn 10

So as you can see it’s really simple: it checks whether the host header ends with the string ‘altavista.superglobalmegacorp.com’, and then sends the request to the backend that has a single web server, in this case a lone Debian server, aptly named debian8, that is capped at 10 concurrent connections.
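To make the matching rule concrete, here is a rough Python sketch of what the acl / use_backend pair decides, assuming hdr_end(host) -i behaves as a case-insensitive suffix test on the Host header value. The function name and the rules list are made up for illustration; haproxy does all of this internally.

```python
def pick_backend(host_header, rules, default=None):
    """Return the backend for the first rule whose suffix ends the Host header."""
    host = host_header.lower()  # the -i flag makes the comparison case-insensitive
    for suffix, backend in rules:
        if host.endswith(suffix.lower()):
            return backend
    return default


# Hypothetical rule table mirroring the single acl in the config.
RULES = [("altavista.superglobalmegacorp.com", "altavista")]

print(pick_backend("AltaVista.SuperGlobalMegaCorp.com", RULES))
```

Because it is a suffix test, a name such as www.altavista.superglobalmegacorp.com would land on the same backend, while a Host header that carries an explicit port would not end with the name and would fall through to the default.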

The next thing to do was generate a self-signed SSL cert, which wasn’t too hard.  The stunnel installer has a profile ready to go, so it was only a matter of finding a version of OpenSSL that’ll run on NT 4.  As this isn’t public encryption I really don’t care about it using crap certs.

The Debian server is where all the regex magic is, along with the stunnel client to connect to the NT 4.0 Workstation.

client = yes
debug = 0
cert = /etc/stunnel/stunnel.pem

[altavista]
accept = 127.0.0.1:8080
connect = 10.0.0.19:8443

Likewise on NT stunnel will need a config like this:

cert = c:\stunnel\stunnel.pem

; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1

; Some debugging stuff useful for troubleshooting
debug = 0
output = c:\stunnel\stunnel.log.txt

[altavista]
accept = 8443
connect = 127.0.0.1:6688

With the ability for the Debian box to talk to the AltaVista web server, it was now time to configure Apache.  This is the most involved part, as the html formatting by AltaVista personal search is hard coded into the Java binary.  However thanks to mod_substitute we can modify the page on the fly!  So the first thing is that I set up two virtual directories: the first one, /altavista, maps to the search engine, and then I added /usenet, which talks to IIS 4.0 on the Windows NT 4.0 workstation, which is just allowing read & browse access to the usenet files that will need to be indexed.

#This part connects to the AltaVista server through the stunnel connection
ProxyPass "/altavista" "http://localhost:8080"
ProxyPassReverse "/altavista" "http://localhost:8080"
#This connects to IIS 4.0 on the NT 4.0 machine
ProxyPass "/usenet" "http://10.0.0.19/usenet"
ProxyPassReverse "/usenet" "http://10.0.0.19/usenet"
ProxyRequests Off
RewriteEngine On

Because we mounted it on a sub directory we need to redirect the root to /altavista so I simply add:

#Redirect the root to the /altavista path.
#
RedirectMatch 301 ^/$ /altavista

To get the images to work, along with fixing the 127.0.0.1 hardcoding,  I copied them from the NT workstation onto the Apache server, then added this regex statement:

#clean up urls
Substitute "s|Copyright 1997|Copyright 2017|n"
Substitute "s|127.0.0.1:6688|altavista.superglobalmegacorp.com/altavista|n"
Substitute "s|file:///c:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|/images/|n"

And now the site is starting to work.  The most involved regex is to change the links from local text files, into a path to point to the usenet shares.  This changes the text for u:\usenet\a333\comp\33.txt into a workable URL.

Substitute "s|>u:\\\\usenet.([a-z]{1,}[0-9]{3,})\\\([0-9a-z\+\-]{1,})\\\([0-9]{1,})|--><a href=\"http://utzoo.superglobalmegacorp.com/usenet/$1/$2/$3.txt\">[$2\] Click for article|"

Naturally there are a LOT of these types of statements to match various depths and pattern types, as there are A news, B news and C news archives, plus scavenged bits.
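For comparison, outside of Apache the "one statement per depth" problem goes away if the replacement can be computed by a function. This is a different technique from the Substitute lines above (as far as I know mod_substitute only takes a fixed replacement string), so treat it as a sketch of the idea rather than something to paste into the config. The names PATH_RE, BASE_URL and linkify are made up, and the path shape is assumed from the u:\usenet\a333\comp\33.txt example.

```python
import re

# Assumed base URL, copied from the rewritten links in this post.
BASE_URL = "http://utzoo.superglobalmegacorp.com/usenet/"

# One or more "name\" directory parts, then a numeric article name and ".txt".
PATH_RE = re.compile(r">u:\\usenet\\((?:[0-9a-z.+-]+\\)+[0-9]+)\.txt")


def linkify(html):
    """Replace every DOS-style article path in html with a clickable link."""

    def to_link(match):
        # Turn the DOS separators into URL separators, whatever the depth.
        rel = match.group(1).replace("\\", "/")
        return '><a href="%s%s.txt">Click for article</a>' % (BASE_URL, rel)

    return PATH_RE.sub(to_link, html)


print(linkify(r">u:\usenet\a333\comp\33.txt"))
```

The repeated group swallows any number of directory levels, so A news, B news and the deeper hierarchies are all handled by the one pattern, as long as the directory names stick to the characters in the bracket.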

Additionally I disabled a bunch of URL’s that would either try to alter the way the engine works, or allow the search location to change, just giving you empty results, along with altering some of the branding, as digital.com doesn’t exist anymore, and various tweaks.  The finished config file for Apache is here.

Now with that in place, I can hit my personal AltaVista search.  The next insane thing was to rename all the files from the UTZOO dump, adding a .txt extension, and then re-encode them in MS-DOS CR/LF format.  I used ‘find -type f’ to find the files, and then a simple exec to rename them with a .txt extension.  Then it was only a matter of using ZIP to compress the archives, transferring them to Windows NT, and running UNZIP on them with the -a flag to convert them into CR/LF ASCII files on Windows.  This took a tremendous amount of time as there are about 2.1 million files in the archive.

Now with the files on Windows, I had to run the indexer.
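The rename and line-ending conversion can also be done in a single pass before the files ever leave the Unix box. The sketch below is an alternative to the find / ZIP / UNZIP -a route described above, not what was actually run here. The function name is made up, and it assumes every file under the root is a plain text article small enough to read into memory.

```python
import os


def add_txt_and_crlf(root):
    """Rewrite each file under root with CR/LF line endings and a .txt suffix."""
    converted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".txt"):
                continue  # already converted on an earlier run
            src = os.path.join(dirpath, name)
            with open(src, "rb") as fh:
                data = fh.read()
            # Normalise to LF first so existing CR/LF pairs are not doubled up.
            data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            with open(src + ".txt", "wb") as fh:
                fh.write(data)
            os.remove(src)
            converted += 1
    return converted
```

Calling add_txt_and_crlf on the top of the unpacked archive returns the number of files it converted. With around 2.1 million files it will still take a long while, since every file gets read and rewritten.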

Indexed in under 7 hours!

While I originally had an IIS 4.0 instance on the same NT 4.0 Workstation serving up the result files, I thought it would make more sense to serve them from the UTZOO mirror server I have in the same colocation, so it’d be much faster.  That way only the queries rely on servers in Hong Kong, instead of being 100% located in the United States.

So here we go, my search portal for all that ancient usenet goodness:

altavista.superglobalmegacorp.com

If you are hoping for the wealth of knowledge to be gained from people posting on usenet from 1981 to 1991 then this is your ticket.  Keep in mind that usenet being usenet, there are discussions on everyone and everything, and like all other forums, before you know it it’ll end with calling people Hitler, and how the Amiga is the greatest computer ever (well it was!).  A tip when searching by year is that people commonly wrote the year as 2 digits.  However when looking for numbers like, say, Battletech 3025, it will pull up files named 3025.txt.  To prevent this just add -3025.txt to exclude names like 3025.txt, or if you want to find out about the movie Bladerunner from 1982, try searching for bladerunner 82 -82.txt +review +movie.  If you have any questions, there is of course the manual with a guide on how to search.

The story of AltaVista is somewhat interesting, but much like how Digital screwed up the Alpha market by trying to hoard high end designs, they also didn’t set the search people free to focus on search.  And the intranet stuff was crazy expensive: look at this ad from 1996, which translates to a minimum of $10,000 USD a year to run a single search engine!  But as we all know, the distributed model of Google won search, and AltaVista never had a chance as it was caught up in the Compaq/HP mess, then spun out to be quickly absorbed by Yahoo.

Meanwhile it appears the original owners of altavista.com, AltaVista Technology, Inc. of California, are actually still in business.  If anyone cares I’ll put the installation files, and some of the configs, in this directory.

URL shorteners & short domains

I needed to get some business cards, and the usual thing is to use QR codes that have a tiny URL name, that then redirect to your real web site.  Easy, right?

Well most people use ‘public’ servers like bit.ly & friends.  In China many people I do business with use 1688.com.  But this got me thinking, 1688 is a FOUR letter domain, unlike any of the three letter ones that seem to be more common.  I know all the one, two and three letter domains are all gone, but are there any four letter domains?

Turns out, YES there are.

I used this site:

Domain Name Soup .com

And I was able to hammer through their UI, and find one, and register it with my usual registrar.

*This isn’t an AD, I’m not being paid to say any of this.  I was more so surprised that I could not only find a four letter domain, but it’s the initials of my wife’s business.

The best part is that I could use YOURLS, a free PHP+MySQL app, to quickly and easily manage the redirects.

Fun with regex substitutions in Apache

Continuing from my previous post, I was now able to access my AltaVista server, however from a web browser I was unable to actually view any of the documents remotely.

In the pages though I did get the MS-DOS path to the usenet article in question:

Now how do I turn that into a URL?

Well as it turns out mod_substitute does support regex, which in turn can do variable re-ordering!

After a bit of googling I found this page on stackoverflow, on how to convert a date between UK/US formats:

s/(\d{4})-(\d{2})-(\d{2})/$1-$3-$2/

Simple, right?  So what is going on here?  The parentheses define capture groups, and in the substitution part you can recall them with $1, $2, $3 etc.  So using this recipe I could take something like this:

u:\b227\comp\sys\laptops\3080

and convert it into the following:

http://debian7/usenet/b227/comp/sys/laptops/3080

The code for this would look something like this:

Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3\">Click for article|"

Although for some reason it’s embedding the URL’s even though I specified code formatting.
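If you want to poke at the capture group idea without restarting Apache every time, the same two substitutions can be tried in Python, where the groups are recalled with \1, \2 and so on rather than $1, $2. This is only a scratchpad sketch: the function names are made up, and the path pattern is the five-level variant needed for the laptops example above.

```python
import re


def swap_day_month(text):
    """Reorder the last two fields of a YYYY-xx-xx date using capture groups."""
    return re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\1-\3-\2", text)


def path_to_url(text):
    """Turn a five-level u:\\ article path into a URL on the debian7 host."""
    pattern = (r"u:\\([a-z]+[0-9]{3,})\\([0-9a-z]+)\\([0-9a-z]+)"
               r"\\([0-9a-z]+)\\([0-9]+)")
    return re.sub(pattern, r"http://debian7/usenet/\1/\2/\3/\4/\5", text)


print(swap_day_month("2017-10-06"))
print(path_to_url(r"u:\b227\comp\sys\laptops\3080"))
```

Each pair of parentheses grabs one path component, and the replacement string stitches them back together with forward slashes, which is the same job the Substitute line does.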

Now all I had to do was install IIS 4.0 off the Option Pack CD-ROM onto my Windows NT 4.0 workstation, and create a virtual directory of /usenet which then pointed to the U: drive where AltaVista did its indexing.

So to this point that gives me a config file much like this:

ServerAdmin [email protected]
DocumentRoot /var/www
SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On

SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
#clean up urls
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"
#protect the page
Substitute "s|launch=app||n"
Substitute "s|?pg=config&what=init|?pg=h|n"
#fix title
Substitute "s|<IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|<a href=\"http://debian7/altavista\"><IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0></a>|n"
Substitute "s|><strong>|--->|n"
Substitute "s|</strong></a>||n"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\">Click for article|"
Substitute "s|&gt;u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\"&gt;Click for article|"
Substitute "s|&gt;u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\"&gt;Click for article|"
Substitute "s|&gt;u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4\"&gt;Click for article|"
Substitute "s|&gt;u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3\"&gt;Click for article|"
# Need links for the u:\news097f1\b120\comp\society\futures\1122
Substitute "s|&gt;u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7/$8\"&gt;Click for article|"
Substitute "s|&gt;u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\"&gt;Click for article|"
Substitute "s|&gt;u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\"&gt;Click for article|"
Substitute "s|&gt;u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\"&gt;Click for article|"
# Need links for  u:\news002f1\b1\fa.poli-sci\8
Substitute "s|&gt;u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z\.\-]{1,})\\\([0-9]{1,})|---&gt;&lt;a href=\"http://debian7/usenet/$1/$2/$3/$4\"&gt;Click for article|"

<Location /usenet/>
    ProxyPass  http://10.12.0.16/usenet/
    RewriteEngine On
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
    AddOutputFilterByType SUBSTITUTE text/html
</Location>

bla bla rest of the 000-default crap....

Simple right?

Searching for AltaVista

So now I get a nicely formatted page.  I can click the mountain icon and jump back to home, and I can click on the articles, but because I have no extensions or MIME types to intercept, it’ll just download them to my PC.  I guess I need to go through them all, convert them from UNIX format to MS-DOS, and stick a .txt extension on every single one of them.

I’m still thinking this thing is far too rickety to put on the internet, but we’ll see.

Fun with Apache, (mod_proxy, mod_rewrite), stunnel, And AltaVista Personal search

As you may remember from my prior attempt at using AltaVista Search, I ran out of space, and found out it only serves pages on 127.0.0.1:6688 and is pretty much hardcoded to do so.  It’s a “fine” hybrid Java 1.01 application, with the bulk of it being Java.  I finally got around to setting up a VM, and unpacking all of the utzoo archives, and indexing them.  I should have done something about the IO because this took too long (KVM).

SIXTEEN HOURS!!!

So to cheat the system, I installed stunnel as a simple https to http proxy, which let me access my search VM anywhere.  However it still embedded 127.0.0.1 in all the pages.

via stunnel

Enter an Apache reverse proxy to talk to stunnel to talk to AltaVista search!

First to enable a few modules:

a2enmod substitute
a2enmod proxy
a2enmod ssl
a2enmod proxy_http
a2enmod rewrite

And adding this into the config:

SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On
SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s/1997/2016/ni"
Substitute "s/97/16/ni"
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"
Substitute "s|launch=app||n"
Substitute "s|<a href=http://debian7/altavista/?pg=q&what=0&fmt=d|<!--|n"
Substitute "s|><strong>|-->|n"
Substitute "s|</strong></a>||n"
Substitute "s|>u:\|->u:\|n"

This let me redirect all of those requests into a VM called debian7 on the /altavista path.  I also copied the images to the apache server, and now I get something that looks correct!

Apache in the mix!

I cut the results short… But here is a search of something simple:

About 16598 documents match your query.

I also killed all the ‘working’ URL’s that simply open a desktop application on the index ‘server’.  Naturally it was a personal service, but as a server this isn’t any good.  As such you can’t click on any search results now.  I need to figure out how to take the result blocks like “u:\b128\comp\databases\2852” and turn them into URL’s.

Also, as much as I want to re-index, it would be best to cut off the headers, or most of them, so the preview lines make sense.  Xref, Path, even From & Newsgroups don’t interest me.
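One way the header trimming could look before a re-index, as a minimal sketch: it assumes articles with "Name: value" header lines followed by a blank line (not every format in the archive will look like that), that the files still have Unix line endings, and that Subject and Date are the headers worth keeping. The function name and the keep list are made up for illustration.

```python
KEEP_HEADERS = ("subject", "date")  # assumption: the only headers worth previewing


def trim_headers(article, keep=KEEP_HEADERS):
    """Drop every header except those named in keep, leaving the body untouched."""
    head, sep, body = article.partition("\n\n")
    if not sep:
        return article  # no blank line, so no recognisable header block
    kept = []
    keeping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to whichever header came before it.
            if keeping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        keeping = name in keep
        if keeping:
            kept.append(line)
    if not kept:
        return body
    return "\n".join(kept) + "\n\n" + body
```

Run over each file before the CR/LF conversion, this would leave the Subject and Date lines at the top of every article, so the preview text starts with something meaningful instead of Path and Xref noise.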

I hate to leave it as ‘good enough’ but if anyone has a solution…. I’ll be glad to make this wonderful resource available!