WRP 4.0 Preview

(This is a guest post from Antoni Sawicki aka Tenox)

Welcome a completely new and absolutely insane mode of Web Rendering Proxy. ISMAP on steroids!

While v3.0 was largely just a port from Python/Webkit to GoLang/Chromedp, the new version is a whole new game. Previously WRP worked by walking the DOM and making a clickable imagemap out of <A HREF> nodes. Version 4.0 works by using x,y coordinates obtained from ISMAP to perform a simulated mouse click in Chrome browser. This way you can click on any element of the page. From annoying cookie warnings, to various drop down menus and even play some online games. Also pagination has been replaced with a clickable scroll bar.

Enough talking, you can watch this video:

Or download the new version and try it yourself!

Please report bugs on github.com. Thank you!

WRP 3.0 Beta ready for testing

(This is a guest post from Antoni Sawicki aka Tenox)

I have released WRP 3.0 for testing. It’s currently a browser-in-browser server rather than a true proxy, but that’s in the works. Please try it out and let me know. Usage instructions are on the main github project page.

Today using trickery I was able to login to my reddit account from Mosaic:

Update: just added the missing image quantizer so that the color number input box actually does something useful. Now you can browse porn even with 16 colors:

WRP Runs on Windows

(This is a guest post by Antoni Sawicki aka Tenox)

Thats right, the new beta version of Web Rendering Proxy runs natively on Windows. Single EXE, no libraries or dependencies required. Only Chrome Browser.

I took a Internet Explorer 1.5 for a spin today while WRP was running on my Windows 10 PC. Worked just fine.

I have added Prev/Next buttons so that you can easily “scroll” through long pages.

ISMAP support has been added, proof:

You can download a preview build on github.

Web Rendering Proxy – Overdue Status Update

(This is a guest post from Antoni Sawicki aka Tenox)

There hasn’t been a major update to WRP (Web Rendering Proxy) in 5 years or so. Some new features have been added thanks to efforts of Claunia but the whole project was mostly impeded with mass migration of the whole Internet to SSL/TLS/https. It does semi work somehow thanks to sslstrip but the whole stack is an unmaintainable pile of crap which I’m not going to update any more.

A new rewrite from scratch is well under way. This time written in GoLang and using Chrome DevTools Protocol. Things should be much more stable and future proof.

Far from complete but I have a fully functional prototype now working in just under 100 lines of code:

UPDATE 1: You can play with it if you want. Please do not submit any bug reports just yet, as this is just a development version. Note that WRP is currently not a true HTTP proxy but rather browser-in-browser. Proxy may be supported later.

UPDATE 2: As of today online setting of size, scaling and scrolling is supported. I’m specifically happy about the scrolling feature albeit it probably needs a better user input, like prev/next page.

Windows version still doesn’t work due to an upstream bug, which is probably easy to fix.

ISMAP is currently in development.

French national Internet Referral Unit tries to shutdown archive.org

Count on un-elected technocrats trying to ruin awesome resources with their kanagroo court style operations in their little backwater nonsensical nations.

Good grief.

I should step up my uploads. I did add my NetWare 3.12 disk sets, Citrix Multiuser 2.0 and NeXTSTEP 3.3 CISC stuff.

Comcast Router and SNMP

(this is a guest post by Antoni Sawicki aka Tenox)

This is a lame duck, low effort post. And if you already know it it’s obvious. However this question seem to be asked a lot on the intertubes. I hope it will help someone else, as there is no good readily available answer out there.

Problem: I wanted to have SNMP on my Comcast/Xfinity router so I can monitor current bandwidth usage.

Research:

  • Possible to enable on vanilla router? – Nope
  • Do 3rd party, Comcast compatible routers do? – Nope
  • Can you SSH or hack in to the router to do it? – Nope
  • Can you load custom / hacked firmware to do it? – Nope*

Nope or very hard / unsupported.

So is it possible at all? Yes, but with a separate device. Comcast/Xfinity routers have so called “Bridge Mode” which essentially turns them in to a DOCSIS modem without router / firewall / wifi access point.

Solution: Turn on Bridge Mode in your vanilla Comcast router and buy a WRT firmware router / access point. I got Linksys AC3200 for $99 on Amazon. Ssh to the router and run: opkg install snmpd

Done.

Using UnixWare 2 to Set Up a Web Server: A Case Study

I stumbled across this ancient page, and thought it was so dated that I had to share it.

https://support.novell.com/techcenter/articles/ana19950903.html

Bravo on MicroFocus for keeping it up.

Original UnixWare website over at Novell

It’s kind of cute they ran it all on a couple of  Zenith Z-Server LT P60E computers, which have 128MB of RAM and 5GB of hard disk space, and 2 T1’s.

Even more shocking to me is that their LAN was just 10Mbit, which for a Unix/Networking OEM you’d think they would be on the edge with new tech like 100Mbit Ethernet, or more complicated/fast technology like ATM or FDDI.  Heck even 16MB Token Ring.

Novell home page circa 1995

It’s amazing the tiny screens we had back then.  I can still remember the 800×600 debate, as people even in the later 90’s were pushing for megabyte pages, and ludicrously big real-estate.

As always it’s funny how dismissive they were of Linux:

Linux didn’t have good support and we were concerned about its ability to perform under heavy loads

And of course how they dismissed Windows NT:

Windows NT, on the other hand, handled the chores okay, but it lacks a wide developer base. As a result, few tools are available for working with Windows NT. 

Naturally the tell is that they didn’t load HTTPD directly on NetWare as it was dead with the arrival of Windows NT.  And UnixWare and commercial Unix was also dead with the utter stagnation of SYSVR4.

UnixWare home page

And the product page for UnixWare was that awkward 50’s stock images, with too much red/pink that was all to common for Novell back then.  It’s almost laughable that they considered being able to run on the i386 as being ‘portable’ but for whatever reason they never could port UnixWare to any other platforms.  When they sold off UnixWare to Caldera they failed to do anything with it, and famously turned to lawsuits to attempt to recoup their money from the botched port to the Itanium that was done with IBM’s ‘help’.

UnixWare was going to lead the charge in the post SYSV world, but it’s constantly being sold, and pushed to do different things and fit an increasingly smaller role just cemented the demise of SYSV.

And of course marginalized and almost forgotten, NeXTSTEP would go on to be the #1 commercial UNIX in the market place.

AOL Instant messaging to end service.

From tumblr!

And they killed it from Tumblr.

AIM tapped into new digital technologies and ignited a cultural shift, but the way in which we communicate with each other has profoundly changed. As a result we’ve made the decision that we will be discontinuing AIM effective December 15, 2017. We are more excited than ever to continue building the next generation of iconic brands and life-changing products for users around the world.

Kind of insane to pay way too much money for something, to just turn around and kill it.

All I know is that whatever they think they are going to do, it’ll never have the reach and recognition as AIM.  Maybe there is reverse engineered servers, like Escargot for MSN.

GopherVista

Continuing with our gopher madness, next up we have GopherVista.  And this is everything I had hoped it would be when I first learned about this project.  I had joked to another friend that it’d be cool to crawl and feed the indexer data in a manner that could basically bring AltaVista back to life.  And we laughed, and I had my utzoo search and that was that.

Except it wasn’t.

However, across the internet, Ben didn’t hear any naysay about limitations or anything to get in the way, and went ahead and wrote a crawler in go, kept the results in a sane name/db order for later sanitisation in and out of AltaVista, and after an aggressive gopher port scan of the internet, he created GopherVista, an index of the gopher-verse, running on Windows 98.

No, really, you read that right, GopherVista backends on Windows 98!

Read all about the creation of GopherVista over a Ben’s blog blog.benjojo.co.uk.

GopherVista

Keep in mind that this is a search engine, not a proxy, so it’s best to use something like Internet Explorer 4, or an ancient Netscape that supports both HTTP 1.1 & Gopher.

I have to also say, that something like this is far more cooler, and better thought out than my utzoo hack, and I’m just happy to have inspired him, but now I really want to re-think my setup, and of course index all the things….

Personal AltaVista + UTZOO reloaded

Introduction

Long before websites, during the dark ages of the BBS, on the internet there was (well it’s still there!) a distributed messaging system called usenet.  There are countless topics on just about everything that was full of all kinds of incredible conversations.  Before the walled gardens, and the ease of running individual bulletin boards, the internet had prided itself on having one big global distributed messaging system.  It was a big system, and one thing that was always taken for granted was that it was too big to save, and that whatever you put out there would probably be erased as all sites had a finite amount of very expensive disk space, and they would only keep recent articles.

But it turns out that in the University of Toronto, in the zoology department they had a tape budget, and were in fact archiving everything they could.  In all they had amassed 141 tapes spanning from  February 1981 (though these are not Usenet posts, just internal netnews University stuff) all the way up to about midnight of July 01, 1991!

While the archive was made available to a few people in 2001, it was made generally available in 2009, and then in 2011 on archive.org where I downloaded a copy of it.  There is some interesting backstory over on Dogcow land, as it took quite a bit of effort to get the data from the tapes, and then slowly released out into the wild.

As mentioned on the archive.org site:

This is a collection of .TGZ files of very early USENET posted data provided by a number of driven and brave individuals, including David Wiseman, Henry Spencer, Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen.

OK, so back a few months ago, I had setup AltaVista personal desktop search along with the UTZOO usenet archive for the purpose of using something more sophisticated than grep, but maintaining that legacy/retro feel us using outdated technology.  To recap the first challenge is that the desktop search product, is only meant to be used from the desktop of a Windows 98/NT 4.0 workstation.  It uses a super ancient version of JAVA as the webserver, and they chose to bind it to 127.0.0.1:6688 .  So the first thing to get around that was to build a stunnel tunnel allowing me to effectively connect to the webserver remotely.  And since the server assumes it’s locally I had to use Apache with mod_rewrite to setup some simple regex expressions to massage the pages into something that would be usable from a non local machine.

So with that word salad up, let’s have a brief picture!

Flow diagram

Stepping it up

On my ‘general’ hosting machine, I use haproxy to reverse proxy out multiple sites out the single address.  This is a super simple solution that allows me to have all kinds of different backends using various hosting platforms, such as Apache 1.3 on Windows NT 3.1.  So for this to work I just needed to create an altavista.superglobalmegacorp.com DNS record, and then the following in the haproxy config:

frontend named-hosts
bind 172.86.179.14:80
acl is_altavista hdr_end(host) -i altavista.superglobalmegacorp.com
use_backend altavista if is_altavista

backend altavista
balance roundrobin
option httpclose
option forwardfor
server debian8 10.0.0.18:80 check maxconn 10

So as you can see it’s really simple it looks for the string ‘altavista.superglobalmegacorp.com’ in the host header, and then sends it to the backend that has a single web server, in this case a lone Debian server, aptly named debian8 that throttles after 10 concurrent connections.

The next thing to do was generate a SSL self signed cert, which wasn’t too hard.  The stunnel installer has a profile ready to go, so it was only a matter of finding a version of OpenSSL that’ll run on NT 4.  As this isn’t public encryption I really don’t care about it using crap certs.

On the Debian server is where all the regex magic, is along with the stunnel client to connect to the NT 4.0 Workstation.

client = yes
debug = 0
cert = /etc/stunnel/stunnel.pem

[altavista]
accept = 127.0.0.1:8080
connect = 10.0.0.19:8443

Likewise on NT stunnel will need a config like this:

cert = c:\stunnel\stunnel.pem

; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1

; Some debugging stuff useful for troubleshooting
debug = 0
output = c:\stunnel\stunnel.log.txt

[altavista]
accept = 8443
connect = 127.0.0.1:6688

With the ability for the Debian box to talk to the AltaVista web server, it was now time to configure Apache.  This is the most involved part, as the html formatting by AltaVista personal search is hard coded into the java binary.  However thanks to mod_rewrite we can modify the page on the fly!  So the first thing is that I setup to virtual directories, the first one /altavista maps to the search engine, and then I added /usenet which then talks to IIS 4.0 on the Windows NT 4.0 workstation, which is just allowing read & browse to the usenet files that will need to be indexed.

#This part connect to a stunnel connection to the Altavista server
ProxyPass “/altavista” “http://localhost:8080”
ProxyPassReverse “/altavista” “http://localhost:8080”
#This connects to IIS 4.0 on the NT 4.0 machine
ProxyPass “/usenet” “http://10.0.0.19/usenet”
ProxyPassReverse “/usenet” “http://10.0.0.19/usenet”
ProxyRequests Off
RewriteEngine On

Because we mounted it on a sub directory we need to redirect the root to /altavista so I simply add:

#Redirect the root to the /altavista path.
#
RedirectMatch 301 ^/$ /altavista

To get the images to work, along with fixing the 127.0.0.1 hardcoding,  I copied them from the NT workstation onto the Apache server, then added this regex statement:

#clean up urls
Substitute “s|Copyright 1997|Copyright 2017|n”
Substitute “s|127.0.0.1:6688|altavista.superglobalmegacorp.com/altavista|n”
Substitute “s|file:///c:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|/images/|n”

And now the site is starting to work.  The most involved regex is to change the links from local text files, into a path to point to the usenet shares.  This changes the text for u:\usenet\a333\comp\33.txt into a workable URL.

Substitute “s|>u:\\\\usenet.([a-z]{1,}[0-9]{3,})\\\([0-9a-z\+\-]{1,})\\\([0-9]{1,})|—><a href=\”http://utzoo.superglobalmegacorp.com/usenet/$1/$2/$3.txt\”>[$2\] Click for article|

Naturally there is a LOT of these type of statements to match various depths, and pattern types as there is A news, B news and C news archives, plus scavenged bits.

Additionally I disabled a bunch of URL’s that would either try to alter the way the engine works, or allow the search location to change, just giving you empty results, along with altering some of the branding, as digital.com doesn’t exist anymore, and various tweeks.  The finished config file for Apache is here.

Now with that in place, I can hit my personal AltaVista search.  The next insane thing was to rename all the files from the UTZOO dump adding a .txt extension, and then re-encoding them in MS-DOS CR/LF format.  I found using ‘find -type f’ to find files, and then a simple exec to rename them into a .txt extension.  Then it was only a matter of using ZIP to compress the archives, and then transferring them to Windows NT, and running UNZIP on them with the -a flag to convert them into CR/LF ASCII files on Windows.  This took a tremendous amount of time as there are about 2.1 million files in the archive.

Now with the files on Windows, now I had to run the indexer.

Indexed in under 7 hours!

While I had originally had an IIS 4.0 instance on the same NT 4.0 Workstation serving up the result files, I thought it may make more sense to just serve them from the UTZOO mirror server I have in the same collocation so it’d be much faster, so that way only the queries are relying on servers in Hong Kong, instead of being 100% located in the United States.

So here we go, my search portal for all that ancient usenet goodness:

altavista.superglobalmegacorp.com

If you are hoping for the wealth of knowledge to be gained from people posting on usenet from 1981 to 1991 then this is your ticket.  Keep in mind that usenet being usenet, there is discussions on everyone and everything, and like all other forums before you know it it’ll end with calling people Hitler, and how the Amiga is the greatest computer ever (well it was!).  A tip when searching by year, is that people commonly wrote the year as 2 digits.  However when looking for numbers like, say Battletech 3025, it will pull up files named 3025.txt.  To prevent this just add -3025.txt to stop names like 3025.txt, or if you want to find out about the movie Bladerunner from 1982, try searching for bladrunner 82 -82.txt +review +movie.  If you have any questions, there is of course the manual with a guid on how to search.

While the story of AltaVista is somewhat interesting, but much like how Digitial screwed up the Alpha market by trying to hoard high end designs, they also didn’t set the search people free to focus on search.  And the intranet stuff was crazy expensive, look at this ad from 1996 which translate to a minimum of $10,000 USD a year to run a single search engine!  But as we all know, the distributed model of google won search and AltaVista never had a chance as it was caught up in the Compaq/HP mess then spun out to be quickly absorbed by Yahoo.

Meanwhile it appears the original owners of altavista.com, AltaVista Technology, Inc. of California, are actually still in business.  If anyone cares I’ll put the installation files, and some of the config’s in this directory.