Because I hate losing my mind with Apache

I needed to setup another WordPress for something else, and I host it on a server with no https. However of course it’s fronted by a proxy with https. And WordPress embeddeds the http into the stream, and changing that with redirects causes it to ping pong like crazy.

So this is by no means the best fix, it is *A* fix. For me. YMMV.

I had to add this into the site config from apache

<Directory /var/www/mynewsite.org/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride all
    Order allow,deny
    allow from all
    Require all granted
</Directory>

And does that work? no it sets up mod_rewrite (you did a2enamod it right?! RIGHT?!)

RewriteEngine on
Header always set Content-Security-Policy: upgrade-insecure-requests
RewriteRule ^$ /wordpress [L]

So the big thing here is updating the header. I had to search a bit to find that part, so yeah that was needed. The other part is that wordpress really wants to be in /wordpress not in the root, so a simple RewriteRule will put that in it’s place.

Good Grief.

I should explain my hosting situation more. I’ve been forced into some kind of nomadic situation, and thanks to some weird issue on Azure’s WordPress container thing my site ate itself. Like it literally destroyed the files. I had backups of course, and well either being evicted from a data centre because of the Windows CE Nethack scandal, or having hosting companies either go under with no warning, or drop me for being far too much traffic than they signed up for, I’d taken to not only owning my data but hosting it all myself.

I do enjoy the ‘nomad’ aspect of it all, but it’s a little crazy. I like to think of this as portable stealth hosting as I can blend into user traffic, and I can physically take my data with me. Of course, I have backups as well!

So to describe the setup I guess i need a fine MS Paint style drawing!

How the magic works

I have been using this kind of setup to some degree going back to when I setup the utzoo search engine back in 2017. I’d only expanded on it to host everything now. Working backwards I need to blend into residential traffic, so I need a typical 90’s like office vpn, which means PPTP. So going on lowendbox, I get some cheap VPS, install haproxy and PPTP server onto that. This way I’m basically renting out a static IP address for less than a Starbucks, but it let’s me have the freedom to move around and just look like user traffic, as the PPTP connection is bi-directional.

On the VPS I use a haproxy much like this:

	bind *:443 ssl crt /etc/haproxy/haproxy.pem
	redirect scheme https code 301 if !{ ssl_fc }
	mode http
	acl cloudflare src 173.245.48.0/20 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 141.101.64.0/18 108.162.192.0/18 190.93.240.0/20 188.114.96.0/20 197.234.240.0/22 198.41.128.0/17 162.158.0.0/15 104.16.0.0/13 104.24.0.0/14 172.64.0.0/13 131.0.72.0/22
	tcp-request content accept if cloudflare
	tcp-request content reject
	acl host_utzoo hdr(host) -i utzoo.superglobalmegacorp.com
	acl host_utzoo hdr(host) -i altavista.superglobalmegacorp.com
	acl super_virt hdr(host) -i virtuallyfun.superglobalmegacorp.com
	acl host_blog hdr(host) -i virtuallyfun.com
	acl host_blog hdr(host) -i unix.superglobalmegacorp.com
	acl host_bow  hdr(host) -i bow.superglobalmegacorp.com
	acl host_blog hdr(host) -i oldlinux.superglobalmegacorp.com
	acl host_blog hdr(host) -i macmint.superglobalmegacorp.com
	acl host_blog hdr(host) -i tuhs.superglobalmegacorp.com
	acl host_nt31 hdr(host) -i winnt31.superglobalmegacorp.com
	acl host_blog hdr(host) -i superglobalmegacorp.com

        acl blog_path path_beg -i /wordpress
        acl blog hdr(host) -i virtuallyfun.com

	redirect prefix https://virtuallyfun.com code 301 if { hdr(host) -i virtuallyfun.superglobalmegacorp.com }

        http-request set-uri %[path,regsub(/wordpress/,/)] if blog_path blog
        redirect code 301 location https://virtuallyfun.com/ if blog_path blog

	use_backend utzoo if host_utzoo
	use_backend lenovo if host_blog
	use_backend nt31 if host_nt31
	use_backend bow if host_bow


backend utzoo
	server zoo1 192.168.23.10:8080

backend lenovo
	server blog1 192.168.23.10:80

backend bow
	server bow1 192.168.23.10:8888

backend nt31
	server apache 192.168.23.5:80

This way I can use various backend servers to split up various domains on the same port. Since cloudflare fronts I don’t need any weird SSL certificate with combination domains. I also get to sneak in a redirect for the old virtuallyfun.superglobalmegacorp.com, as google had a major fit about me not having it at the top level domain. The Windows NT 3.1 server is a Qemu virtual machine running on the VPS. It just has a static page so there wasn’t much point being all that sophisticated.

I then setup a Cloudflare account, and then point my domain directly to the VPS. This also let’s me grab a certificate from Cloudflare to load onto the VPS to encrypt the conversation from them to my VPS, just as the PPTP connection is also encrypted. Another advantage of PPTP over say IPsec is that older OSs like Windows NT 4.0 support it. Not to mention with tunnels being left up so incredibly long, once again I want to blend in as user traffic, and IPsec or OpenVPN just draws too much attention. It sure did in China, and sadly that’s the model I fully expect more and more of the world to be following.

Cloudflare Full Encryption

The other advantage of Cloudflare is that they do some aggressive caching which makes hosting on a residential connection viable. In this case over the last 30 days I’ve only had to ‘upload’ 41 of the 158GB of traffic.

73% of my bandwidth is cached!

If something gets popular this can make all the difference! Again, it’s key to being stealthy.

In the same way it also allows me to host my own Exchange, or any other weird application. Which then takes me to the Apache config. I have Windows 10 on the Lenovo, and loaded up WSLv1 so I could have Windows natively integrate with the filesystem. I also like that it’s got such low overhead as WSLv1 is just a subsystem, unlike v2 being a lightweight VM, like shades of Win/OS2 back in the day. My setup takes a big page from running SWGEMU on WSLv1, where I use the Win64 version of MariaDB, and the Apache/PHP is in a Linux Ubuntu distro. The other advantage here is that as my Lenovo PPTP’s into the VPS, the Apache will by default listen on port 80. Which then leads to the whole needing to set security on the headers to allow the http to become a seamless https.

I also can have multiple things listening on different ports, just as I use haproxy again locally to split out the utzoo site onto the old Apache server with all the re-write rules, and then finally to the Windows NT 4.0 workstation with the Altavista Personal desktop search.

Windows 10 performance with the blog

The overall performance on this low end Xeon E3-1225 v2 is quite acceptable.

WSLv1 processes are directly exposed.

As I had mentioned WSLv1 being a subsystem means that the processes are transparent so task-manager can immediately show yo what is going on. The opaqueness of Virtual Machines just adds to the time to identify any issues.

And all it takes is a scheduled task to dump the MariaDB, and run an xcopy onto my OneDrive, and now I’ve got a cloud backup. No plugins needed, no scripts that will break as cloud API’s are forever being depreciated and changing. Very nice!

I do like the flexibility of this being a somewhat simple setup. Not to mention being able to have more than one WSL container for various applications, being able to go so far as to break my Apache setup into multiple instances, all being ‘l7 routed’ through an instance of haproxy either at home, or in the VPS is very nice.

Oh, sure I could mess with self signed certs, and trust myself but I just didn’t want to deal with the overhead of double encryption. Maybe it’s something I’ll revisit later.

Also sorry for the ads. 2024 is starting out…. interesting.

Personal AltaVista + UTZOO reloaded

Introduction

Long before websites, during the dark ages of the BBS, on the internet there was (well it’s still there!) a distributed messaging system called usenet.  There are countless topics on just about everything that was full of all kinds of incredible conversations.  Before the walled gardens, and the ease of running individual bulletin boards, the internet had prided itself on having one big global distributed messaging system.  It was a big system, and one thing that was always taken for granted was that it was too big to save, and that whatever you put out there would probably be erased as all sites had a finite amount of very expensive disk space, and they would only keep recent articles.

But it turns out that in the University of Toronto, in the zoology department they had a tape budget, and were in fact archiving everything they could.  In all they had amassed 141 tapes spanning from  February 1981 (though these are not Usenet posts, just internal netnews University stuff) all the way up to about midnight of July 01, 1991!

While the archive was made available to a few people in 2001, it was made generally available in 2009, and then in 2011 on archive.org where I downloaded a copy of it.  There is some interesting backstory over on Dogcow land, as it took quite a bit of effort to get the data from the tapes, and then slowly released out into the wild.

As mentioned on the archive.org site:

This is a collection of .TGZ files of very early USENET posted data provided by a number of driven and brave individuals, including David Wiseman, Henry Spencer, Lance Bailey, Bruce Jones, Bob Webber, Brewster Kahle, and Sue Thielen.

OK, so back a few months ago, I had setup AltaVista personal desktop search along with the UTZOO usenet archive for the purpose of using something more sophisticated than grep, but maintaining that legacy/retro feel us using outdated technology.  To recap the first challenge is that the desktop search product, is only meant to be used from the desktop of a Windows 98/NT 4.0 workstation.  It uses a super ancient version of JAVA as the webserver, and they chose to bind it to 127.0.0.1:6688 .  So the first thing to get around that was to build a stunnel tunnel allowing me to effectively connect to the webserver remotely.  And since the server assumes it’s locally I had to use Apache with mod_rewrite to setup some simple regex expressions to massage the pages into something that would be usable from a non local machine.

So with that word salad up, let’s have a brief picture!

Flow diagram

Stepping it up

On my ‘general’ hosting machine, I use haproxy to reverse proxy out multiple sites out the single address.  This is a super simple solution that allows me to have all kinds of different backends using various hosting platforms, such as Apache 1.3 on Windows NT 3.1.  So for this to work I just needed to create an altavista.superglobalmegacorp.com DNS record, and then the following in the haproxy config:

frontend named-hosts
bind 172.86.179.14:80
acl is_altavista hdr_end(host) -i altavista.superglobalmegacorp.com
use_backend altavista if is_altavista

backend altavista
balance roundrobin
option httpclose
option forwardfor
server debian8 10.0.0.18:80 check maxconn 10

So as you can see it’s really simple it looks for the string ‘altavista.superglobalmegacorp.com’ in the host header, and then sends it to the backend that has a single web server, in this case a lone Debian server, aptly named debian8 that throttles after 10 concurrent connections.

The next thing to do was generate a SSL self signed cert, which wasn’t too hard.  The stunnel installer has a profile ready to go, so it was only a matter of finding a version of OpenSSL that’ll run on NT 4.  As this isn’t public encryption I really don’t care about it using crap certs.

On the Debian server is where all the regex magic, is along with the stunnel client to connect to the NT 4.0 Workstation.

client = yes
debug = 0
cert = /etc/stunnel/stunnel.pem

[altavista]
accept = 127.0.0.1:8080
connect = 10.0.0.19:8443

Likewise on NT stunnel will need a config like this:

cert = c:\stunnel\stunnel.pem

; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1

; Some debugging stuff useful for troubleshooting
debug = 0
output = c:\stunnel\stunnel.log.txt

[altavista]
accept = 8443
connect = 127.0.0.1:6688

With the ability for the Debian box to talk to the AltaVista web server, it was now time to configure Apache.  This is the most involved part, as the html formatting by AltaVista personal search is hard coded into the java binary.  However thanks to mod_rewrite we can modify the page on the fly!  So the first thing is that I setup to virtual directories, the first one /altavista maps to the search engine, and then I added /usenet which then talks to IIS 4.0 on the Windows NT 4.0 workstation, which is just allowing read & browse to the usenet files that will need to be indexed.

#This part connect to a stunnel connection to the Altavista server
ProxyPass “/altavista” “http://localhost:8080”
ProxyPassReverse “/altavista” “http://localhost:8080”
#This connects to IIS 4.0 on the NT 4.0 machine
ProxyPass “/usenet” “http://10.0.0.19/usenet”
ProxyPassReverse “/usenet” “http://10.0.0.19/usenet”
ProxyRequests Off
RewriteEngine On

Because we mounted it on a sub directory we need to redirect the root to /altavista so I simply add:

#Redirect the root to the /altavista path.
#
RedirectMatch 301 ^/$ /altavista

To get the images to work, along with fixing the 127.0.0.1 hardcoding,  I copied them from the NT workstation onto the Apache server, then added this regex statement:

#clean up urls
Substitute “s|Copyright 1997|Copyright 2017|n”
Substitute “s|127.0.0.1:6688|altavista.superglobalmegacorp.com/altavista|n”
Substitute “s|file:///c:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|/images/|n”

And now the site is starting to work.  The most involved regex is to change the links from local text files, into a path to point to the usenet shares.  This changes the text for u:\usenet\a333\comp\33.txt into a workable URL.

Substitute “s|>u:\\\\usenet.([a-z]{1,}[0-9]{3,})\\\([0-9a-z\+\-]{1,})\\\([0-9]{1,})|—><a href=\”http://utzoo.superglobalmegacorp.com/usenet/$1/$2/$3.txt\”>[$2\] Click for article|

Naturally there is a LOT of these type of statements to match various depths, and pattern types as there is A news, B news and C news archives, plus scavenged bits.

Additionally I disabled a bunch of URL’s that would either try to alter the way the engine works, or allow the search location to change, just giving you empty results, along with altering some of the branding, as digital.com doesn’t exist anymore, and various tweeks.  The finished config file for Apache is here.

Now with that in place, I can hit my personal AltaVista search.  The next insane thing was to rename all the files from the UTZOO dump adding a .txt extension, and then re-encoding them in MS-DOS CR/LF format.  I found using ‘find -type f’ to find files, and then a simple exec to rename them into a .txt extension.  Then it was only a matter of using ZIP to compress the archives, and then transferring them to Windows NT, and running UNZIP on them with the -a flag to convert them into CR/LF ASCII files on Windows.  This took a tremendous amount of time as there are about 2.1 million files in the archive.

Now with the files on Windows, now I had to run the indexer.

Indexed in under 7 hours!

While I had originally had an IIS 4.0 instance on the same NT 4.0 Workstation serving up the result files, I thought it may make more sense to just serve them from the UTZOO mirror server I have in the same collocation so it’d be much faster, so that way only the queries are relying on servers in Hong Kong, instead of being 100% located in the United States.

So here we go, my search portal for all that ancient usenet goodness:

altavista.superglobalmegacorp.com

If you are hoping for the wealth of knowledge to be gained from people posting on usenet from 1981 to 1991 then this is your ticket.  Keep in mind that usenet being usenet, there is discussions on everyone and everything, and like all other forums before you know it it’ll end with calling people Hitler, and how the Amiga is the greatest computer ever (well it was!).  A tip when searching by year, is that people commonly wrote the year as 2 digits.  However when looking for numbers like, say Battletech 3025, it will pull up files named 3025.txt.  To prevent this just add -3025.txt to stop names like 3025.txt, or if you want to find out about the movie Bladerunner from 1982, try searching for bladrunner 82 -82.txt +review +movie.  If you have any questions, there is of course the manual with a guid on how to search.

While the story of AltaVista is somewhat interesting, but much like how Digitial screwed up the Alpha market by trying to hoard high end designs, they also didn’t set the search people free to focus on search.  And the intranet stuff was crazy expensive, look at this ad from 1996 which translate to a minimum of $10,000 USD a year to run a single search engine!  But as we all know, the distributed model of google won search and AltaVista never had a chance as it was caught up in the Compaq/HP mess then spun out to be quickly absorbed by Yahoo.

Meanwhile it appears the original owners of altavista.com, AltaVista Technology, Inc. of California, are actually still in business.  If anyone cares I’ll put the installation files, and some of the config’s in this directory.

Apache 1.3.4 on Windows NT 3.1

Yes, I know it’s crazy old, totally useless, but it did mostly compile.

Apache 1.3.4 running on Windows NT 3.1 Advanced Server

Apache 1.3.4 running on Windows NT 3.1 Advanced Server

Assuming it’s hasn’t been crashed or hacked it should be online here:

http://winnt31.superglobalmegacorp.com/

Unlike Serweb 0.3, Apache is HTTP 1.1 compliant, which means that I can put it behind haproxy, and enjoy the fact it doesn’t need a dedicated IP address.

Although I can’t imagine anyone wanting it, here is the binary/source and my htdoc dump.  Download it here: apache-NT-31.zip & unzip.exe

Apache console mode

Apache console mode

I had to pull out some stuff, like some of the service features, so it really only runs as a console app.

I’ve compiled it with /Zi meaning full debug and no optimization.  If you want to re-compile you’ll probably want either the Win32 SDK, or Visual C++ 1.0 32bit, and replace the headers and libraries from the Windows NT 3.5 SDK.  Much like trying to build GCC 2.6.3 on Windows NT.

Also in a silly way, thanks to Qemu, I’m now running both OS/2 & Windows NT on the same server, running Linux.

Fun with regex substitutions in Apache

Continuing from my previous post, I was now able to access my AltaVista server, however from a web browser I was unable to actually view any of the documents remotely.

In the pages though I did get the MS-DOS path to the usenet article in question:

Now how do I turn that into a URL?

Well as it turns out mod_rewrite does support regex, which in turn can do variable re-ordering!

After a bit of googling I found this page on stackoverflow, on how to convert a date between UK/US formats:

s/(\d{4})-(\d{2})-(\d{2})/\-$3-$2/

Simple, right?  So what is going on here?  The parenthesis define a variable set, and on the substitution part you can recall them with $1, $2 , $3 etc.  So using this recipe I could take something like this:

u:\b227\comp\sys\laptops\3080

and convert it into the following:

http://debian7/usenet/b227/comp/sys/laptops/3080

The code for this would look something like this:

Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href="\"http://debian7/usenet/$1/$2/$3\""]Click for article|"

Although for some reason it’s embedding the URL’s even though I specified code formatting.

Now all I had to do was install IIS 4.0 off the Option Pack CD-ROM, onto my Windows NT 4.0 workstation, and create a virtual directory of /usenet which then pointed to the U: drive where AltaVista did it’s indexing.

So to this point that gives me a config file much like this:

ServerAdmin webmaster@localhost
DocumentRoot /var/www
SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On

SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
#clean up urls
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"
#protect the page
Substitute "s|launch=app||n"
Substitute "s|?pg=config&what=init|?pg=h|n"
#fix title
Substitute "s|<IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|<a href=\"http://debian7/altavista\"><IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \"  BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|--->|n"
Substitute "s|u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\\">Click for article|"
# Need links for the u:\news097f1\b120\comp\society\futures22
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\/\\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\/\\">Click for article|"
# Need links for  u:\news002f1\b1\fa.poli-sci
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z\.\-]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/\/\/\/\\">Click for article|"

<Location /usenet/>
    ProxyPass  http://10.12.0.16/usenet/
    RewriteEngine On
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
    AddOutputFilterByType SUBSTITUTE text/html
</Location>

bla bla rest of the 000-default crap....

Simple right?

Searching for AltaVista

Searching for AltaVista

So now I get a nicely formatted page, I can click the mountain icon, and I jump back to home, and I can click on the articles and, because I have no extensions or MIME types to intercept it’ll just download them to my PC.  I guess I need to go through them all, convert them from UNIX format to MS-DOS, and stick a .txt extension on every single one of them.

I’m still thinking this thing is far too rickety to put on the internet, but we’ll see.

Fun with Apache, (mod_proxy, mod_rewrite), stunnel, And AltaVista Personal search

As you may remember from my prior attempt at using Altavista Search I ran out of space, and found out it only serves pages on 127.0.0.1:6688 and is pretty much hardcoded to do so.  It’s a “fine” hybrid java 1.01 application, with the bulk of it being java. Â I finally got around to setting up a VM, and unpacking all of the utzoo archives, and indexing them.  I should have done something about the IO because this took too long (KVM).

SIXTEEN HOURS!!!

SIXTEEN HOURS!!!

So, to cheat the system, I installed stunnel as a simple https to http proxy, which let me access my search VM anywhere.  However, it still embedded 127.0.0.1 in all the pages.

via stunnel

via stunnel

Enter an Apache reverse proxy to talk to stunnel to talk to AltaVista search!

First to enable a few modules:

a2enmod substitute
a2enmod proxy
a2enmod ssl
a2enmod proxy_http
a2enmod rewrite

And adding this into the config:

SSLProxyEngine On
ProxyPass “/altavista/” “https://10.12.0.16”
ProxyPassReverse “/altavista/” “https://10.12.0.16/”
ProxyRequests Off
RewriteEngine On
SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
Substitute “s/1997/2016/ni”
Substitute “s/97/16/ni”
Substitute “s|127.0.0.1:6688|debian7/altavista|n”
Substitute “s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n”
Substitute “s|launch=app||n”
Substitute “s|<a href=http://debian7/altavista/?pg=q&what=0&fmt=d|<!—|n”
Substitute “s|><strong>|—>|n”
Substitute “s|</strong></a>||n”
Substitute “s|>u:\|->u:\|n”

This let me redirect all of those requests into a VM called debian7 on the /altavista path.  I also copied the images to the apache server, and now I get something that looks correct!

Apache in the mix!

Apache in the mix!

I cut the results short… But here is a search of something simple:

About 16598 documents match your query.

About 16598 documents match your query.

I also killed all the ‘working URL’s that simply open a desktop application on the index ‘server’.  Naturally it was a personal service, but as a server this isn’t any good.  As such you can’t click on any search results now.  I need something else to figure out how to take the result blocks like “u:\b128\comp\databases\2852” and turn them into URL’s.

Also, as much as I want to re-index I would be best to cut off the headers, or most of them so the preview lines make sense.  Xref, Path, even From & Newsgroups don’t interest me.

I hate to leave it as ‘good enough’ but if anyone has a solution…. I’ll be glad to make this wonderful resource available!

I accidentally upgraded vpsland to Debian 8

So yeah, dealing with Apache 2.4 vs 2.2 was… fun.  The security Order stuff is obsolete so that was fun editing all the virtual hosts.

The key parts being:

In this example, all requests are denied.

2.2 configuration:

Order deny,allow
Deny from all

2.4 configuration:

Require all denied

In this example, all requests are allowed.

2.2 configuration:

Order allow,deny
Allow from all

2.4 configuration:

Require all granted

In the following example, all hosts in the example.org domain are allowed access; all other hosts are denied access.

Boy was that fun!

Another bit of fallout was the hosts file.  I have spamd running and suddenly I was being bombarded with this message:

Jul 25 10:15:39 cheapvps spamc[683]: connect to spamd on ::1 failed, retrying (#1 of 3): Connection refused

Well it turns out after much digging around that Debian 8 is more IPv6 ready.  The hosts file from Debian 7 was something like this:

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback

And in 8, it changed to this:

fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 localhost.localdomain localhost
# Auto-generated hostname. Please do not remove this comment.
::1 localhost ip6-localhost ip6-loopback

Needless to say, having localhost point to ::1 made it dependant on all local daemons supporting IPv6, and spamd sadly is IPv4 only.  Luckily it’s a quick fix to remove localhost from ::1, which then let’s it work again with 127.0.0.1, and now it can connect over IPv4.

Well today (August 4th, 2015) there was a critical update to Apache.  And after updating I got this fine error:

# /etc/init.d/apache2 restart

[….] Restarting apache2 (via systemctl): apache2.serviceJob for apache2.service failed. See ‘systemctl status apache2.service’ and ‘journalctl -xn’ for details.

failed!

Great.  So what does the error actually say?

# systemctl status apache2.service
* apache2.service – LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2)
Active: failed (Result: exit-code) since Tue 2015-08-04 13:52:13 HKT; 7s ago
Process: 6063 ExecStop=/etc/init.d/apache2 stop (code=exited, status=0/SUCCESS)
Process: 6427 ExecStart=/etc/init.d/apache2 start (code=exited, status=1/FAILURE)

systemd[1]: Starting LSB: Apache2 web server…
apache2[6427]: Starting web server: apache2 failed!
apache2[6427]: The apache2 configtest failed. …….
apache2[6427]: Output of config test was:
apache2[6427]: apache2: Syntax error on line 250 …y
apache2[6427]: Action ‘configtest’ failed.
apache2[6427]: The Apache error log may have more….
systemd[1]: apache2.service: control process exi…=1
systemd[1]: Failed to start LSB: Apache2 web server.
systemd[1]: Unit apache2.service entered failed …e.
Hint: Some lines were ellipsized, use -l to show in full.

Fantastic.

# apachectl configtest
apache2: Syntax error on line 250 of /etc/apache2/apache2.conf: Could not open configuration file /etc/apache2/mods-enabled/alias.load: No such file or directory
Action ‘configtest’ failed.
The Apache error log may have more information.

So, normally you’d check under modules-enabled, and link in the missing bits, right? Yeah except there is no MPM modules. Not anymore.  And yes I removed and re-installed the apache2-mpm-prefork module, to no avail.  So after much digging around it looks like the transition to 2.4 finally broke everything irrecoverably.  So I backed up the /etc/apache2 directory than ran the follwing:

apt-get purge apache2

Which then removes all the apache2 stuff from the system.  Then to finish it off, run a quick

rm -rf /etc/apache2

You did back it up, right?

now put it back in..

apt-get install apache2 libapache2-mod-php5

Now to re-enable the virtual sites.  For some reason they need to be enabled with a2ensite.  Except they don’t tell you that your sites now need to end in .conf in the /etc/apache2/sites-available (you did back it up right?)

Also if you run perl (src2html) be sure to run:

a2enmod cgi
service apache2 restart

Not to mention the joys of updating perl, and the cvsweb breaking, and I’m sure far more to break.  Oh well, at least it’ll be up to date.  That’s what I get for mixing ‘stable’ with ‘old stable’, when the local mirror out in the UK I was using moved up to 8.

shellinabox

So while browsing reddit, I came across this neat package, shellinabox.  Simply put, it runs as a process on your ‘box’ and fronts it with a javascript terminal interface.  So as long as you have a halfway modern machine with javascript support you too can just connect to a machine and run CLI based stuff.

BBS via telnet

BBS via telnet

So as a test I setup a game of tetris, and a telnet session to my BBS.

There isn’t much to ‘setup’ in the way of shellinabox, because it’s all command line driven.

/shellinabox-2.14/shellinaboxd -t -s /:LOGIN -s “/bbs:nobody:nogroup:/:/usr/bin/telnet localhost” -s /tetris:nobody:nogroup:/:/usr/games/tetris-bsd –css /shellinabox-2.14/shellinabox/white-on-black.css -b

So this will create a new web server that by default listens on TCP port 4200 which in turn uses the virtual directories / for a login, /bbs which launches telent, and /tetris which starts the BSD tetris for terminals game.  Now as many of you are aware, not all people with internet connections have the luxury of having all outbound TCP/IP ports. Even the most excellent flashterm still establishes a TCP session.  That is what makes this different is that all the traffic is done via HTTP, which means it can be proxied.  Now the real trick is having a web server do the proxing for you, so that all the user has to do is hit a special URL, and the server will proxy the request to shellinabox’s web server.

Enter Apache2’s reverse proxy!

So on my BBS’es apache config, I add in the following lines:

ProxyPass /tetris http://localhost:4200/tetris
ProxyPassReverse /tetris http://localhost:4200/tetris
ProxyPass /bbs http://localhost:4200/bbs
ProxyPassReverse /bbs http://localhost:4200/bbs

I’m not sure exactly of the specific modules to enable, but hammering away this got it to work:

a2enmod proxy
a2enmod mod_proxy
a2enmod rewrite
a2enmod mod_proxy
a2enmod proxy_http
a2enmod proxy_module
a2enmod headers
a2enmod deflate

Under my virtual server’s ‘root’ directory.  So now when you access https://virtuallyfun.com/tetris/ Apache will proxy your request into the shellinabox http server, and you’ll get…

Tetris

Tetris

So now only using HTTP you can play tetris!

So where to go from here?  I was thinking some kind of SIMH CP/M on demand thing.  There is a command line Wyse 60 emulator, so maybe that’d be fun.  I may even bring back something I had ages ago, access into a bunch of legacy systems.  This is a great ‘solution’ to enable multiplexing without having to use another software MUX.

 

WordPress spam…

So, I was looking at the start of the year about 8% of my stats was SPAM.yuck. Then something insane happened this week, it jumped to 28%.

So I crossed that point when something would have to be done!

I’ve already installed stuff to detect the spam, and it does a good overall job.  But I wanted to take it to the next level and block all traffic from the spammers! Anyone who SPAM’s probably is engaged in other nonsense that makes me not want their traffic.

Thankfully for me and this brave new era of google, I could quickly find someone has done 99% of the leg work for me right here! Thanks to Sakis’s hard work I was able to add some minor tweaks, and generate a full iptables config, flush & add the new rules, then have cron run it every few minutes.

Pretty cool stuff if I do say so myself!

Since the primary site is now offline, I’ve updated with an archive.org link. For what it’s worth, here is the meat of the article in question:

Dodging WordPress comment spammers

I admit: Allowing anyone to post comments is bad practice. Though, I’ve got my reasons to stand my ground. I’ve many times read something on a blog and to some of them I even had something to add. Could potentially help blog’s author or future visitors by sharing my own experience or request a solution to one of my problems by posting a question. Guess what? I am so lazy that I rarely go through registration procedure, just to enable me posting a comment.

I am one of those that insist dialog and discussion is always constructive as long as both ends feel like establishing it. I do not want to lose the opinion and comments of stopping-by visitors, just because I want a “safe” thing that runs on its own. But, “buts” exist. My blog is currently one month old, still it manages to receive 300+, in average, spam-oriented comments per day, while I’ve even witnessed a 1k/day.

Thank god, WordPress provides blacklist features based both on IP addresses and comment content. And it really does a good job: After messing around with your recent “spam” you can easily end up with a list that accurately detect a non constructive comment. However, you’ve not solved all your problems this way:

  • New comments still come. They are just automatically rated as spam.
  • Your database fills with garbage.
  • Your web traffic statistics are spoiled.
  • You waste bandwidth.
  • You waste CPU time.
  • If your spammer ever stop selling drugs and starts advertising flesh, all your content matching rules go away.
  • If your spammer loose interest into being a blog spammer and switch to a port-scanner, you will receive that too.

How about you refuse them a spare TCP socket? Besides, you don’t even wanna know them. All their connection attempts will end-up to void. Time for some “iptables” magic.

WordPress has already stored their IP addresses within its database. Consult that wp-config.php file you lately edit when you firstly installed WordPress, and refresh your memory on what your database name, username and password is. Mine are:


$ grep "DB_" wp-config.php

define('DB_NAME', 'mywordpress');

define('DB_USER', 'sakis');

define('DB_PASSWORD', 'myextrastrongpassword');

define('DB_HOST', 'localhost');

define('DB_CHARSET', 'utf8');

You now have to use that information into constructing this single-row command:

Check my example:

$ mysql -f -p --user=sakis mywordpress <<<"select distinct CONCAT('iptables -A INPUT -s ',comment_author_IP,'/32 -j DROP') from wp_comments where comment_approved='spam' order by 1 asc" | grep -v "^CONCAT" >> THEY_BOTHER_ME
Enter password:
$ head THEY_BOTHER_ME
iptables -A INPUT -s 113.161.128.232/32 -j DROP
iptables -A INPUT -s 117.121.208.254/32 -j DROP
iptables -A INPUT -s 118.141.141.7/32 -j DROP
iptables -A INPUT -s 118.194.1.157/32 -j DROP
iptables -A INPUT -s 119.235.27.100/32 -j DROP
...

You now have a simple recipe, named “THEY_BOTHER_ME”, ready to be executed (as root):

$ su

# . ./THEY_BOTHER_ME

Make sure you hook “THEY_BOTHER_ME” at your system’s start-up procedure and construct a cron/at job to periodically refresh it.

I’ve created a file named /etc/cron.daily/update_spammers.sh, with the following contents:

#!/bin/sh

fileloc="/etc/THEY_BOTHER_ME"

before=`cat "${fileloc}" | wc -l`
before=`echo ${before}`

cp "${fileloc}" /tmp/BOTHERS.$$

mysql -f --user=sakis --password=myextrastrongpassword mywordpress <<<"select distinct CONCAT('iptables -A INPUT -s ',comment_author_IP,'/32 -j DROP') from wp_comments where comment_approved='spam' order by 1 asc" | grep -v "^CONCAT" >> /tmp/BOTHERS.$$

sort /tmp/BOTHERS.$$ | uniq > "${fileloc}"
rm -f "/tmp/BOTHERS.$$"

. "${fileloc}"

after=`cat "${fileloc}" | wc -l`
after=`echo ${after}`

di=`expr ${after} - ${before}`
di=`echo ${di}`

printf "[%s] Spammers updated. Added %d new spammer(s) (Before: %d, After: %d)\n" "`date`" ${di} ${before} ${after}

And sadly, his original script is now offline.  This should be enough for anyone to get going on this exciting spam adventure…

ownCloud

So I was reading through a friends blog (wintellect!) and I came across this page about ownCloud…  Well I thought this was very interesting as I’ve pulled a lot of my external email mess inside (on my own Exchange 5.5 server on MS Virtual Server 2005!) .. So I like this whole idea.

I’ve got this VPS that has a few extra gigs of space, and it’d be SUPER convenient to map some drives for backups, or even back it up by copying some files..  It’s a simple AMP program setup, so I had it up and running in a few seconds.  The ‘hard’ part was mapping the drive from Vista.  Naturally it came down to reading the instructions, namely:

  1. in Services, enable the Webclient service (might be enabled already)
  2. in the Registry, change HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WebClient\Parameters\BasicAuthLevel from 1 to 2
  3. go to My Computer → Mount Network Drive
And that is about the size of it.