Setting the time machine for June 20th 2011

John Titor hunting Orange wine, and IBM 5100’s

No, not that time machine, this one is a rehash of the old local Wikipedia mirror.

So sadly I didn’t keep the source files as I thought they were evergreen, and yeah, it turns out they are NOT. But thankfully there is a 2011 set on archive.org listed as enwiki-20110620-item-1-of-2 and enwiki-20110620-item-2-of-2. Sadly there aren’t any torrents of these files, and as of today the Internet Archive torrent servers seem to be dead, so a direct download is needed.

Getting started

You are going to need a LOT of disk space. It’s about 10GB for the downloaded compressed data, and with the pages blown out to a database it’s ~60GB. Yes it’s massive. You’ll also need enough space for a Debian 7 VM, or a lot of your time trying to decode ancient Perl. Yes, it really is a write-only language. I didn’t bother trying to figure out why the script doesn’t work on anything modern; instead I used netcat and a Debian 7 VM.

Thanks to trn, who suggested aria2c, which did a great job of downloading stuff, although one URL at a time, but that’s fine.

aria2c -x 16 -s 16 -j 16 <<URL>>

I downloaded the following files:

  • enwiki-20110620-all-titles-in-ns0.gz
  • enwiki-20110620-category.sql.gz
  • enwiki-20110620-categorylinks.sql.gz
  • enwiki-20110620-externallinks.sql.gz
  • enwiki-20110620-flaggedpages.sql.gz
  • enwiki-20110620-flaggedrevs.sql.gz
  • enwiki-20110620-image.sql.gz
  • enwiki-20110620-imagelinks.sql.gz
  • enwiki-20110620-interwiki.sql.gz
  • enwiki-20110620-iwlinks.sql.gz
  • enwiki-20110620-langlinks.sql.gz
  • enwiki-20110620-oldimage.sql.gz
  • enwiki-20110620-page.sql.gz
  • enwiki-20110620-pagelinks.sql.gz
  • enwiki-20110620-pages-articles.xml.bz2
  • enwiki-20110620-pages-logging.xml.gz
  • enwiki-20110620-page_props.sql.gz
  • enwiki-20110620-page_restrictions.sql.gz
  • enwiki-20110620-protected_titles.sql.gz
  • enwiki-20110620-redirect.sql.gz
  • enwiki-20110620-site_stats.sql.gz
  • enwiki-20110620-templatelinks.sql.gz
  • enwiki-20110620-user_groups.sql.gz

Although the bulk of what you want is the single file enwiki-20110620-pages-articles.xml.bz2, which is 7.5 GB, downloading the rest of the files is another 10GB, rounding this out to 17.5GB of files to download. Yikes!
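Since feeding aria2c one URL at a time gets old fast, here is a minimal sketch that writes a list aria2c can read with its -i (--input-file) option. The BASE_URL below is a made-up placeholder and the short FILES list is just a sample; the real location of each file on the two archive.org items is something you will have to look up yourself.

```python
# Sketch: build a URL list for `aria2c -i urls.txt`.
# BASE_URL is a hypothetical placeholder, NOT the real archive.org path.
BASE_URL = "https://example.invalid/enwiki-20110620/"

# A sample of the file names from the list above; extend as needed.
FILES = [
    "enwiki-20110620-pages-articles.xml.bz2",
    "enwiki-20110620-page.sql.gz",
    "enwiki-20110620-categorylinks.sql.gz",
]


def build_url_list(base_url, files):
    """Return one URL per line, joining base_url and each file name."""
    base = base_url.rstrip("/") + "/"
    return "\n".join(base + name for name in files) + "\n"


if __name__ == "__main__":
    # Print the list; redirect it to a file for aria2c to consume.
    print(build_url_list(BASE_URL, FILES), end="")
```

Save the output as urls.txt and then something like aria2c -x 16 -s 16 -j 16 -i urls.txt should pull everything in one go.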

MySQL on WSLv2

I’m using Ubuntu 20.04 LTS on Windows 11, so adding MySQL is done via the MariaDB version with a simple apt-get install:

apt-get install mariadb-server mariadb-client mariadb-common

Installing MySQL is kind of easy, although it will need to be set up so that the directory for the pid file exists and the mysql user can write to it:

mkdir -p /var/run/mysqld
chown mysql:mysql /var/run/mysqld

Otherwise you’ll get this:

[ERROR] mysqld: Can't create/write to file '/var/run/mysqld/mysqld.pid' (Errcode: 2 "No such file or directory")

Additionally you’ll need to tell it to bind to 0.0.0.0 instead of 127.0.0.1 as we’ll want this on the network. I’m on an isolated LAN so it’s fine by me, but of course your mileage may vary. For me a simple diff of the config file is this:

diff -ruN etc/mysql/mariadb.conf.d/50-server.cnf /etc/mysql/mariadb.conf.d/50-server.cnf
--- etc/mysql/mariadb.conf.d/50-server.cnf      2021-11-21 08:22:31.000000000 +0800
+++ /etc/mysql/mariadb.conf.d/50-server.cnf     2022-03-11 10:01:45.369272200 +0800
@@ -27,7 +27,7 @@

 # Instead of skip-networking the default is now to listen only on
 # localhost which is more compatible and is not less secure.
-bind-address            = 127.0.0.1
+bind-address            = 0.0.0.0

 #
 # * Fine Tuning
@@ -43,6 +43,11 @@
 #max_connections        = 100
 #table_cache            = 64

+key_buffer_size = 1G
+max_allowed_packet = 1G
+query_cache_limit = 18M
+query_cache_size = 128M
+
 #
 # * Logging and Replication
 #

As far as I know MySQL doesn’t run on WSLv1, so people with that restriction are kind of SOL. At the same time, for me Debian 7 doesn’t run on Hyper-V, so I had to run VMware Player. And well, if you can’t run Hyper-V/WSLv2 then you can run it all on Debian 7, which is probably easier. Although you’ll probably hit some performance issues in the import; either my machine is fast enough that I don’t care, or the newer stuff is pre-configured for machines larger than an ISA/PCI gen1 Pentium 60.

I run mysqld manually in a window as I am only doing this ad hoc, not as a service. Although on the Windows 10 machine I used to reproduce and test this, mysqld won’t run interactively; instead I had to do ‘service mysql start’ to get it running. So I guess you’ll have to find out the hard way which one works for you.

Next, be sure to create the database and a user so this will work:

create database wikidb;
create user 'wikiuser'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON wikidb.* TO 'wikiuser'@'%' WITH GRANT OPTION;
show grants for 'wikiuser'@'%';

Something like this works well. Yes the password is password but it’s all internal so who cares. If you don’t like it, change it as needed.

With the database & user created you’ll want to make sure that you can connect from the Debian 7 machine with something like this:

mysql -h 192.168.6.10 -uwikiuser -ppassword wikidb

I’m keeping MediaWiki on the Debian 7 machine as I don’t think PHP 7 or whatever is modern will run the ancient MediaWiki version 1.15.5 (which I’m using).
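If the mysql client just hangs, it helps to know whether the port is even reachable before blaming the grants. A minimal sketch, assuming nothing more than a plain TCP connect (the function name can_connect is my own, it is not part of any MySQL tooling):

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

From the Debian 7 side, can_connect("192.168.6.10", 3306) returning False points at the bind-address or the Windows side of things rather than at the database user.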

This is my setup as I’m writing this so bear with me.

Prepping Apache

Since I have that Debian 7 VM, I used that for setting up MediaWiki. Looking at my apt-cache I believe I installed the following packages:

  • mysql-client
  • mysql-common
  • apache2
  • apache2.2-bin
  • apache2.2-common
  • apache2-mpm-prefork
  • apache2-mpm-worker
  • apache2-utils
  • libapache2-mod-php5
  • php5-cli
  • php5-common
  • php5-mysql
  • lua5.1
  • liblua5.1

On the Apache side I have the following modules enabled:

alias authz_default authz_user deflate mime reqtimeout
auth_basic authz_groupfile autoindex dir negotiation setenvif
authn_file authz_host cgi env php5 status

Which I think is pretty generic.

I used mediawiki-1.15.5 as the basis mostly because I had started with an incomplete 2010 dump, but after finding this 2011 dump I probably should have gone with 1.16.5 or 1.17.5.. Oh well. When connecting from Debian 7 to my ‘modern’ MariaDB there is one table that needs to be updated, otherwise it’ll fail. A simple diff that needs to be applied (that was with the least amount of effort spent by me!) is this:

--- maintenance/tables.sql      2009-03-20 19:20:39.000000000 +0800
+++ /var/www/maintenance/tables.sql     2022-03-07 14:21:25.580318700 +0800
@@ -1099,7 +1099,7 @@

 CREATE TABLE /*_*/trackbacks (
   tb_id int PRIMARY KEY AUTO_INCREMENT,
-  tb_page int REFERENCES /*_*/page(page_id) ON DELETE CASCADE,
+  tb_page int,
   tb_title varchar(255) NOT NULL,
   tb_url blob NOT NULL,
   tb_ex text,

All being well and patched you can do the install! I just do a super basic install, nothing exciting. In my setup the MySQL server is on 192.168.6.10. I don’t think I changed much of anything?

And with that done if all goes well you’ll get the install completed!

If you get anything else, drop the database and try again (the permission grants stay, because MySQL doesn’t actually drop things associated with databases.. :shrug:).

Next in the extensions folder I grabbed Scribunto-REL1_35-04b897f.tar.gz, which is still on the extensions site. This required Lua 5.1 and the following to be appended to LocalSettings.php:

#
$wgScribuntoEngineConf['luastandalone']['luaPath'] = '/usr/bin/lua5.1';

$wgScribuntoUseGeSHi = true;
$wgScribuntoUseCodeEditor = true;
#

Keep in mind the original extensions I used are not available anymore, and appear to not have been archived, so yeah.

Doing the pages.xml import

You can find the version 0.5 MediaWiki import script (mwimport) on archive.org. Obviously check the first 5-10 lines of the decompressed bz2 file to see what export version you have if you are deviating, and look around IA to time travel and see if there is a matching script. I have no idea about modern ones as this is hard enough trying to reproduce an old experiment.
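Peeking at the top of the dump doesn’t require decompressing the whole 7.5GB. Here is a rough sketch that reads only the first few lines; it assumes the root element carries a namespace that looks like http://www.mediawiki.org/xml/export-0.5/, so treat the regular expression as my guess at the header rather than gospel.

```python
import bz2
import re


def export_version(path, max_lines=10):
    """Return the export schema version found near the top of a
    bz2-compressed MediaWiki XML dump, or None if nothing matches."""
    # Assumption: the xmlns on the root element ends in "export-<version>/".
    pattern = re.compile(r"export-(\d+\.\d+)")
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for _ in range(max_lines):
            line = dump.readline()
            if not line:
                break  # file shorter than max_lines
            match = pattern.search(line)
            if match:
                return match.group(1)
    return None
```

If this comes back with something other than 0.5, that is the cue to go looking for a different import script.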

First you need to make some files to set up the pre and post conditions of the insert. It’s about 11,124,050 pages, give or take.

pre.sql

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
BEGIN;

post.sql

COMMIT;
SET autocommit=1;
SET unique_checks=1;
SET foreign_key_checks=1;

Running the actual import

I’m assuming that 192.168.6.33 is the Debian 7 machine, 192.168.6.10 is the Windows 11 machine.

On the machine with the data:

netcat 192.168.6.33 9909 < enwiki-20110620-pages-articles.xml.bz2

On the machine that can run the mwimport script:

netcat -l -p 9909 | bzip2 -dc | ./mwimport-0.5.pl | netcat 192.168.6.10 9906

And finally on the MySQL machine:

(cat pre.sql; netcat -l -p 9906 ; cat post.sql) | mysql -f --default-character-set=utf8 wikidb

Since I’m using WSLv2, which sits behind its own NAT, Windows may screw stuff up, so forward the ports with netsh (from an Administrator CMD prompt). The 172.24.167.66 address is my WSL instance, yours will differ:

netsh interface portproxy add v4tov4 listenaddress=192.168.6.10 listenport=3306 connectaddress=172.24.167.66 connectport=3306
netsh interface portproxy add v4tov4 listenaddress=192.168.6.10 listenport=9906 connectaddress=172.24.167.66 connectport=9906

On my setup it takes about 2.5 hours to load the database, which will be about 51GB.

11340000 pages (1231.805/s),  11340000 revisions (1231.805/s) in 9206 seconds
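As a quick sanity check on that summary line, the numbers do line up with the 2.5 hour figure. A tiny sketch of the arithmetic (import_stats is just a name I made up):

```python
def import_stats(pages, seconds):
    """Return (pages per second, elapsed hours) for an import run."""
    return pages / seconds, seconds / 3600


# Figures taken from the import summary line above.
rate, hours = import_stats(11_340_000, 9206)
print(f"{rate:.3f} pages/s over {hours:.2f} hours")
```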

The savvy among you may notice the -f flag to the mysql parser. And yes that is because there *will* be errors during the process.

I’m not sure why, or what to do about it, but without the -f (force) flag the process will stop around the 2 million row mark. Forcing it allows the process to continue.

With that done I get the following tallies…

MariaDB [(none)]> SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES    WHERE TABLE_SCHEMA = 'wikidb' and
table_rows > 0;
+---------------+------------+
| table_name    | table_rows |
+---------------+------------+
| interwiki     |         85 |
| objectcache   |         10 |
| page          |   10839464 |
| revision      |   11357659 |
| text          |   14491759 |
| user_groups   |          2 |
+---------------+------------+
9 rows in set (0.002 sec)

If all of this worked (amazing!) then search for something like 1001 and be greeted with:

1001: a non odyssey

MySQL disappointments

So with this in place, having some 51GB lying around just seemed lame. Using WSLv2 I set up a compressed folder on NTFS and moved the data directory into there, which gets it down to a somewhat more manageable 20GB. Since the data doesn’t change I had a better idea, SquashFS. Well it compresses down to 12GB, HOWEVER for the life of me I can’t find anything concrete on using a read only backing store for MySQL. Even general MediaWiki stuff seems to want to write to all the tables, I guess it’s index searching?! Insane! And it appears MySQL can only use single file storage units per table? Yeah this isn’t MSSQL with stuff like a database from CD-ROM with the log on a floppy. I tried doing a union overlay filesystem but it makes a 100% copy of any file that changes. That’s not good. I guess using qemu-img for a compressed qcow2 with a writable diff file could hide the read only compressed backing store, but I’ve already lost interest.

Maybe it’s just me, but it seems like there should be a way to write logs/updates/scratch to a RW place, and keep the majority of the data read-only (and highly compressed).

Why doesn’t stuff format correctly

There seems to be a lot of formatting nonsense going on, I probably should step up to mediawiki 1.17. And I’ll add in loading the other SQL tables since they are straight up inserts. Also the extensions I know I loaded don’t seem to exist in any form anymore, and the images I snapshotted of the install are all long gone. It’ll require more diving around.

Я человек, а кто ты? / I am human, who are you?

I saw this update from sinc LAIR. I had never noticed but he’s Ukrainian. Sometimes that crazy internet of all things lets people connect. I only know he makes cool stuff.

Whenever these kinds of wars break out between bordering nations, it’s always in those places where the lines arbitrarily split families, friends and communities. When politicians have their disagreements, it’s brothers and sisters that are at war and pay the price.

Hopefully cooler heads can prevail, and we can get back to life.

2,000 monthly downloads!

Well this is a bit ambiguous. As I’m waking up to check emails I get this notice:

Congratulations! Ancient UNIX/BSD emulation on Windows has just been recognized with the following awards by SourceForge:

Community Choice
SourceForge Favorite

These honors are awarded only to select projects that have reached significant milestones in terms of downloads and user engagement from the SourceForge community.

This is a big achievement, as your project has qualified for these awards out of over 500,000 open source projects on SourceForge. SourceForge sees nearly 30 million users per month looking for, and developing, open source software. These award badges will now appear on your project page, and the award assets can be found in your project admin section.

-sourceforge email

So yeah, and here we are:

Nothing like standing on the backs of giants!

Naturally ready to run favorites include:

And of course for the DIY enthusiasts:

Honorable mention goes to the 4.3BSD UWISC enthusiast that downloaded Apache, AberMUD, and lynx!

So I have a splitting headache.

You’re welcome.

So since I’m going to share my pain, did you know that Windows 11 updated notepad?

YES.

And it’s BROKEN.

Here I have a simple file. It’s very MS-DOS like and I want to change it to Unix. Yes I could use sed but I have NOTEPAD, so let’s change the backslash to a forward slash. Something notepad.exe could do going back to 1985.

And it’s become a bunch of spaces. Great. Check the search/replace and yeah, it’s gone and done its own thing.

And you may think wow that’s broken, but come on, it’s not *that* bad. And you’d be wrong. So very very wrong.

My next attempt got me this.

I don’t even know what the hell happened. I guess I should be happy the slashes changed, but at what cost? AT WHAT COST?!

My god Microsoft how could you fuck up notepad this badly?

And yes, I blame Canada!

Manually migrating NT 4.0 on VMware ESX to Hyper-V or what is a ‘flat’ vmdk anyways?

So due to recent economic events I’m having to consolidate all my VM’s back to the office I’m currently renting. I had a fancy 1gig internet connection installed and I’m still under contract for a year. Before the c00f it made sense as I did a lot out of that office and was getting ready to do something fun and big. I had planned on making a cloud service, I’d bought a bunch of Xeon boards, and started the initial build of my cloud to shop around but then the world ended the following weekend. As they say, bad timing.

So as a fan of old junk I still have some NT 4.0 stuff, and it’d been running on VMware for years, no issues everything being great. But I need to do double+ duty at the moment and to make it easier than trying to get GPU passthru working, I’m just going with Hyper-V on the Windows 10 desktops that I have running. May as well make people doubly useful!

In some idea of ‘performance’ I had converted all the virtual disks to ‘flat’ VMDK’s and never thought twice about it as it worked, and all was well.

Naturally to start with I uninstall VMware Tools while running under ESXi and shut down the VMs.

Well after rsync‘ing my disks back, I tried converting them with qemu-img and got this weird error that my VMDK’s were not VMDK’s. They are in fact FLAT disk images, with really screwed up geometry that prevented both qemu and Hyper-V from mounting the raw converted disk images.

So first let’s verify the data:

/mnt/d/virtual/USENET-AltaVista# sfdisk -d USENET-AltaVista-flat.vmdk
label: dos
label-id: 0x8058e639
device: USENET-AltaVista-flat.vmdk
unit: sectors

USENET-AltaVista-flat.vmdk1 : start=          63, size=     4096512, type=7, bootable

And sure enough, yeah, it’s like a typical DOS disk with the partition starting 63 sectors in. So to mount this under Linux (WSLv2 too!) we need to tell the loop driver the offset and size limit in bytes, which is the start and the size each multiplied by 512, or:

# mount -o loop,offset=32256,sizelimit=2097414144 USENET-AltaVista-flat.vmdk /mnt

And all is good. Yes, even as a type 7 for HPFS/NTFS it mounted fine and the data is there.
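If you have a pile of these flat images, doing the sector math by hand gets tedious. Here is a minimal sketch that turns one partition line from sfdisk -d into the mount options, assuming 512 byte sectors like above (the helper name loop_mount_options is mine):

```python
import re

SECTOR_SIZE = 512  # bytes per sector, same assumption as the text above


def loop_mount_options(sfdisk_line, sector_size=SECTOR_SIZE):
    """Turn one partition line from `sfdisk -d` into mount -o options."""
    start = int(re.search(r"start=\s*(\d+)", sfdisk_line).group(1))
    size = int(re.search(r"size=\s*(\d+)", sfdisk_line).group(1))
    # Both values are in sectors; the loop driver wants bytes.
    return f"loop,offset={start * sector_size},sizelimit={size * sector_size}"


line = ("USENET-AltaVista-flat.vmdk1 : start=          63, "
        "size=     4096512, type=7, bootable")
print(loop_mount_options(line))  # loop,offset=32256,sizelimit=2097414144
```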

Now the ‘fix’ was an old one from back in the day. When moving stuff around and things get goofed you can try to xcopy, where permissions always get messed up, or you can cheat: just use another NT installation, format a floppy disk, and copy the following system files to it:

  • ntldr
  • ntdetect.com
  • boot.ini

In my case that’s all I needed to do, I re-ran qemu-img to convert from raw to vpc disk images:

qemu-img convert -f raw -O vpc USENET-AltaVista-flat.vmdk USENET-AltaVista-flat.vhd

And setup Hyper-V to boot my virtual diskette first, and in no time my NT was back up and running.

Naturally be sure to install the legacy network adapter for the VM, and re-configure NT for the DECchip 21140 adapter.

DECchip 21140

Don’t forget to re-run service pack 6, and the update. Since these disks & VMs were pre-installed I didn’t have to mess with the “CompatibilityForOlderOperatingSystemsEnabled” flag. Although that was quite the fun adventure at the time.

In my case there were some IP addresses to change, but it’s back online with minimal effort, which is always fine. Hyper-V doesn’t have any real integration stuff for old Windows so it’s pretty much a set it and forget it thing, or use Terminal Server for remote access.

So yes, many of the hosted things I have are down. I know. Yes it sucks. And yes I think the disk I put this on at the moment kind of sucks too. It’s been super cold here lately and I didn’t want to be exposed out there riding around getting soaked in the high winds so I’ll keep shuffling stuff later. But for now I got to save some hosting fees. And things like the gopher are dead, for the moment.

Ruffle the Flash feathers!

The End of the World!

So it’s the END OF THE WORLD, and sadly that means that all the old media of the 1st gen ‘rich’ web experience is gone with the long end of Adobe Flash. At one point Flash was not only ubiquitous but also sponsored a C/C++ compiler, but that stuff sadly won’t work.

So yeah, sad. However, check out ruffle! Naturally it’s a chromium extension, but everything is chrome now so it’ll work fine. It plays many of the early flash type stuff with little to no issues!

Currently Ruffle only supports games written in ActionScript 1 and 2. This includes all games before 2006 and only some games released later.

Unfortunately, your content was using Actionscript 3, which Ruffle does not yet support.

From the FAQ

It’s hard to think it’s been over 20 years since the whole ‘eStudio‘ thing, but it’s cute to keep it going. Although we are at the point where you can run Windows 2000 in javascript so there is that brute force path…

So sure it’s not perfect but what is? Kitty Cat Dance, Dancing Colin, Maiyahi, it’s a MAD WORLD!!

Flash on!

VIDEO Capture USB 2.0 Video Adapter with Audio

Capture and deit High – quality Video and audio!

So I wanted to capture some composite PAL signals, and well yeah I have a fancy capture card but it’s only HDMI of all things. NO VGA, EGA/CGA and sure no composite. So I headed down to Sun Cheong Computer Co. Ltd., 246 Apliu Street, Sham Shui Po, and picked up one of these.

The bundled software, honestech VHS to DVD 3.0, is pure garbage. Basically it always sets for NTSC and never works. The program to change the input style does nothing either. Terrible. But the honestech TVR 2.5 (37MB download!) does work.

As a plus it lets you set PAL or NTSC

Although it’s not all that great, I have a webcam, and toggling between the display inputs can trigger a bluescreen.

So yeah it’s not so great.

I can’t really comment on the quality of the capture as it turns out I don’t have any RCA cables, so this is me running a jumper wire to the device directly. This is FAR from ideal but here we go:

So yeah…. It’s probably me, but there you go. At $99 HKD ($13 USD?) it’s not great. Actually it’s damned near temperamental. But it’s better than nothing.

Otherwise, MEH.

The arrogance of Silicon Valley is astounding: or the death of 6to4

For many people across the world, and I suspect the majority, the deathmarch rollout of IPv6 is about as obtainable today as it was in the early 00’s. Absolutely no traction from ISP’s. Where I live in Hong Kong, none of the residential or even commercial connections I have access to have native v6. Instead there was this fantastic option of tunneling IPv6 inside IPv4, using a technology called 6to4, which suddenly gave everyone with a registered IPv4 address a /48 prefix, or 65,536 networks, to build out their own massive IPv6 deployment.

Simply put 6to4 put the individual onto the map for a NAT’less IPv6 world. 6to4 allowed two IPv6 hosts to talk to each other through the IPv6 Internet backbone, with zero changes on the Internet required. It just worked.
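To make that concrete, a 6to4 prefix is just 2002 followed by the 32 bits of your IPv4 address, which leaves 16 bits of subnet before the usual /64 boundary. A small sketch using Python’s ipaddress module; the 192.0.2.1 address is a documentation address standing in for a real public IP, and sixtofour_prefix is my own helper name:

```python
import ipaddress


def sixtofour_prefix(ipv4_text):
    """Return the 2002::/16-based /48 prefix for an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4_text))
    # 0x2002 fills the top 16 bits, the IPv4 address the next 32 bits.
    value = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((value, 48))


prefix = sixtofour_prefix("192.0.2.1")  # placeholder, not a real host
print(prefix)                           # 2002:c000:201::/48
print(2 ** (64 - prefix.prefixlen))     # 65536 possible /64 networks
```

Going the other way, ipaddress.IPv6Address has a sixtofour property that pulls the embedded IPv4 address back out of any 2002::/16 address.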

And of course Silicon Valley knows best, and decided that this network democratization must be stopped. Power to the People is the antithesis of the megacorps.

Google DNS Primary: 2001:4860:4860::8888
Google DNS Secondary: 2001:4860:4860::8844
Cloudflare DNS Primary: 2606:4700:4700::1111
Cloudflare DNS Secondary: 2606:4700:4700::1001
Quad9 DNS Primary: 2620:fe::fe
Quad9 DNS Secondary: 2620:fe::fe:9

This is a list of some popular ‘common’ IPv6 DNS servers. Windows 10/11 (probably 8/8.1 but who uses that?!) are not only IPv6 capable but actually IPv6 native, with a preference for the IPv6 DNS servers.

TP-Link Wireless N Router WR840N choices

I have this low end TP-Link Wireless N Router WR840N, as where I live the maximum speed is 30Mbit/10Mbit DSL. There was no point in buying anything crazy expensive. My ISP has zero IPv6 deployment. The only way I can participate is buying a tunnel, or using 6to4. So I’d been using 6to4 for a while, and things had been great. But lately it’s gone super downhill. Sadly the firmware doesn’t give an option to pick the IPv6 DNS server, instead it automatically chooses Google.

C:\Users\neozeed>ping 2001:4860:4860::8888

Pinging 2001:4860:4860::8888 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 2001:4860:4860::8888:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

And sure enough I’m getting massive timeouts, and the web had basically become utterly unusable. Fantastic.

I’d even gone through the steps of creating a local DNS server and having it VPN to the United States thinking that’d help me, as the DNS errors felt like the encroaching Great Firewall of China. However the source of all my problems just turned out to be out of touch Silicon Valley arrogance.

rfc7526 (ietf.org) Deprecating the Anycast Prefix for 6to4 Relay Routers

This is where they chose to kill off IPv6 for the masses, because local firewalls work as expected.

Authors' Addresses

   Ole Troan
   Cisco
   Oslo
   Norway

   EMail: [email protected]

Yeah what a surprise. And of course Google cut off IPv6. These tech giant oligarchs are not your friends.

The good news is that the other providers, Cloudflare & Quad9, still honor 6to4.

Configuring IPv6 DNS on Windows 11

Windows 11 supports DNS over HTTPS, so you just need to enable it. I’m hardwired so under the settings -> network then -> Ethernet for me, maybe Wi-Fi for you?

Then just hit Edit over the DNS server assignment:

Then go ahead and pick a NON GOOGLE DNS service, and select DNS over HTTPS for the ‘ultra secure’ wave of the future.

And now your DNS will work. YAY.

C:\Users\jason>nslookup
Default Server:  one.one.one.one
Address:  2606:4700:4700::1111

> google.com
Server:  one.one.one.one
Address:  2606:4700:4700::1111

Non-authoritative answer:
Name:    google.com
Addresses:  2404:6800:4001:800::200e
          172.217.174.174

Of course you won’t be able to connect to anything from Google over IPv6, but that is the price you pay for not living in the precious Silicon Valley tech bubble.

Personally I think it’s a good thing when elitists lock themselves away from the world, and decrease their relevancy to everyone.

Obviously the end game won’t be some magical rollout of IPv6 over Asia, rather it’ll be the end of IPv6. As always the problems stemmed from the backbone; even the 512MB limit of the Cisco 7200 was overcome, and NAT got around the limitations of the fixed and exhausted IPv4 network. Too bad they had to kill it, but of course it’s just because random people could host stuff on their own network, and well, network democratization isn’t what Cisco et al. are all about.