Extracting WARC files

Well, since the catastrophic outage, I’ve been looking through my backups trying to see how much of my ‘vpsland’ archive I have. And it’s not so hot. The good news is the physical machine that has the last known good copy is fine. It’s just in a place I can’t get to on the other side of the world. And I’m still in exile so shipping it really isn’t an option at the moment.

On the plus side I found a WARC archive, some 22GB of the 400GB worth of files. So it’s a start.

So what are WARC files? Why do people gzip them to get maybe 1% compression? How do magnets work anyways?

Web archives are single snapshots in time of a site. It sounds like an MHT file, but something more ‘portable’ and open standard-ish. Which means there are a million tools, none of which seem to do exactly what you want.

All I want to do is extract all my files from the WARC, but that’s not what most tools are geared for. Mostly they display the WARC like a web page, which means clicking through hundreds of thousands of files. Yikes.

Thankfully warcat seems to be able to fit the bill:

python3 -m warcat extract ../[email protected]015-10-04-fc233ad0-00000.warc.gz

I didn’t see any package on Ubuntu so I did the pip install:

pip3 install warcat

And that seems to have done the trick.
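For anyone curious what a tool like warcat has to deal with, here is a minimal sketch of walking a WARC by hand using only the Python standard library. It is not how warcat works internally; the function names are made up for illustration, and it assumes well-formed records, no folded header lines, and HTTP bodies that are not chunked or content-encoded.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break                   # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length covers the whole record block (HTTP headers + body).
        payload = stream.read(int(headers.get("content-length", "0")))
        yield headers, payload


def http_body(payload):
    """Strip the HTTP status line and headers from a 'response' record."""
    head, sep, body = payload.partition(b"\r\n\r\n")
    return body if sep else payload


def iter_responses(path):
    """Hypothetical helper: yield (url, body) for each response in a .warc.gz."""
    # gzip.open reads concatenated gzip members as one continuous stream.
    with gzip.open(path, "rb") as fh:
        for headers, payload in iter_warc_records(fh):
            if headers.get("warc-type") == "response":
                yield headers.get("warc-target-uri"), http_body(payload)
```

Looping over iter_responses() gives a URL and the raw bytes for every captured response, which is roughly the information an extractor needs to rebuild a directory tree.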

Now to figure out how to set up some cheap storage on Azure and copy this stuff up or extract over there.

spot pricing

I’m using the new ‘spot‘ pricing model, to try to keep costs down. Obviously it’s not as good as dedicated slices, but it’ll not make me broke either. And I have a lot more messing around with containers to do, trying to string together nonsense.

Launching Patreon

So yeah, it wasn’t anywhere near as tedious as I thought it would be.

https://www.patreon.com/virtuallyfun

I’ve set prices in Hong Kong dollars, but it claims it’ll re-adjust to your local currency, so it shouldn’t look too crazy as $40 HKD is about $5 USD. After they take all their fees and stuff I get like half. Not that I’m complaining; a week ago it was a big goose egg!

I’ve tried to make the tiers a little fun sounding, although of course the most important thing we’ve learned on this fun trip is that you absolutely need either an American keyboard, or an Apple branded keyboard, to enter recovery mode on a Macintosh while abroad.

As always everything will eventually wind up here. I’ll do some additional collaboration over on Patreon to give it ‘value’ even if it’s just me hitting my head against the wall.

I hate shilling for this kind of thing, ring the bell and click the thing. I’m really embarrassed that it’s come to that here, but like the ads I really don’t have much of a choice at the moment. And of course your direct support by either enabling ads on this page, or signing up just makes it all the easier to keep on through these times, as I’m aware that we are all going through some real fun challenges at the moment.

So a very big preemptive thank you very much, I’m always amazed that whenever I write anything that people read it, and I’m always shocked and amazed to get not only any kind of engagement, but an actual following. Just as I’ve been amazed to have already met so many of you in my travels around the world, through the years!

I would like very much to build on this into bigger things, create more diverse and engaging content, and have some fun doing so!

A big thank-you!

Jason

How to fix rsync slowing down over time (SOLVED)

(This is a guest post by Antoni Sawicki aka Tenox)

I often make copies of large data archives, typically many TB in size. I found that rsync transfer speed slows down over time, typically after a few GB, especially when copying large files, eventually reaching crawl speeds of just a few KB/s. The internet is littered with people asking the same question or why rsync is slow in general. There really isn’t a good answer out there, so I hope this may help.

After doing some quick profiling I found out that the main culprit was rsync’s advanced delta transfer algorithm. The algorithm is super awesome for incremental updates as it will only transfer changed parts of a file instead of the whole thing. However when performing an initial copy it’s not only unnecessary but gets in the way, and the CPU is left spinning calculating checksums on chunks that never could have changed. As such…

Initial rsync copies should be performed with the -W option, for example:

$ rsync -avPW <src> <dst>

The -W or --whole-file option instructs rsync to perform full file copies and not use the delta transfer algorithm. As a result there is no delta checksum calculation involved and maximum transfer speeds can be easily achieved.
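To see why the delta step is wasted effort on a first copy, here is a toy sketch. It is not rsync’s actual algorithm (rsync uses a rolling checksum so that matches can be found at any offset); it only compares fixed-size blocks, and the function names and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def blocks(data, size=BLOCK):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def blocks_to_send(src, dst, size=BLOCK):
    """Return the indices of source blocks the destination does not already hold."""
    # Checksum every block the destination has...
    known = {hashlib.sha256(b).digest() for b in blocks(dst, size)}
    # ...then checksum every source block and keep the ones with no match.
    return [i for i, b in enumerate(blocks(src, size))
            if hashlib.sha256(b).digest() not in known]
```

When the destination already holds most of the file, only the changed block indices come back. When the destination is empty, every block index comes back, so all of the hashing bought nothing; that is the situation --whole-file sidesteps.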

Long term, rsync could be patched to do a full file transfer if the file doesn’t exist in destination.

Also while copying jumbo archives of many TB I don’t want to see every individual file being copied. Instead I want a percentage of the total archive size and current transfer speed in MB/s. After some experiments I arrived at this weird combo:

$ rsync -aW --no-i-r --info=progress2 --info=name0 <src> <dst>
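If these copies are driven from a script, the two invocations above can be wrapped in a small helper. This is only a sketch: the function name and its parameters are hypothetical, and it simply assembles the same flags shown in this post (--whole-file is the long form of -W).

```python
def rsync_command(src, dst, initial_copy=True, summary_progress=True):
    """Build an rsync argument list using the flags discussed in this post."""
    cmd = ["rsync", "-a"]
    if initial_copy:
        # Skip the delta transfer algorithm on a first copy.
        cmd.append("--whole-file")
    if summary_progress:
        # Overall progress and speed instead of a line per file.
        cmd += ["--no-i-r", "--info=progress2", "--info=name0"]
    else:
        # Per-file output, as in the -avPW example.
        cmd += ["-v", "-P"]
    cmd += [src, dst]
    return cmd


# Example (not executed here):
#   import subprocess
#   subprocess.run(rsync_command("/data/archive/", "/mnt/backup/archive/"), check=True)
```

Passing initial_copy=False on later runs leaves the delta algorithm switched on for incremental updates, which is where it earns its keep.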

The slap heard around the world

Even when I’m trying to live under my rock, I still am somehow flooded with news that there was a slap fight.

Totally not kayfabe. Borrowed from CNN.com

No, not this Will Smith / Chris Rock thing. I’m talking of course about Clive Sinclair slapping Chris Curry at the Baron of Beef pub in Cambridge.

What’s the beef about?

Where’s the beef?

Clive vs Curry

As the legend goes, Curry worked under Clive, but he ran into Hermann Hauser, who encouraged Curry to go his own way and make that computer of his dreams. Incensed by this, Clive was able to put together and rush out the ZX80 before Acorn had anything ready to ship.

£79.95

And more importantly it was CHEAP. You’d have thought that the ZX80 would have found a larger worldwide market, but Commodore and Apple reigned supreme in North America.

Later that year Acorn would ship the Acorn Atom, priced around £129 as a kit and £179 assembled. It was a lot more expensive, but granted it did have a lot more ‘computer’ in there.

In the following year Sinclair released the ZX81, which despite a lower price point also included a lot more: a larger ROM, a better display, and of course this was ready to ignite the coming war.

As the legend goes, a TV show of all things, ‘The Mighty Micro’, had ignited such a storm in Parliament that the Department of Industry and the BBC decided that they were going to produce programming to go along with a selected microcomputer. And that machine was the Newbury NewBrain… until it was obvious that this wasn’t going to be the machine of choice, and the selection was pushed back from the fall of ’81 to the spring of ’82. With the BBC forced to open up selection to other UK computer manufacturers, both Sinclair and Acorn worked hard for a machine. However Curry swooped in with his new ‘BBC Micro’ (that had started working the day of the inspection) and won the contract.

1982 of course would give us the ZX Spectrum as Sinclair’s answer to what the people needed.

Oddly enough things in the long term didn’t work out for either of them, as they both made so many missteps that they ultimately ended up shelving both of the units. Acorn barely survived, although their ARM processor does live on, mostly because it ended up free of any hardware platform to go along with it.

The plus isn’t plussed

There was no ZX83 model; instead there was of course the QL for 1984. And taking on the design of the QL, the Spectrum+ was launched. Despite the name, it was just a 48k with a reset button and a nicer keyboard. Very NON plussed. The only upgrade to the ZX would have to come from Spain in the form of the 128.

The QL was 100% incompatible with the ZX. Apparently doing something like the Sega Mega Drive, by including both a 68000 and a Z80, was just too out of the question. Instead it was so focused on price that the machine was not serious enough for the serious business market Clive had craved so much. There was no socket for a 68881, and the drives were incredibly tiny. IBM had quickly followed up the PC with the XT, which allowed for a hard disk, while the QL with a single slot in no way could fit a then 5 1/4″ full-height disk.

Although many fault the QL for having relied on the 68008 processor, remember even IBM was using the 8088, with the same 8-bit bus constraints. It’s not that it was impossible; it’s that the sleek stylized deck of the QL was just far too ahead of itself. It’d be fine for today, just look at the Pi 400! I’d prefer to have one with SD cards up front, but I guess I need to learn how to 3D print and make my own.

Another fault of the QL was not having the space on the motherboard to go to the full 1MB of addressable RAM like the PC, and not loading the OS from disk. Having the OS in ROM was such an 8-bit holdover. Loading it from tape would have been useless, but the PC way of loading the OS from disk was the way to go, and it also made updating far easier. I know the ST & Amiga also went with OS in ROM thinking it saved money, but in the long term all the wedges of the era just limited themselves.

The real slap: in the market

The real SLAP heard around the UK

The real slap that was heard was the stagnation of both machines, and the decline of the UK computer makers. Acorn had apparently manufactured a tonne of Electrons for Christmas, but the order wasn’t actually put through because of some ‘pull back of a video game crash’ in Europe. I guess it’s the continuation of the video game crash in the USA, but as you can see the stockpile of machines to be blown out was just incredible.

And it was in 1984 that Acorn apparently ran an ad showing that Sinclair computers had a high defect rate. It’s something that has always plagued Sinclair’s quest for low cost machines, and something that had been hand-waved away with a 1 year replacement policy and by blaming teenagers abusing the machines. That ad led to the confrontation in the Baron of Beef, along with the whooping Sinclair unleashed on Curry. Much of this has passed into more legend than fact, though; even Ruth Bramley didn’t recall anything about the event.

It’s an amazing flash in the pan that has so many games, and so much early computer culture, that was partitioned to a tiny island and is for the most part totally unknown in the rest of the world. I hope to get a real Spectrum 128 one day, it sounds like a fascinating machine. Although they made a million? of them, they are quite expensive in any marketplace. I wonder sometimes if there is demand for a super cheap almost ‘disposable’ 8-bit computer. Obviously it’d have to be under £20.

Since all this UK microcomputer stuff never really left the island it’s all new to me. And maybe to many people outside of the UK or, surprisingly, the Iron Curtain, where ZX Spectrums were abundant.

footnote: I know people will say that there was some attempt at selling Sinclair micros out of Texas with one OEM, but honestly I’ve never heard of or seen any such thing; it’s only recently shown up as a curiosity on YouTube. And they were incompatible anyways so whatever.

Also holy crap, so an actor slapped another actor in a show where they backslap each other. Who cares?! Bring back Beavis and Butt-Head, and prime time boxing! People obviously have a thirst for this. Why did the WWF’s kayfabe fade? The paywalls?

Я человек, а кто ты? / I am human, who are you?

I saw this update from sinc LAIR. I had never noticed but he’s Ukrainian. Sometimes that crazy internet of all things lets people connect. I only know he makes cool stuff.

Whenever these kinds of wars break out between bordering nations, it’s always in those places where the lines arbitrarily split families, friends and communities. When politicians have their disagreements, it’s brothers and sisters that are at war and pay the price.

Hopefully cooler heads can prevail, and we can get back to life.

Closing out 2021

Sometimes it snows in the tropics

Well to say the year has been a challenge would be an understatement. Perhaps the one thing that puts things into perspective is that we are all aware of the collective ‘suck’ at the moment. At the same time through the eyes of my children, employees and friends I see that despite the prevailing atmosphere of fear and uncertainty there is also the unbreakable optimism of tomorrow.

Sometimes it snows in the tropics.

No really. It does. Black swans are a thing. And sometimes all you need is the Imagineering will of a bubble machine, a fan and a tight mesh and the virtual snow will fall.

I have been so incredibly blessed these years as despite losing so much, having businesses implode, having to do layoffs, downsizing and shuttering stuff, I’ve also found new opportunities and been able to do what I can to softland the best I can, and more importantly push onwards.

I know it’s tough, especially when everyone is looking to you for the answers, and well, yeah it reminds me of an episode of STNG: Attached when Crusher realizes that Picard is human, and knows that he has to give the appearance of confidence and control despite having neither. Or as the millennials will say, fake it until you make it.

Starting new businesses in this environment has been an incredible challenge, along with maintaining the status quo. But like everything else in life, there is no ‘perfect time’ rather a window of opportunity where only the bold and crazy can and will step in and take the chance.

So while the kids enjoy their virtual snow, I’m chilling a dozen bottles of bubbly getting ready to ring in the new year.

Happy New Year!

Can you trust a man in a van with your virtual plan?

Once upon a time this was a legitimate ad: Tad from VM-limited.com. Sadly the domain has all but lapsed and finding any reference to this ad is pretty much impossible. You’d think with the ‘glamp’ of vanlife and living in a van that people would love to take notes from the Microsoft VM-limited 70’s style conference van.

Nissan NV350

Instead I was getting crap like this Nissan NV350 which looks so 1960’s SciFi that it’s just unlivable and unusable. Compare that pod living thing to this incredible 1970’s themed van from VM-limited!

So comfortable!

From leather chairs, rolodexes and tube televisions to the mandatory ashtrays, wood paneling and shag carpet, how could this not be a ‘work from the road’ thing today? The other solutions for working on the road that I looked at seem so boring and unlived in that they feel about as legit as that new Star Wars hotel that looks like a telephone game of ‘space conflict’.

As far as I can tell it started as a print campaign in 2011, launched at the same time as the big VMware convention (VMworld?!) back then.

2011 print ad

I do have to admire the very Atari-esque look of it. Apparently it was good enough to get some videos shot in the van:

And along with that was a TADTalk. I mirrored it on my site, and with a bit more searching I found some more and put them on archive.org.

It’s too bad the domain lapsed, and Microsoft didn’t hop onto the van-life trend with their future thinking retro 70’s conference van.

Anyways, to help me google/bing it in the future: Microsoft man in van selling virtualization.

Anyone else living the nomadic life? I guess with wife + kids it’s hard, but I’m sure someone is doing it.

Double Agent, WSLv2 and named pipes

So digging around an old SDK I came across an old friend, Microsoft Agent:

This was the bold new strategy of having a digital assistant that you could interact with on the desktop to help you with common tasks, and help with common issues. Oddly enough, as popular as Alexa is these days, Microsoft’s attempts didn’t work out so well.

Perhaps it was the infamous Clippy of Microsoft Office that left a bad taste in the world of talking animated agents. Circling back to the popular Alexa, perhaps Clarke/Kubrick had it right in that people prefer an omnipresent voice rather than some animated animal. Perhaps the need to animate Cortana led to its downfall as well.

Agent was at least an open ended platform so 3rd parties could drive the agent. However like so many other innovative things Microsoft made in the late 1990’s like Internet Explorer, Comic Chat, and ActiveX, Microsoft Agent is no longer supported on Windows 10 (I didn’t even try Vista or 7). Enter Double Agent, a 32bit/64bit ActiveX emulator of the old Microsoft Agent control. Download some characters for end users, install them as Administrator, and you are in business!

How cool. Now for the fun part I took the sample ‘Hello’ from the Microsoft Agent Web SDK for C, and added a named pipe, so it simply sits on \\.\pipe\agent1 and will speak anything you send it. Pretty simple, right?

Adding WSLv2

Now one of the cool things about WSL (Windows Subsystem for Linux) is that you can run Linux commands from the CMD prompt. For example:

C:\Users\jason>wsl uname -a
Linux remlazar 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 GNU/Linux

Although there is a mechanism for sharing Unix sockets between WSL & Windows, I opted for something more casual and simpler: stdio redirection and a named pipe, using this simple command:

@wsl x=$(fortune %1);echo ${x,,} > \\.\pipe\agent1

I should add, I found out the hard way that UPPERCASE words are read letter by letter by Agent, so I have to do the ‘,,’ trick to force the output to lowercase. Pipes and redirects appear to be interpreted by CMD, so I captured the output in a shell variable instead of piping it.

So with some pipes, and a simple example, I now have one of those annoying desktop agents reading jokes to me from Linux. It’s not a terribly complicated or involved program, but sometimes it doesn’t have to be. I do like how reading from a pipe is a great lowest common denominator, as anything that can open a file can send data to a named pipe, which makes it ubiquitous.

I guess if I was more involved, I’d add timers, and have the agent walk around, sleep, disappear, etc. But I’m happy enough for it to be acting as a text to speech engine. The only downside is once kids see it, it’ll be the greatest thing ever. Perhaps Microsoft wasn’t wrong; it’s just that the magic of an animated bird reading ‘zippy the pinhead‘ fortunes appeals more to children than to adults.

I’m sure there are books written about user interfaces, and the rise and fall, and rise again of the PDA, but I wonder what they have to say about Microsoft Agent?
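As an illustration of the ‘anything that can open a file’ point, here is a minimal Python sketch of a client for the same pipe. The speak() helper is hypothetical, and it assumes the modified ‘Hello’ sample is already listening on \\.\pipe\agent1 and accepts plain text; whether it needs the trailing newline is a guess based on what echo sends.

```python
def speak(text, pipe_path=r"\\.\pipe\agent1"):
    """Send lowercased, whitespace-collapsed text to the agent's named pipe."""
    # Lowercase to mirror the ${x,,} trick (UPPERCASE gets spelled out letter
    # by letter), and collapse whitespace roughly the way an unquoted shell
    # expansion passed to echo would.
    message = " ".join(text.lower().split())
    # A named pipe client opens like any ordinary file.
    with open(pipe_path, "w") as pipe:
        pipe.write(message + "\n")
    return message
```

Pointing pipe_path at a regular file is an easy way to check what would be sent without having the agent running.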