Extracting warc files

Posted on October 13, 2022 by neozeed

well since the catastrauphic outage, I’ve been looking through my backups trying to see how much of my ‘vpsland’ archive I have. And it’s not so hot. The good news is the physical machine that has the last known good copy is fine. It’s just in a place I can’t get to on the other side of the world. And I’m still in exile so shipping it really isn’t an option at the moment.

On the plus side I found a warc archive, some 22GB of the 400GB worth of files. So its a start.

So what are WARC files? why do people gzip them to get maybe 1% compression? How do magnets work anyways?

web archives are single snapshots in time of a site. Sounds like a MHT but something more ‘portable’ and open standard-ish. Which means there is a million tools, none of which seem to do exactly what you want.

All I want to do is extract all my files from the WARC, but that seems to not be what most things are geared to, mostly displaying the WARC like a web page, which means clicking hundreds of thousands of files. –yikes

Thankfully warcat seems to be able to fit the bill

python3 -m warcat extract ../user-giddythrill28@vpsland.superglobalmegacorp.com-old-install-2015-10-04-fc233ad0-00000.warc.gz

I didn’t see any package on Ubuntu so did the pip install:

pip3 install warcat

And that seems to have done the trick.

Now to figure out how to setup some cheap storage on azure and copy this stuff up or extract over there.

I’m using the new ‘spot‘ pricing model, to try to keep costs down. Obviously it’s not as good as dedicated slices, but it’ll not make me broke either. And I have a lot more messing around with containers to do, trying to string together nonsense.

Windows scp to remote machines with spaces in the directories

Posted on October 6, 2022 by neozeed

Well one nice thing about Windows 10 is that it has a built in ssh/scp client! Although telnet is optional, I get that it’s insecure but jeez what is a retro user to do?

Anyways the subject at hand is copying files from somewhere that has spaces in the path. In this case I need a copy of OS X Snow Leopard from my Mac Pro cylinder to this junk Fujitsu Celsius. I’m still having USB issues, but I’d like to get my data off of an external disk formatted in HFS+. And for ‘reasons’ I wanted to use something “native” but I don’t feel like building a Hackintosh. While not a strict tutorial on getting Snow Leopard running, I did upload my old download of Empire EFI on archive.org as this kind of stuff is damned near impossible to find.

So back to the matter at hand, I have this VM setup on my Mac Pro, and I want it on this Windows machine. You’d think it would be something like this:

scp -C [email protected]:"/Users/neozeed/Virtual Machines.localized/OSX 10.6/*" .
Password:
scp: /Users/neozeed/Virtual: No such file or directory
scp: Machines.localized/OSX: No such file or directory
scp: 10.6/*: No such file or directory

Okay so double quotes didn’t work. How about a Unix style escape for spaces? I mean it *is* scp after all, maybe it doesn’t know it’s on Windows.

C:\osx>scp -C [email protected]:"/Users/neozeed/Virtual\ Machines.localized/OSX\ 10.6/*" .
Password:
scp: /Users/neozeed/Virtual: No such file or directory
scp: Machines.localized/OSX: No such file or directory
scp: 10.6/*: No such file or directory

Well maybe it parses it like C, so you need double backslash? NO that doesn’t work either. Talk about frustrating. So, in an act of insanity, I tried single quoting the interior spaces around double quotes, something idiotic like a bash variable:

C:\osx>scp -C [email protected]:"/Users/neozeed/Virtual' 'Machines.localized/OSX' '10.6/*" .
Password:
Mac OS Snow Leopard.vmdk                                                               69%   11GB  16.0MB/s   05:16 ETA

And yes, now it’s transferring. I’m just using a cheap 50zt 5 port 100Mbit dumb switch. It’s good enough and it’ll probably take some 30 minutes to transfer all the bits, but it’s working.

So there you go. You may not need it now, or tomorrow but it’ll save you the 20 minutes of frustration!

So there has been a problem

Posted on October 4, 2022 by neozeed

Iâ€™ve been on the road for a few months, basically in exile. Iâ€™ve downsized my once large online presence to a desktop in an office Iâ€™m still obligated to for rent and internet. So my i7 workstation had been up to the task, although it seems I didnâ€™t have the right power settings to turn back on from a power failure.

So now Iâ€™m faced with a stale backup as i canâ€™t find my currents. so yeah content has been lost. So i also donâ€™t want to race back to a single point of failure, so once more again Iâ€™m going to try a cloud. Microsoft keeps sending me these $200 trial things so i thought I would try it. There is a large learning curve about their networks , and deployments. although there is a bunch of â€shake and bakeâ€ deployments to speed things along. As always the key is reading the documents.

I still have a lot of stuff to upload, so for the moment the database is restored but there is a lot of messing around needed for the the old layout.

So appologies for the mess.

Special thanks for Tenox for helping me with all kinds of issues, and support from my muse.

Its always darker before its bright, and its already getting dark in northern Europe.

Virtually Fun

Fun with Virtualization

Monthly Archives: October 2022

Extracting warc files

Windows scp to remote machines with spaces in the directories

So there has been a problem