Teaching an Almost 40-year Old UNIX about Backspace

(This is a guest post by xorhash.)

Introduction

I have been messing with the UNIX®† operating system, Seventh Edition (commonly known as UNIX V7 or just V7) for a while now. V7 dates from 1979, so it’s about 40 years old at this point. The last post was on V7/x86, but since I’ve run into various issues with it, I moved on to a proper installation of V7 on SIMH. The Internet has some really good resources on installing V7 in SIMH. Thus, I set out on my own journey of installing and using V7 a while ago, but that part was remarkably uneventful.

One convenience that I have been dearly missing since the switch from V7/x86 is a functioning backspace key. There seem to be multiple different definitions of backspace:

  1. BS, as in ASCII character 8 (010, 0x08, also represented as ^H), and
  2. DEL, as in ASCII character 127 (0177, 0x7F, also represented as ^?).

V7 does not accept either for input by default. Instead, # is used as the erase character and @ is used as the kill character. These defaults have been there since UNIX V1. In fact, they have been “there” since Multics, where they got chosen seemingly arbitrarily. The erase character erases the character before it. The kill character kills (deletes) the whole line. For example, “ba##gooo#d” would be interpreted as “good” and “bad line@good line” would be interpreted as “good line”.

There is some debate on whether BS or DEL is the correct character for terminals to send when the user presses the backspace key. However, most programs have settled on DEL today. tmux forces DEL, even if the terminal emulator sends BS, so simply changing my terminal to send BS was not an option. The change from the defaults outlined here to today’s defaults occurred between 4.1BSD and 4.2BSD. enf on Hacker News has written a nice overview of the various conventions.

Changing the Defaults

These defaults can be overridden, however: stty(1) can set any character as the erase or kill character. It accepts caret notation, so that ^U stands for ctrl-u. Today’s defaults (and my goal) are:

Function   Character
erase      DEL (^?)
kill       ^U

The catch is in how stty(1) implements the caret notation. Holding ctrl while typing a character has the effect of bitwise-ANDing it with 037 (0x1F), and that is exactly what /usr/src/cmd/stty.c does, as the stty(1) man page also mentions. For DEL, this breaks: ASCII ? is 077, and 077 bitwise-AND 037 is 037, which is US (unit separator) rather than DEL (0177). ^? therefore needs special handling, and stty(1) in V7 does not know about this special case. Because of this, a separate program (or a change to stty(1)) is required to call stty(2) and change the erase character to DEL. Changing the kill character was easy enough, however:

$ stty kill '^U'

So I wrote that program and found out that DEL still didn’t work as expected, though ^U did. # stopped working as erase character, so something certainly did change. This is because in V7, DEL is the interrupt character. Today, the interrupt character is ^C.

Clearly, I needed to change the interrupt character. But how? Given that neither stty(1) nor the underlying system call stty(2) seemed to let me change it, I looked at the code for tty(4) in /usr/sys/dev/tty.c and /usr/sys/h/tty.h. And in the header file, lo and behold:

#define	CERASE	'#'		/* default special characters */
#define	CEOT	004
#define	CKILL	'@'
#define	CQUIT	034		/* FS, cntl shift L */
#define	CINTR	0177		/* DEL */
#define	CSTOP	023		/* Stop output: ctl-s */
#define	CSTART	021		/* Start output: ctl-q */
#define	CBRK	0377

I assumed that just changing these defaults would fix them system-wide, which I found preferable to a solution in .profile anyway. After changing the header, one cycle of make all, make unix and cp unix /unix, and a reboot, the system exhibited the same behavior: no change to the default erase, kill or interrupt characters. I double-checked /usr/sys/dev/tty.c, and it indeed copied the characters from the header. Something, somewhere must have been overwriting my new defaults.

Studying the man pages in vol. 1 of the manual, I found that on multi-user boot init calls /etc/rc, then calls getty(8), which then calls login(1), which ultimately spawns the shell. /etc/rc didn’t do anything interesting related to the console or ttys, so the culprit must be either getty(8) or login(1). As it turns out, both of them are the culprits!

getty(8) changes the erase character to #, the kill character to @ and the struct tchars to { '\177', '\034', '\021', '\023', '\004', '\377' }. At this point, I realized that:

  1. there’s a struct tchars,
  2. it can be changed from userland.

The first member of struct tchars is char t_intrc, the interrupt character. Had I actually read the man page for tty(4), I would have noticed struct tchars much earlier, and I could have had a much easier solution: a bit of code to change it from userland. Oops. I’m too far in to settle for a .profile solution and a custom executable, though. Besides, I still couldn’t fix my typos at the login prompt without a broader change.

login(1) changes the erase character to # and the kill character to @. At least the shell leaves them alone. Seriously, three places to set these defaults is crazy.

Fixing the Characters

The plan was simple: perform the following substitution:

Function    Old character   Old ASCII (octal)   New character   New ASCII (octal)
erase       #               043                 DEL             0177
kill        @               0100                ^U              025
interrupt   DEL             0177                ^C              003

So I changed the characters in tty(4), getty(8) and login(1). It worked! Almost. DEL now did erase, but there was no feedback for it: when I typed DEL, the cursor stayed where it was.

Pondering the code for tty(4) again, I found that there is a variable called partab, which determines the delays and what kind of special handling, if any, to apply to each character. In particular, BS has an entry whose handler looks like this:

	/* backspace */
	case 2:
		if (*colp)
			(*colp)--;
		break;

Naïve as I was, I just changed the entry for DEL from “non-printing” to “backspace”, hoping that would help. Another recompilation cycle and reboot later, nothing had changed: DEL still only erased silently. So I changed the handler for another character, recompiled and rebooted. Nothing changed. Again. At that point, I realized that something else must be going on.

I found out that the tty is in so-called echo mode, which means that every character typed gets echoed back to the tty. It just so happens that DEL has no visible representation at all. So it only looked like nothing had changed, while the character was in fact properly echoed back. When I temporarily changed the erase character to BS (^H) and typed ^H manually, I got the erase effect and the cursor moved back by one character on screen. When I changed the erase character to something else, like #, and typed ^H manually, nothing was erased, but the cursor still moved back by one character. That made the separation between a character’s effect on the input and its representation on screen clear to me. Because DEL is unprintable, I needed to add a special case for it in ttyoutput():

	if (c==0177) {
		ttyoutput(010, tp);
		ttyoutput(' ', tp);
		ttyoutput(010, tp);
		return;
	}

What this does is first send a BS to move the cursor back by one, then send a space to rub out the previous character on screen and then send another BS to get to the previous cursor position. Fortunately, my terminal lives in a world where doing this is instantaneous.

Getting the Diff

For future generations, as well as for myself when I inevitably break this installation of V7 in some major way, I wanted to make a diff. However, my V7 is installed in SIMH, and I am not a very intelligent man: I didn’t keep backup copies of the files I’d changed. Getting data out of this emulated machine is an exercise in frustration.

Transmission over ethernet is out by virtue of there being no ethernet in V7. I could simulate a tape drive and write a tar file to it, but neither did I find any tools to convert from simulated tape drive to raw data, nor did I feel like writing my own. I could simulate a line printer, but neither did V7 ship with the LP11 driver (apparently by mistake), nor did I feel like copy/pasting a long lpr program in – a simple cat(1) to /dev/lp would just generate fairly garbled output. I could simulate another hard drive, but even if I format it, nothing could read the ancient file system anyway, except maybe mount_v7fs(8) on NetBSD. Though setting up NetBSD for the sole purpose of mounting another virtual machine’s hard drive sounds silly enough that I might do it in the future.

While V7 does ship with uucp(1), it requires a device to communicate through. It seems that communication over a tty is possible V7-side, but in my case, quite difficult. I use the version of SIMH as packaged on Debian because I’m a lazy person. For some reason, the DZ11 terminal multiplexer was removed from that package. The DUP11 bit synchronous interface, which I hope is the same as the DU-11 mentioned in /usr/sys/du.c, was not part of SIMH at the time of packaging. V7 only speaks the g protocol (see Ptbl in /usr/src/cmd/uucp/cntrl.c), which requires the connection to be 8-bit clean. Even if the simulator for a DZ11 were packaged, it would most likely be unsuitable, because telnet isn’t 8-bit clean by default and I’m not sure whether the DZ11 driver can negotiate 8-bit clean Telnet. That aside, I’m not sure whether Taylor UUCP for Linux would be able to handle “impure” TCP communications over the simulated interface, rather than a direct connection to another instance of Taylor UUCP. Then there is the issue of general compatibility between the two systems. As reader DOS pointed out, there seem to be quite a few difficulties. Those difficulties were experienced on V7/x86, however, and I’m not ruling out that the issues encountered are specific to V7/x86. In any case, UUCP is an adventure for another time. I expect it’ll be such a mess that it’ll deserve its own post.

In the end, I printed everything on screen using cat(1) and copied that out. Then I performed a manual diff against the original source code tree because tabs got converted to spaces in the process. Then I applied the changes to clean copies that did have the tabs. And finally, I actually invoked diff(1).

Closing Thoughts

Figuring all this out took me a few days. Understanding how the system is put together was surprisingly hard at first, but then the difficulty curve eased off. It was an interesting exercise in some kind of “reverse engineering”, and I definitely learned something about tty handling. I was, however, not pleased with using ed(1), even though I do know the basics. vi(1) is a blessing that I did not appreciate enough until recently. Had I not been able to use recursive grep(1) on my host and scroll through the code there, I would probably have given up. Writing UNIX under those kinds of editing conditions is an amazing feat. I have nothing but the greatest respect for the software developers of those days.

Here’s the diff, but V7 predates patch(1), so you’ll be stuck manually applying it: backspace.diff

† UNIX is a trademark of The Open Group.

20 thoughts on “Teaching an Almost 40-year Old UNIX about Backspace”

  1. You may not like ed, but you can use it to apply a patch! Use the -e flag to diff to make it generate a script that you can pass to ed. Unfortunately, these scripts can’t modify multiple files, so you’d need one such ed script per patched file.

    From the diff documentation:

    If the file `d’ contains the output of `diff -e old new’, then the
    command `(cat d && echo w) | ed - old’ edits `old’ to make it a copy of
    `new’.

      • Of course that’s easy to do on a modern system by just applying the patch and then generating new patches, but I was disappointed that I couldn’t find a tool to convert unified diffs to ed scripts directly, as I had seen that Emacs at least could convert between unified and context diffs. I suppose that unified diffs and ed scripts are too many generations apart for anyone to have wanted to do this before!

      • I wrote a shell script that uses ed(1) to patch the source code:

        cd /usr

        ed ./src/cmd/getty.c <<'EOE'
        12c
        struct tchars tchars = { '\003', '\034', '\021', '\023', '\004', '\377' };
        .
        8,9c
        #define ERASE '\177'
        #define KILL '\025'
        .
        w
        q
        EOE

        ed ./src/cmd/login.c <<'EOE'
        46,47c
        ttyb.sg_erase = 0177;
        ttyb.sg_kill = 025;
        .
        w
        q
        EOE

        ed ./sys/dev/tty.c <<'EOE'
        559a
        * DEL is just a very fancy backspace today.
        * However, just reducing colp by one is insufficient.
        * We need to send an actual ^H.
        * We're taking this opportunity to also rub out the
        * previous character.
        */
        if (c==0177) {
        ttyoutput(010, tp);
        ttyoutput(' ', tp);
        ttyoutput(010, tp);
        return;
        }
        /*
        .
        w
        q
        EOE

        ed ./sys/h/tty.h <<'EOE'
        84c
        #define CINTR 003 /* ctl-c */
        .
        82c
        #define CKILL 025
        .
        80c
        #define CERASE 0177 /* default special characters */
        .
        w
        q
        EOE

  2. One method I’ve used for getting output out of SIMH is to SSH into the host and run SIMH, then turn on logging in your SSH program. Cat the file and turn off logging, then extract the file from the log. To transfer binaries, you’ll need to uuencode first.

    • Oh absolutely, uuencode/uudecode are insanely helpful programs to have. I’ve used them a few times, even on dialup (yuck!) to transform an ‘end user deployment’ into something far far more usable!

      Nothing like getting access to a system, taring up and out the header files, some libs, and thanks to some binutils+gcc magic returning with samba or even lynx…

  3. I figured you should be able to get files in and out using tapes, and had a look around and found some information about this. It’s obscure, but no harder than putting files on a FAT floppy disk and attaching it to QEMU! Of course, the uuencode/uudecode solution is something you can actually apply to modern problems too, but it looks like it’s slightly too modern – those tools didn’t actually come with v7 as far as I can tell 🙂

    http://a.papnet.eu/UNIX/v7/Installation already taught me that the command to attach a tape is ‘att tm0 .tap’ (or just ‘at’, or ‘attach’). There’s a corresponding ‘detach’/’det’.

    http://www.tuhs.org/Archive/Distributions/Research/Bug_Fixes/V6enb/v6enb.tar.gz contains some tools/patches for v6, but it also includes enblock.c and deblock.c which are useful for converting between raw data files and simh’s tape format. The tape format is pretty trivial, from the looks of things, so plenty of people have written tools for doing this conversion, this was just the first set I found. In the unlikely event that anyone is into this stuff but doesn’t know what to do here, run ‘gcc -o enblock enblock.c’ and ‘gcc -o deblock deblock.c’ to compile them both.

    To get files out of simh:

    1. Ctrl-E to enter simh’s command mode.
    2. ‘att tm0 mytape.tap’. The file will be created if it doesn’t currently exist. .tap seems to be the conventional filename for simh format tape files. You don’t seem to need to detach an existing tape first.
    3. ‘c’ to return to Unix.
    4. ‘tar -cf /dev/mt0 ‘ to write tar output to the tape. I think the ‘-‘ is optional, and instead of ‘f /dev/mt0’, I think you can just specify ‘0’, but I’m doing things the way I’m used to under Linux. I think I read something saying that even though the tar man page says you can specify a digit between 0 and 7 to pick a drive, tar only supports 0 and 1, but perhaps that was for a different version of Unix.
    5. Ctrl-E to return to simh command mode.
    6. ‘det tm0’ so you don’t accidentally use the tape again. I’m not sure how important this is, it seems like simh flushes the data to the tape immediately.
    7. In another shell, ‘deblock mytape.tar’.
    8. Extract the tar file as normal.

    To get files into simh:

    1. Create a .tar file as normal, let’s call it mytape.tar again.
    2. ‘enblock mytape.tap’.
    3. In simh, hit Ctrl-E to return to command mode.
    4. ‘att tm0 mytape.tap’. You could optionally add a ‘-e’ flag so that this errors out if you specify a filename that doesn’t exist, otherwise just be aware that if it says “TM: creating new file” you’ve done something wrong because we want to use the .tap file created by enblock.
    5. ‘c’ to return to Unix.
    6. ‘tar -xf /dev/mt0’ to extract the files from the tar file on the tape.

    In the last step, I see errors like “tar: / – cannot create”. https://groups.google.com/forum/#!topic/alt.sys.pdp11/yh_mvOZdMm8 mentions these and says you can tell GNU tar to create a v7-compatible .tar file. When I use that option, it doesn’t stop the errors from appearing. However, the errors seem to be harmless anyway – I tried copying the entire /usr/sys directory to a tape, extracting it under Linux, creating a new tar file on a new tape and extracting it to a new directory on the PDP-11 and ‘diff’ reported all the files were the same, so this seems to work.

    Now one complicating factor that I haven’t dealt with here is the fact that tapes can contain multiple files. They don’t have names, they’re just separated by “file marks” on the tape. The tape creation scripts accompanying v7 give an example of a tape with multiple files. The man page for dd has an example of how you can skip over a file when reading:

    Note the use of raw magtape. Dd is especially suited to I/O
    on the raw physical devices because it allows reading and
    writing in arbitrary record sizes.

    To skip over a file before copying from magnetic tape do

    (dd of=/dev/null; dd of=x) </dev/rmt0

    I think another option is to create a non-rewind version of the tape device and then read or write from it more than once. Under Linux I'm accustomed to then running 'mt rewind' when I want to rewind the tape, but I don't see an 'mt' command in v7; I suppose there is some trick like doing a tiny read from the rewinding version of the device. I suppose on the host side you'd then need better tools than enblock and deblock for dealing with multi-file tapes. I haven't tried any of this, since multiple files per tape seems unnecessarily complicated for virtual tapes given that you can just keep creating new tapes and the tapes actually have filenames on them!

    • Thank you for these details. I had the idea of using virtual tapes, but didn’t feel like writing my own handler for the SIMH tools.

      Now that I know that there’s a thing called “deblock”, I could find that relatively fast and presume it’s in http://www.tuhs.org/Archive/Distributions/Research/Bug_Fixes/V6enb/v6enb.tar.gz.

      As for skipping tape files, there seems to be a more intuitive solution. While not documented in dd(1), it’s mentioned in passing in Setting Up Unix – Seventh Edition in vol. 2B of the UNIX Programmer’s Manual and actually exists in /usr/src/cmd/dd.c:

      /etc/mkfs /dev/rp3 74000 (153406 if on RP04/5, 322278 on RP06)
      (The above command takes about 2-3 minutes on an RP03)
      dd if=/dev/nrmt0 of=/dev/null bs=20b files=6
      (skip 6 files on the tape)

      • > Now that I know that there’s a thing called “deblock”, I could find that relatively fast and presume it’s in http://www.tuhs.org/Archive/Distributions/Research/Bug_Fixes/V6enb/v6enb.tar.gz.

        Yes, that’s what I linked to, but I hid it in an enormous wall of text 🙂

        I found that when I tried to use /dev/rmt0 (“r” meaning “raw”) with tar, it didn’t work properly – I was trying to get a list of files from a tape written by the host and it just listed the same entry over and over, so I think tar won’t work with the raw devices (although I didn’t try very hard). I thought perhaps that meant that the trick is to create a non-raw, non-rewind device, which I think is possible.

        However, on further thought I suppose that since the rewind occurs on close, you could use the ‘dd’ command you posted to skip over files, then use /dev/mt0 with tar to read or write, and it will access the Nth file but then rewind to the start of the tape after it’s done.

      • Thanks! I found that project, but those first two tools are missing from the README.md so I didn’t notice them. I see that they support multiple files per tape as you’d hope.

  4. Since booting (and logging in) takes a bit of typing, and I noticed that the SIMH v4.0 beta has some ‘expect’-like commands, I figured out how to automate it. In the boot.ini file that http://a.papnet.eu/UNIX/v7/Installation instructs you to create, insert these lines before “b rp0”:

    >>>
    # Set up expect rules for booting.
    # Note that LF comes before CR until the password prompt.
    expect "Boot\n\r: " send "hp(0,0)unix\r"; c
    expect "\n\r# " send "\004"; c
    expect "\n\rlogin: " send "root\r"; c
    expect "\r\nPassword:" send "root\r"; c

    # Fill the input buffer with the first command.
    send "boot\r"

    # Now boot, which will trigger the above command sequence.
    <<<

    ("b rp0" being the command to boot).

    If you haven't finished following the instructions on that page yet, change "unix" to "hptmunix", "myunix", or whatever kernel you want to boot.

    I couldn't find any examples of using these commands anywhere, so hopefully this helps someone. I think that up until now everyone who has wanted to automate any of these things has actually used 'expect' to interact with SIMH.

    This will save me from having to remember or record any of these steps which I only learned in the last few hours and which I probably won't perform again for a long time unless xorhash provides me with some more inspiration 🙂

    PS I like that the man page for the find command says this:

    BUGS
    The syntax is painful.

    but it's pretty much the same syntax we use today 🙂 The only challenge I had was in figuring out that it doesn't print by default and you have to specify '-print'.

  5. A simple trick I’ve read (I haven’t tried it, but it sounds like it should work), is to use tar to write to a disk device. Outside the SIMH instance it is of course a file, and you can use tar on that file too. It just needs to understand the old V7 tar format.
    Or otherwise you can do it one plain file at a time (although that might get tricky with detecting the end of it).

  6. I made the requisite patches to the system source code and rebuilt the kernel, but it panics when I attempt to boot it (my old kernel still works, however). Is it possible I missed a step, or did the kernel compilation wrong?
