fread and fwrite demystified: stdio on UNIX V7

(This is a guest post by xorhash.)

1. Introduction

Did I say I’m done with UNIX Seventh Edition (V7)? How silly of me; of course I’m not. V7 is easy to study, after all.

Something that’s always bothered me about the stdio.h primitives fread() and fwrite() are their weak guarantees about what they actually do. Is a short read or write “normal” in the sense that I should normally expect it? While this makes no answer about modern-day operating systems, a look at V7 may enlighten me about what the historical precedent is.

As an aside: It’s worth noting that the stdio.h functions are some of the few that require a header. It was common historical practice not to declare functions in headers, just see crypt(3) as an example.

I will first display the man page, then ask the questions I want to answer, then look at the implementation and finally use that gained knowledge to answer the questions.

2. Into the Man Page

The man page for fread() and fwrite() is rather terse. Modern-day man pages for those functions are equally terse, though, so this is not exactly a novelty of age. Here’s what it reads:

NAME

fread, fwrite – buffered binary input/output

SYNOPSIS

#include <stdio.h>

fread(ptr, sizeof(*ptr), nitems, stream)
FILE
*stream;

fwrite(ptr, sizeof(*ptr), nitems, stream)
FILE
*stream;

DESCRIPTION

Fread reads, into a block beginning at ptr, nitems of data of the type of *ptr from the named input stream. It returns the number of items actually read.

Fwrite appends at most nitems of data of the type of *ptr beginning at ptr to the named output stream. It returns the number of items actually written.

SEE ALSO

read(2), write(2), fopen(3), getc(3), putc(3), gets(3), puts(3), printf(3), scanf(3)

DIAGNOSTICS

Fread and fwrite return 0 upon end of file or error.

So there are the following edge cases that are interesting:

  • In fread(): If sizeof(*ptr) is greater than the entire file, what happens?
  • If sizeof(*ptr) * nitems overflows, what happens?
  • Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
  • Is the “number of items actually written” guaranteed to have written every item in its entirety?
  • What qualifies as error?

3. A Look at fread()

Note: All file paths for source code are relative to /usr/src/libc/stdio/ unless noted otherwise. You can read along at the TUHS website.

rdwr.c implements fread(). fread() is simple enough; it’s just a nested loop. The outer loop runs nitems times. The outer loop sets the number of bytes to read (sizeof(*ptr)) and runs the inner loop. The inner loop calls getc() on the input FILE *stream and writes each byte to *ptr until either getc() returns a value less < 0 or all bytes have been read.

/usr/include/stdio.h implements getc(FILE *p) as a C preprocessor macro. If there is still data in the buffer, it returns the next character and advances the buffer by one. Interestingly, *(p)-&gt;_ptr++&amp;0377 is used to return the character, despite _ptr being a char *. I’m not sure why that &amp;0377 (&amp;0xFF is there. If there is no data in the buffer, it instead returns _filbuf(p).

filbuf.c implements _filbuf(). This function is a lot more complex than the other ones until now. It begins with a check for the _IORW flag and, if set, sets the _IOREAD flag as well. It then checks if _IOREAD is not set or if _IOSTRG is set and returns EOF (defined as -1 in stdio.h) if so. These all seem rather inconsequential to me. I can’t make heads or tails of _IOSTRG, however, but it seems irrelevant; _IOSTRG is only ever set internally in sprintf and sscanf for temporary internal FILE objects. After those two flag checks, _filbuf() allocates a buffer into iop-&lt;_base, which seems to be the base pointer of the buffer. If flag _IONBF is set, which happens when setbuf() is used to switch to unbuffered I/O, a temporary, static buffer is used instead. Then read() is called, requesting either 1 bytes if unbuffered I/O is requested or BUFSIZ bytes. If read() returned 0, the FILE is flagged as end-of-file and EOF is returned by _filbuf(). If read() returned <0, the FILE is flagged as error and EOF is returned by _filbuf(). Otherwise, the first character that has been read is returned by _filbuf() and the buffer pointer incremented by one.

According to its man page, read() only returns 0 on end-of-file. It can also return -1 on “many conditions”, namely “physical I/O errors, bad buffer address, preposterous nbytes, file descriptor not that of an input file”

As an aside, BUFSIZ still exists today. ISO C11 § 7.21.2 no. 9 dictates that BUFSIZ must be at least 256. V7 defines it as 512 in stdio.h. One is inclined to note that on V7, a filesystem block was understood 512 bytes length, so this was presumably chosen for efficient I/O buffering.

4. A Look at fwrite()

rdwr.c also implements fwrite(). fwrite() is effectively the same as fread(), except the inner loop uses putc(). After every inner loop, a call to ferror() is made. If there was indeed an error, the outer loop is stopped.

/usr/include/stdio.h implements putc(int x, FILE *p) as a C preprocessor macro. If there is still room in the buffer, the write happens into the buffer. Otherwise, _flsbuf() is called.

flsbuf.c implements _flsbuf(int c, FILE *iop). This function, too, is more complex than the ones until now, but becomes more obvious after reading _filbuf(). It starts with a check if _IORW is set and if so, it’ll set _IOWRT and clear the EOF flag. Then it branches into two major branches: the _IONBF branch without buffering, which is a straight call to write(), and the other branch, which allocates a buffer if none exists already or otherwise calls write() if the buffer is full. If write() returned less than expected, the error flag is set and EOF returned. Otherwise, it returns the character that was written.

According to its man page, write() returns the number of characters (bytes) actually written; a non-zero value “should be regarded as an error”. With only a cursory glance over the code, this appears to happen for similar reasons as read(), which is either physical I/O error or bad parameters.

5. Conclusions

In fread(): If sizeof(*ptr) is greater than the entire file, what happens?
On this under-read, fread() will end up reading the entire file into the memory at ptr and still return 0. The I/O happens byte-wise via getc(), filling up the buffer until getc() returns EOF. However, it will not return EOF until a read() returns 0 on EOF or -1 on error. This result may be meaningful to the caller.

If sizeof(*ptr) * nitems overflows, what happens?
No overflow can happen because there is no multiplication. Instead, two loops are used, which avoids the overflow issue entirely. (If there are strict filesystem constraints, however, it may be de-facto impossible to read enough bytes that sizeof(*ptr) * nitems overflows. And of course, there’s no way you could have enough RAM on a PDP-11 for the result to actually fit into memory.)

Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
Partially: Both fread() and fwrite() short-circuit on error. This causes the number of items that have actually been read or written successfully to be returned. The only relevant error condition is filesystem I/O error. Due to the byte-wise I/O, it’s possible that there was a partial read or write for the last element, however. Therefore, it would be more accurate to say that the “number of items actually read/written” is guaranteed to be the number of non-partial items that can be read/written. A short read or short write is an abnormal condition.

Is the “number of items actually written” guaranteed to have written every item in its entirety?
No, it isn’t. A partial write is possible. If a series of structs is written and then to be read out again, however, this is not a problem: fread() and fwrite() only return the count of full items read or written. Therefore, the partial write will not cause a partial read issue. If a set of bytes is written, this is an issue: There will be incomplete data – possibly to be parsed by the program. It is therefore to preferable to write (and especially read) arrays of structs than to write and read arrays of bytes. (From a modern-day perspective, this is horrendous design because this means data files are not portable across platforms.)

What qualifies as error?
Effectively, only a physical I/O error or a kernel bug. Short fread() or fwrite() return values are abnormal conditions. I’m not sure if there is the possibility that the process got a signal and the current read() or write() ends up writing nothing before the EINTR; this seems to be more of a modern-day problem than something V7 concerned itself.

Date and Time on UNIX V7

(This is a guest post by xorhash.)

Introduction

I’ve recently defeated one of my bigger inconveniences, broken DEL as backspace on the UNIX®† operating system, Seventh Edition (commonly known as UNIX V7 or just V7). However, I’ve had another pet peeve for a while: How much manual labor is involved in booting the system. Reader DOS found out that SIMH recently added support for SEND/EXPECT pairs to react to output from the simulator. Think of them like UUCP chat scripts, effectively. This can be used to automate the bootup procedure.

Yet DOS’s script skips over a part of the bootup procedure that can be fully automated with some additional tooling. Namely, setting the date/time to the current time as defined by the host system. As per boot(8), the operator is meant to set the date/time every time the system is brought up. This should be possible to automate, right?

Setting the Date Automatically

date(1) itself does not support setting years past 2000, so we need custom code in any case. SIMH, fortunately, also provides a way to get the current timestamp in the form of %UTIME%, which is interpreted in any argument to any command. I’ve thus written a utility called tsdate that takes a timestamp as argument and sets the current time to be that timestamp. I put the executable in /etc/tsdate, but there’s really no reason to do so other than not wanting to accidentally call it. Once tsdate is in place, changing DOS’s script slightly will already do the trick:

expect "\n\r# " send "/etc/tsdate %UTIME%\r\004"; c

This approach already has a minor amount of time drift ab initio, namely the difference between the actual time on the host system and the UNIX timestamp. In the worst case, this may be very close to 1. If for some reason you need higher accuracy than this, you’ll probably have a fairly hard time. I could imagine some kind of NTP-over-serial, but you’d need support for chat scripts to get past the authentication due to getty(8) spawning login(1).

The system is not suitable for usage past the year 2038. However, you can at least push it back until around 2100 by changing the internal representation of time to be an unsigned long instead of simply a long. time_t was not used systematically. Instead, everything assumed long as the type for times. This assumption is in a lot of places in userspace and even the man pages use long instead of time_t.

If you overflow tsdate’s timestamp, you’ll just get whatever happens when atol(3) overflows. There’s nothing in the standard library for parsing strings into unsigned long and the year 2038 is far enough away that I didn’t want to bother. stime(2) would presumably also need to be adjusted.

Year 2000 Compatibility in the System

V7 is surprisingly good at handling years past 2000. Most utilities can print years up to and including 2099 properly. Macros for nroff(1)/troff(1), however, are blissfully unaware that years past 1999 may exist. This causes man pages to be supposedly printed in the year 19118. The root cause for this is that the number register yr only holds the current year minus 1900. Patches to the -ms and -man macros are required. Similarly, refer(1) only considers years until 2000 to be actual years, though I did not bother patching that since it should only affect keyword matching.

Leap year handling is broken in two different places due to wrong leap year handling: at(1) and the undocumented dysize() function, used by date(1) as well as inside /usr/src/libc/gen/ctime.c for various purposes. This affects the standard library, so recompiling the entire userland is recommended. Because the calculation is just a naïve division by four, it actually works on the year 2000 itself. A year is a leap year, i.e. has February 29, if a year is:

  1. a multiple of 4, and
    1. not a multiple of 100, or
    2. a multiple of 400.

Leap seconds are also not accounted for, but that comes to nobody’s surprise. Leap seconds just add a second 60 to the usual 00-59, so they don’t hurt doing date calculations on timestamps unless precisely on a leap second. For my purposes, they can be ignored.

Note that I haven’t gone through the system with a fine-toothed comb. There can always be more subtle time/date issues remaining in the system. For my purposes, this works well enough. If other things crop up, you’re welcome to put them in the comments for future generations.

The Good Stuff

There’s really not much to it. Applying diff and recompilation of the affected parts is left as an exercise for the reader.

Files:

  1. tsdate.c
  2. tsdate.1m
  3. y2kpatches.diff

† UNIX is a trademark of The Open Group.

Teaching an Almost 40-year Old UNIX about Backspace

(This is a guest post by xorhash.)

Introduction

I have been messing with the UNIX®† operating system, Seventh Edition (commonly known as UNIX V7 or just V7) for a while now. V7 dates from 1979, so it’s about 40 years old at this point. The last post was on V7/x86, but since I’ve run into various issues with it, I moved on to a proper installation of V7 on SIMH. The Internet has some really good resources on installing V7 in SIMH. Thus, I set out on my own journey on installing and using V7 a while ago, but that was remarkably uneventful.

One convenience that I have been dearly missing since the switch from V7/x86 is a functioning backspace key. There seem to be multiple different definitions of backspace:

  1. BS, as in ASCII character 8 (010, 0x08, also represented as ^H), and
  2. DEL, as in ASCII character 127 (0177, 0x7F, also represented as ^?).

V7 does not accept either for input by default. Instead, # is used as the erase character and @ is used as the kill character. These defaults have been there since UNIX V1. In fact, they have been “there” since Multics, where they got chosen seemingly arbitrarily. The erase character erases the character before it. The kill character kills (deletes) the whole line. For example, “ba##gooo#d” would be interpreted as “good” and “bad [email protected] line” would be interpreted as “good line”.

There is some debate on whether BS or DEL is the correct character for terminals to send when the user presses the backspace key. However, most programs have settled on DEL today. tmux forces DEL, even if the terminal emulator sends BS, so simply changing my terminal to send BS was not an option. The change from the defaults outlined here to today’s modern-day defaults occurred between 4.1BSD and 4.2BSD. enf on Hacker News has written a nice overview of the various conventions.

Changing the Defaults

These defaults can be overridden, however. Any character can be set as erase or kill character using stty(1). It accepts the caret notation, so that ^U stands for ctrl-u. Today’s defaults (and my goal) are:

Function Character
erase DEL (^?)
kill ^U

I wanted to change the defaults. Fortunately, stty(1) allows changing them. The caret notation represents ctrl as ^. The effect of holding ctrl and typing a character is bitwise-ANDing it with 037 (0x1F) as implemented in /usr/src/cmd/stty.c and mentioned in the stty(1) man page, so the notation as understood by stty(1) for ^? is broken for DEL: ASCII ? bitwise-AND 037 is US (unit separator), so ^? requires special handling. stty(1) in V7 does not know about this special case. Because of this, a separate program – or a change to stty(1) – is required to call stty(2) and change the erase character to DEL. Changing the kill character was easy enough, however:
$ stty kill '^U'

So I wrote that program and found out that DEL still didn’t work as expected, though ^U did. # stopped working as erase character, so something certainly did change. This is because in V7, DEL is the interrupt character. Today, the interrupt character is ^C.

Clearly, I needed to change the interrupt character. But how? Given that stty(1) nor the underlying syscall stty(2) seemed to let me change it, I looked at the code for tty(4) in /usr/sys/dev/tty.c and /usr/sys/h/tty.h. And in the header file, lo and behold:

#define  CERASE  '#'    /* default special characters */
#define  CEOT  004
#define  CKILL  '@'
#define  CQUIT  034    /* FS, cntl shift L */
#define  CINTR  0177    /* DEL */
#define  CSTOP  023    /* Stop output: ctl-s */
#define  CSTART  021    /* Start output: ctl-q */
#define  CBRK  0377

I assumed just changing these defaults would fix the defaults system-wide, which I found preferable to a solution in .profile anyway. Changed the header, one cycle of make all, make unix, cp unix /unix and a reboot later, the system exhibited the same behavior. No change to the default erase, kill or interrupt characters. I double-checked /usr/sys/dev/tty.c, and it indeed copied the characters from the header. Something, somewhere must be overwriting my new defaults.

Studying the man pages in vol. 1 of the manual, I found that on multi-user boot init calls /etc/rc, then calls getty(8), which then calls login(1), which ultimately spawns the shell. /etc/rc didn’t do anything interesting related to the console or ttys, so the culprit must be either getty(8) or login(1). As it turns out, both of them are the culprits!

getty(8) changes the erase character to #, the kill character to @ and the struct tchars to { '\177', '\034', '\021', '\023', '\004', '\377' }. At this point, I realized that:

  1. there’s a struct tchars,
  2. it can be changed from userland.

The first member of struct tchars is char t_intrc, the interrupt character. So I could’ve had a much easier solution by writing some code to change the struct tchars, if only I’d actually read the manual. I’m too far in to just settle with a .profile solution and a custom executable, though. Besides, I still couldn’t actually fix my typos at the login prompt unless I make a broader change. I’d have noticed the first point if only I’d actually read the man page for tty(4). Oops.

login(1) changes the erase character to # and the kill character to @. At least the shell leaves them alone. Seriously, three places to set these defaults is crazy.

Fixing the Characters

The plan was simple, namely, perform the following substitution:

Function Old Character Old Character
ASCII (Octal)
New Character New Character
ASCII (Octal)
erase # 043 DEL 0177
kill @ 0100 ^U 025
interrupt DEL 0177 ^C 003

So I changed the characters in tty(4), getty(8) and login(1). It worked! Almost. Now DEL did indeed erase. However, there was no feedback for it. When I typed DEL, the cursor would stay where it is.

Pondering the code for tty(4) again, I found that there is a variable called partab, which determines delays and what kind of special handling to apply if any. In particular, BS has an entry whose handler looks like this:

  /* backspace */
  case 2:
    if (*colp)
      (*colp)--;
    break;

Naïve as I was, I just changed the entry for DEL from “non-printing” to “backspace”, hoping that would help. Another recompilation cycle and reboot later, nothing changed. DEL still only silently erased. So I changed the handler for another character, recompiled, rebooted. Nothing changed. Again. At that point, I noticed something else must have been up.

I found out that the tty is in so-called echo mode. That means that all characters typed get echoed back to the tty. It just so happens that the representation of DEL is actually none at all. Thus it only looked like nothing changed, while the character was actually properly echoed back. However, when temporarily changing the erase character to BS (^H) and typing ^H manually, I would get the erase effect and the cursor moved back by one character on screen. When changing the erase character to something else like # and typing ^H manually, I would get no erasure, but the cursor moved back by one character on screen anyway. I now properly got the separation of character effect and representation on screen. Because of this unprintable-ness of DEL, I needed to add a special case for it in ttyoutput():

  if (c==0177) {
    ttyoutput(010, tp);
    ttyoutput(' ', tp);
    ttyoutput(010, tp);
    return;
  }

What this does is first send a BS to move the cursor back by one, then send a space to rub out the previous character on screen and then send another BS to get to the previous cursor position. Fortunately, my terminal lives in a world where doing this is instantaneous.

Getting the Diff

For future generations as well as myself when I inevitably majorly break this installation of V7, I wanted to make a diff. However, my V7 is installed in SIMH. I am not a very intelligent man, I didn’t keep backup copies of the files I’d changed. Getting data out of this emulated machine is an exercise in frustration.

Transmission over ethernet is out by virtue of there being no ethernet in V7. I could simulate a tape drive and write a tar file to it, but neither did I find any tools to convert from simulated tape drive to raw data, nor did I feel like writing my own. I could simulate a line printer, but neither did V7 ship with the LP11 driver (apparently by mistake), nor did I feel like copy/pasting a long lpr program in – a simple cat(1) to /dev/lp would just generate fairly garbled output. I could simulate another hard drive, but even if I format it, nothing could read the ancient file system anyway, except maybe mount_v7fs(8) on NetBSD. Though setting up NetBSD for the sole purpose of mounting another virtual machine’s hard drive sounds silly enough that I might do it in the future.

While V7 does ship with uucp(1), it requires a device to communicate through. It seems that communication over a tty is possible V7-side, but in my case, quite difficult. I use the version of SIMH as packaged on Debian because I’m a lazy person. For some reason, the DZ11 terminal emulator was removed from that package. The DUP11 bit synchronous interface, which I hope is the same as the DU-11 mentioned /usr/sys/du.c, was not part of SIMH at the time of packaging. V7 only speaks the g protocol (see Ptbl in /usr/src/cmd/uucp/cntrl.c), which requires the connection to be 8-bit clean. Even if the simulator for a DZ11 were packaged, it would most likely be unsuitable because telnet isn’t 8-bit clean by default and I’m not sure if the DZ11 driver can negotiate 8-bit clean Telnet. That aside, I’m not sure if Taylor UUCP for Linux would be able to handle “impure” TCP communications over the simulated interface, rather than a direct connection to another instance of Taylor UUCP. Then there is the issue of general compatibility between the two systems. As reader DOS pointed out, there seem to be quite some difficulties. Those difficulties were experienced on V7/x86, however. I’m not ruling out that the issues encountered are specific to V7/x86. In any case, UUCP is an adventure for another time. I expect it’ll be such a mess that it’ll deserve its own post.

In the end, I printed everything on screen using cat(1) and copied that out. Then I performed a manual diff against the original source code tree because tabs got converted to spaces in the process. Then I applied the changes to clean copies that did have the tabs. And finally, I actually invoked diff(1).

Closing Thoughts

Figuring all this out took me a few days. Penetrating how the system is put together was surprisingly fairly hard at first, but then the difficulty curve eased up. It was an interesting exercise in some kind of “reverse engineering” and I definitely learned something about tty handling. I was, however, not pleased with using ed(1), even if I do know the basics. vi(1) is a blessing that I did not appreciate enough until recently. Had I also been unable to access recursive grep(1) on my host and scroll through the code, I would’ve probably given up. Writing UNIX under those kinds of editing conditions is an amazing feat. I have nothing but the greatest respect for software developers of those days.

Here’s the diff, but V7 predates patch(1), so you’ll be stuck manually applying it: backspace.diff

† UNIX is a trademark of The Open Group.

 

learn: Ancient troff sources vs. modern-day groff

(This is a guest post by xorhash.)

Introduction

I’ve been on a trip on the memory lane lately, digging around old manuals of UNIX® operating system before BSD.† In doing so, I’ve come across the sources for the 7th Edition manuals. I wanted to show one part of volume 2A to other people, but didn’t want to make them download the entire 336 pages of volume 2A for the part in question. The part I wanted to extract was “LEARN — Computer-Aided Instruction on UNIX”, starting at p. 107 in the volume 2A PDF file).

A normal person would, I presume, try to split the PDF file. That is straightforward and produces the expected results. I believe I needn’t state that you wouldn’t be reading this if I solved this problem like any sane person would. Instead, I opted to rebuild the PDF from the troff sources provided at the link above.

I am not a very clever man, and thus I completely disregarded the generation procedure that was already spelled out. However, it wasn’t exactly specific anyway, so I didn’t miss out on much.

Getting the sources

So I knew what I needed to do: Get the troff sources. I asked that the Heavens have mercy on my poor soul if this requires a lot of adjustment for 2017 text processing tools. However, a man must do what a man must do. The file in question was called “vol2/learn.bun”. I had no idea what a bun file is, hoped it wasn’t related to steamed buns and clicked it. As it turns out, it’s just what we would call a self-extracting archive today. The shell commands are not very weird, so the extraction process actually worked out just fine. Now I had files “p0” through “p7”. Except what happened to “p1”, the world will never know.

First Steps

I’ve dabbled in man pages before, but that was mostly mandoc, not actual troff.
Accordingly, the first attempt at getting something going was as naive as it could get:
$ groff -Tpdf p* | zathura -
It led to, shall we say, varying results.

really butchered rendering attempt

Clearly, I was doing something very fundamentally wrong. Conveniently, volume 2A also had a lot of troff documentation. Apparently I was supposed to pass -ms and first run tbl(1) over the troff source before actually giving it to groff. That sounded like a good idea, but the results were still somewhat off:

not very butchered rendering attempt

Allow me to express my doubts that this text was written in 2017. If you compare the output with the known-good PDF, you’ll also notice that, somehow, “Bell Laboratories, Murray Hill, New Jersey 07974” turned into “CAI”. Unfortunate.

Back to Square One and Pick Up the Breadcrumbs

Continuing to read the page I got the learn.bun from, I also spied a section called “Macros and References”. That sounds relevant to my interests. tmac.s, which after studying groff(1) seems to be what would get used with -ms references some files in /usr/lib/tmac. I was not in the mood to let this flood over into my system, so I had to make minor adjustments and turn it into relative paths. I also renamed tmac.s to tmac.os to avoid colliding with the one provided by groff, making the new invocation:

$ tbl p* | groff -M./macros -mos -Tpdf | zathura -

Now we’re getting somewhere:

almost not butchered rendering attempt

It’s better than the previous attempts. But there are also some warnings and problems that need cleaning up:

  1. There’s a note that Bell Laboratories holds the UNIX®
    trademark, which is no longer true.†
  2. Now, this most certainly was not written in December 21,
    19117, either.
  3. tmac.os:806: warning: numeric expression expected (got `\')
  4. Every time the .UX macro was requested, I got:
    warning: macro `ev1' not defined (possibly missing space after `ev')
    environment stack underflow

Point 1 was easy to address, it’s a simple text change. Point 2 was caused by spurious dots in front of a call to .ND. However, the actual volume 2A PDF said a different date than in the file, so I adjusted that to match (June 18, 1976 to January 30, 1979).

And Down the Slippery Slope

As for points 3 and 4… Let’s just say groff/troff macros are definitely not meant to be written or read by humans and it’s a feat comparable to magic that someone wrote this set of troff macros. Line 806 is .ch FO \\n(YYu. Supposedly, that changes the location of a page trap when the given macro is invoked. The second argument is meant to be a distance, which explains why groff is complaining. I tried to checked what groff does and left none the wiser. FO seems related to the page footer, I seemed to get away with just deleting that line, though.

Finally, point 4. Apparently, .ev1 was used multiple times in the tmac.os. This looked like it should’ve been .ev 1 instead. Changing those, lo and behold, .UX stopped behaving funky for the most part. Yet for some reason, I’d still get multiple footnotes about the trademark ownership of the UNIX® trademark.† tmac.os sets a troff register (GA) when the .UX macro is first encountered so that the footnote is only made once. The footnote is being made twice. Something does not add up here..AI (author’s institution) resets GA, but the first .UX comes after .AI, so that’s not the problem. Removing the .AB/.AE macros from page 1 caused only one footnote to be made. Thus, I infer it’s actually intended behavior that the footnote is made once for the abstract and once for the main body. Checking with the volume 2A PDF again, I realized that point 4 was, in fact, fixed just by the ev1 changes and I was just chasing a bug that does not exist. I really should’ve checked the PDF twice.

The abstract finally looks okay.

good rendering attempt

Done! Wait, No, Almost

Okay, we’re done, we can go home, right? Almost, one last thing to do: On the last page, there’s something really important missing: the bibliography. Instead, there’s just “$LIST$” there. We can’t just turn Brian W. Kernighan and Michael E. Lesk into plagiarists!

Back to the troff documentation in volume 2A, there’s a match for “$LIST$” on p. 183. Apparently I need a reference file and preprocess the file with refer(1). That sounds simple enough. Fortunately, I got the reference file along with the macros above, so I didn’t have to look for that separately.

$ refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf | zathura -

half of the references are blank

Of course. Why would it work? That’d have been too much to ask for.
At least I get some nice hints:

refer:p2:148: no matches for `skinner teaching 1961'
refer:p3:114: no matches for `kernighan editor tutorial 1974'

The troff documentation conveniently explains the format for the reference file, so I could just add these two entries to Rv7man and be done with it. Thankfully, the pre-compiled PDF of the volume 2A manual had the information necessary to compile the bibliography entries with.

%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974

now that’s what I call a bibliography

And of course, here is the product of this whole ordeal.

Closing Remarks

The Heavens were feeling somewhat merciful, but only just enough that I could waste no more than a day on this project. They really wanted me to spend that day on it, though.

On a side note, “the missing learn references” aren’t available from the link that was
provided. http://cm.bell-labs.com/cm/cs/who/bwk/learn.tar.gz is now down, though the web archive still has it. Needless to say, I didn’t read that.

I will never, ever touch troff/groff again. mandoc is good at what it does and I’ll stick to mandoc for writing man pages. But if I ever need to get something typeset nicely from plain text?

LaTeX is the answer.
Not troff.
Never troff.
Not even once.

UNIX® is a registered trademark of The Open Group.

AT&T 3B2 400 emulated

This is super awesome!

AT&T 3B2 SYSTEM CONFIGURATION:

Memory size: 4 Megabytes
System Peripherals:

        Device Name        Subdevices           Extended Subdevices

        SBD
                        Floppy Disk
                        72 Megabyte Disk

        Welcome!
This machine has to be set up by you.  When you see the "login" message type
                                setup
followed by the RETURN key.  This will start a procedure that leads you through
those things that should be done the "first time" the machine is used.

The system is ready.

Console Login:

Back in the 1980’s AT&T shifted UNIX from being an internal research project that got somewhat popular in college spaces (and larger companies, General Motors was an early UNIX adapter, along with companies like Industrial Light and Magic).  Quickly after the divestiture of 1984, AT&T entered the commercial space with it’s own custom machines & their home made UNIX operating system.  Below is one of the ads they ran in 1984, touting their so called ‘super microcomputers’, featuring the 3B2, the 3B5, and the AT&T Personal Computer.

Thew new computers from AT&T

And indeed for many a government institute bewildered by the dozens of UNIX vendors, standards, and chaos of different platforms and processors many were all to happy to buy AT&T UNIX on AT&T machines.

And indeed this was my first experience with genuine SYSV Unix.

And I hated it.

Initially I had been thrown at an English computer lab because I knew how logon and do my work in style & diction, they decided I could help.  The system was aging and had major problems, as some prior students had figured out enough of the link kit that they would put their own attempts at re-writing portions of the kernel into the system, and it’d break.  Naturally the original installation diskettes were lost, and the best that could be done was basically shut it down throughout the day and run the disk repair utilities.  It was not a fun job.

Later on the 3B2’s were thrown into the ‘common garbage’ aka free kit for other departments, and the 3B2’s re-appeared at the next place I was volunteering at on campus.  However in addition to the two machines, there was a few other boxes of manuals, and oddly enough the installation diskettes.  And also about a dozen of these AT&T ISA Starlan adapters.  These weren’t the ones that were basically Ethernet (Starlan10) but rather the original ones.

Through some incredible luck we did find an NDIS 3 MS-DOS driver for the Starlan car, and we were able to cobble together a Starlan1 LAN consisting of a 3B2 that we cannibalized the RAM and disks from one of them to make a ‘super’ 3B2, with added TCP/IP software and a massively cut down port I did of samba to turn it into a tiny file & print server (72MB MFM disks were it’s biggest if I recall), along with Windows 95 clients.  And of course with a TCP/IP lan we could easily load a proxy server (WinGate?) on one machine with the 56kb modem, and now we all had internet access.  I know it’s sad today, but trust me back then it was “a big deal” that we had a fully functional LAN.

Over on loomcom.com there is an incredible amount of information about the reverse engineered WE32100, along with the 3B2 hardware, and of course information about the newest SIMH simulator the 3B2/400!

Instructions and disk images on the site made it incredibly easy to grab the latest SIMH Windows Development binaries, and get my own virtual 3B2 up and running in minutes! So naturally I pasted in dhrystone.c to see if it’d work.  And that was the first weird issue is that the backspace is the pound # key.  So all the C macro definitions lost their # sign.  I added them in vi without full terminal support because I’m crazy and:

# uname -a
unix unix 3.2 2 3B2
# ./dhrystone
Dhrystone(1.0) time for 500000 passes = 40
This machine benchmarks at 12500 dhrystones/second

Obviously this is 100% bogus, as the real machine should get around 735, and I didn’t even bother with the -O flag.

The current emulator doesn’t do any additional serial ports, nor any Ethernet adapters.  So you only get a console.  But with the pre-installed C compiler image, I was able to build a trivial file just fine.  Although pasting on the console really leaves a lot to be desired.

SDF AT&T 3B2/500 UNIX System

I know for some of us old people the 3B2 hid in the corners of our call centres, running our AT&T Definity switches, our voicemail, and even some of our early ISPs.  After funneling money into SUN to get them to work on SYSVr4 which was the grand unification of BSD + SYSV AT&T’s interest if UNIX quickly waned, and they divested themselves of UNIX, and eventually all PC hardware, although they did re-enter the PC space a few times before exiting yet again.

As time would tell, proprietary hardware + a previously ‘open’ operating system were not the winning combination.  And so far the only UNIX vendor to weather the Linux storm so far is IBM with it’s A/IX.

AMIX

AMIX Ad
AMIX Ad

Back in 1990 Commodore took the Amiga in a new direction with it’s new Amiga 3000, by commissioning a port of A&T SYSV Unix to the Amiga. Taking advantage of the 3000’s 68030 CPU and 68881 Math coprocessor, along with its integrated SCSI controller. It certainly was the hallmark of typical UNIX machines of the time.

When originally announced there was some big interest in the platform by SUN, as their original SUN-1, SUN-2 & SUN-3 lines of workstations were all 68000 based machines, and being able to rebrand a mass produced Commodore model would have been a good thing, however the deal ultimately fell through.  The machine would have been the Amiga 3500, which later became the Amiga 3000T.

Another thing to keep in mind is that SUN’s SYSV (Solaris) was targeted to the SPARC processor, and it is unlikely that they would benefit from selling a 68030 based machine in 1991.

Typical of the time, AMIX installs from a set of boot floppies, and then pulls the rest of the installation from a tape drive, such as the A3070.

AMIX was released at a time when the UNIX world was rapidly moving to RISC processors, SUN had their SPARC, SGI had their MIPS, IBM and their POWER, Motorola built UNIX machines around their 88000 RISC processor, NeXT was also going to move to the 88000 until they gave up making their own hardware and shifted to a software company.  So who would want a then dated 68030 based machine when the industry had made their first steps into the world of RISC computing.

So how does it measure up?  Well it is SYSV, and if you’ve seen one, well honestly you’ve seen them all.  What is kind of neat is that AMIX includes OpenLook and a C compiler, which is kind of a rarity for the period.

Another flaw was that when the 68040 processor was released it’s MMU was incompatible with the 68030, and the VM subsystem for any UNIX would have to be rewritten.  While NetBSD can run on both the 68030 and 68040, AMIX never was updated, and so it can only run on 68030 based machines.

AMIX never did get any critical traction, and slipped into oblivion with the death of Commodore.

Up until recently it was impossible to run AMIX in any emulator, but there has been a lot of work on the ARANYM and Pervious emulators which included doing 68030 MMU support for the possibility of running early versions of NeXTSTEP. Toni Wilen was able to adapt their work onto WinUAE and it is not possible to run AMIX.!

Reading through this thread,  I was able to put together the needed bits, and get it running under CrossOver, by using the pre-configured settings for WinUAE, and replacing the exe with the new beta exe, the supplied hard disk image from amigaunix.com and I was up and running in no time!  The only real change from the config was to change the SCSI ID of the hard disk from 0 to 6.

Screen Shot 2013-01-13 at 8.48.54 PM
AMIX starting up on WinUAE

The default password is wasp.  I thought it was kind of interesting that AMIX includes ‘dungeon’.  really cool!

Open Look on AMIX
Open Look on AMIX

I am unsure of how to enable the high resolution graphics, but sadly the Amiga known for its multimedia capabilities, AMIX with stock graphics runs in monochrome.  Such a major underwhelming thing.

Oh well, for anyone inclined you can now run AMIX, and enjoy another dead SYSV.