(This is a guest post by xorhash.)
I’ve been taking a trip down memory lane lately, digging through old manuals for the UNIX® operating system from before BSD.† In doing so, I’ve come across the sources for the 7th Edition manuals. I wanted to show one part of volume 2A to other people, but didn’t want to make them download the entire 336 pages of volume 2A for the part in question. The part I wanted to extract was “LEARN — Computer-Aided Instruction on UNIX”, starting at p. 107 in the volume 2A PDF file.
A normal person would, I presume, try to split the PDF file. That is straightforward and produces the expected results. I believe I needn’t state that you wouldn’t be reading this if I solved this problem like any sane person would. Instead, I opted to rebuild the PDF from the troff sources provided at the link above.
I am not a very clever man, and thus I completely disregarded the generation procedure that was already spelled out. However, it wasn’t exactly specific anyway, so I didn’t miss out on much.
Getting the sources
So I knew what I needed to do: Get the troff sources. I asked that the Heavens have mercy on my poor soul if this requires a lot of adjustment for 2017 text processing tools. However, a man must do what a man must do. The file in question was called “vol2/learn.bun”. I had no idea what a bun file is, hoped it wasn’t related to steamed buns and clicked it. As it turns out, it’s just what we would call a self-extracting archive today. The shell commands are not very weird, so the extraction process actually worked out just fine. Now I had files “p0” through “p7”. Except what happened to “p1”, the world will never know.
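For anyone who has not met a shell bundle before, the idea can be sketched roughly like this. The bundle below (demo.bun, with two tiny files) is made up for illustration and is not the real learn.bun; I am only assuming the real one follows the same here-document pattern.

```shell
# Work in a scratch directory so the bundle cannot overwrite anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical two-file bundle; the real learn.bun is much longer.
cat > demo.bun <<'OUTER'
echo x - p0
cat > p0 <<'EOF_P0'
.TL
A title
EOF_P0
echo x - p2
cat > p2 <<'EOF_P2'
.PP
Some body text.
EOF_P2
OUTER

# "Extracting" is nothing more than running the file through the shell,
# which is why it is worth reading a bundle before running it.
sh demo.bun
ls
```

Since the archive is an ordinary shell script, it runs with whatever permissions you have, so a throwaway directory seems like a sensible precaution.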
I’ve dabbled in man pages before, but that was mostly mandoc, not actual troff.
Accordingly, the first attempt at getting something going was as naive as it could get:
$ groff -Tpdf p* | zathura -
It led to, shall we say, varying results.
Clearly, I was doing something very fundamentally wrong. Conveniently, volume 2A also had a lot of troff documentation. Apparently I was supposed to pass -ms and first run tbl(1) over the troff source before actually giving it to groff. That sounded like a good idea, but the results were still somewhat off:
Allow me to express my doubts that this text was written in 2017. If you compare the output with the known-good PDF, you’ll also notice that, somehow, “Bell Laboratories, Murray Hill, New Jersey 07974” turned into “CAI”. Unfortunate.
Back to Square One and Pick Up the Breadcrumbs
Continuing to read the page I got the learn.bun from, I also spied a section called “Macros and References”. That sounds relevant to my interests. tmac.s, which after studying groff(1) seems to be what would get used with -ms, references some files in /usr/lib/tmac. I was not in the mood to let this flood over into my system, so I had to make minor adjustments and turn it into relative paths. I also renamed it to tmac.os to avoid colliding with the one provided by groff, making the new invocation:
$ tbl p* | groff -M./macros -mos -Tpdf | zathura -
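The path adjustment can be scripted. What follows is a minimal sketch, assuming the macro files sit in ./macros and pull in their helpers through .so requests with absolute /usr/lib/tmac paths. The file tmac.sample and the helper names are stand-ins, not the real V7 macros, and the relative prefix assumes groff is started from the directory that contains macros/.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/macros"

# Stand-in for a macro file that includes helpers by absolute path.
cat > "$workdir/macros/tmac.sample" <<'EOF'
.so /usr/lib/tmac/tmac.helper1
.so /usr/lib/tmac/tmac.helper2
EOF

# Rewrite the absolute prefix into a relative one, then swap the file in.
sed 's|/usr/lib/tmac/|macros/|' "$workdir/macros/tmac.sample" \
    > "$workdir/macros/tmac.sample.new"
mv "$workdir/macros/tmac.sample.new" "$workdir/macros/tmac.sample"

cat "$workdir/macros/tmac.sample"
```

Writing to a temporary file and moving it into place avoids relying on sed's in-place option, which differs between implementations.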
Now we’re getting somewhere:
It’s better than the previous attempts. But there are also some warnings and problems that need cleaning up:
- There’s a note that Bell Laboratories holds the UNIX® trademark, which is no longer true.†
- Now, this most certainly was not written on December 21.
- tmac.os:806: warning: numeric expression expected (got `\')
- Every time the .UX macro was requested, I got:
warning: macro `ev1' not defined (possibly missing space after `ev')
environment stack underflow
Point 1 was easy to address: it’s a simple text change. Point 2 was caused by spurious dots in front of a call to .ND. However, the actual volume 2A PDF said a different date than in the file, so I adjusted that to match (June 18, 1976 to January 30, 1979).
And Down the Slippery Slope
As for points 3 and 4… Let’s just say groff/troff macros are definitely not meant to be written or read by humans and it’s a feat comparable to magic that someone wrote this set of troff macros. Line 806 is .ch FO \\n(YYu. Supposedly, that changes the location of a page trap when the given macro is invoked. The second argument is meant to be a distance, which explains why groff is complaining. I tried to check what groff does and was left none the wiser. FO seems related to the page footer. I seemed to get away with just deleting that line, though.
Finally, point 4. Apparently, .ev1 was used multiple times in the tmac.os. This looked like it should’ve been .ev 1 instead. Changing those, lo and behold, .UX stopped behaving funky for the most part. Yet for some reason, I’d still get multiple footnotes about the trademark ownership of the UNIX® trademark.†
tmac.os sets a troff register (GA) when the .UX macro is first encountered so that the footnote is only made once. The footnote is being made twice. Something does not add up here. .AI (author’s institution) resets GA, but the first .UX comes after .AI, so that’s not the problem. Removing the .AE macros from page 1 caused only one footnote to be made. Thus, I infer it’s actually intended behavior that the footnote is made once for the abstract and once for the main body. Checking with the volume 2A PDF again, I realized that point 4 was, in fact, fixed just by the ev1 changes and I was just chasing a bug that does not exist. I really should’ve checked the PDF twice.
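The change from .ev1 to .ev 1 is mechanical enough for sed. Here is a sketch on a made-up excerpt rather than the real tmac.os; it assumes the faulty requests always sit at the start of a line as a dot immediately followed by ev and a digit.

```shell
workdir=$(mktemp -d)

# Hypothetical excerpt; the real macro bodies differ.
cat > "$workdir/sample.tmac" <<'EOF'
.de XX
.ev1
.ps 8
.ev
..
EOF

# Insert the missing space between the request name and its argument.
# A bare ".ev" (which pops the environment stack) is left untouched.
sed 's/^\.ev\([0-9]\)/.ev \1/' "$workdir/sample.tmac" > "$workdir/sample.fixed"

cat "$workdir/sample.fixed"
```

Running diff on the two files afterwards is a cheap way to confirm that only the intended lines changed.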
The abstract finally looks okay.
Done! Wait, No, Almost
Okay, we’re done, we can go home, right? Almost, one last thing to do: On the last page, there’s something really important missing: the bibliography. Instead, there’s just “$LIST$” there. We can’t just turn Brian W. Kernighan and Michael E. Lesk into plagiarists!
Back to the troff documentation in volume 2A, there’s a match for “$LIST$” on p. 183. Apparently I need a reference file and have to preprocess the source with refer(1). That sounds simple enough. Fortunately, I got the reference file along with the macros above, so I didn’t have to look for that separately.
$ refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf | zathura -
Of course. Why would it work? That’d have been too much to ask for.
At least I get some nice hints:
refer:p2:148: no matches for `skinner teaching 1961'
refer:p3:114: no matches for `kernighan editor tutorial 1974'
The troff documentation conveniently explains the format for the reference file, so I could just add these two entries to Rv7man and be done with it. Thankfully, the pre-compiled PDF of the volume 2A manual had the information necessary to compile the bibliography entries with.
%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
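Before rerunning the whole pipeline, a crude check can tell whether the citation keywords from the warnings now appear in the reference file at all. This is only a rough stand-in for refer's lookup: it greps the whole file rather than matching per record, and Rv7man.sample and has_keywords are hypothetical names used purely for this sketch.

```shell
workdir=$(mktemp -d)
db="$workdir/Rv7man.sample"   # hypothetical stand-in for Rv7man

cat > "$db" <<'EOF'
%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
EOF

# Succeed only if every keyword occurs somewhere in the file,
# ignoring case. This does not check that they share one record.
has_keywords() {
    for word in "$@"; do
        grep -qi -- "$word" "$db" || return 1
    done
    return 0
}

has_keywords skinner teaching 1961 && echo "skinner: keywords present"
has_keywords kernighan editor tutorial 1974 && echo "kernighan: keywords present"
```

A pass here does not guarantee that refer will be satisfied, but a failure is a quick hint that an entry is still missing or misspelled.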
And of course, here is the product of this whole ordeal.
The Heavens were feeling somewhat merciful, but only just enough that I could waste no more than a day on this project. They really wanted me to spend that day on it, though.
On a side note, “the missing learn references” aren’t available from the link that was provided. http://cm.bell-labs.com/cm/cs/who/bwk/learn.tar.gz is now down, though the web archive still has it. Needless to say, I didn’t read that.
I will never, ever touch troff/groff again. mandoc is good at what it does and I’ll stick to mandoc for writing man pages. But if I ever need to get something typeset nicely from plain text?
LaTeX is the answer.
Not troff.
Never troff.
Not even once.
†UNIX® is a registered trademark of The Open Group.
I wonder how this would have gone with Heirloom troff…
So I thought to myself “sure, let’s try it”. It was a bitch to install but then I realized that, actually, I skimmed mk.config too quickly and it made sense, though the defaults are wacky. I guess that’s for the nice vintage feel of old code.
Results: I’m glad I didn’t go that route. refer straight up does nothing, failing with either “abuff not big enough 53” or segfaulting, not passing Go, not collecting $200. Fixing this would probably require some source code patches and more details on where and how it chokes. This happens even with a significantly shortened bibliography.
Leaving refer out of the pipeline, the bundled tmac.s with heirloom troff almost works: It has .UX footnotes and indeed only one of them (in the abstract), but it also has way too much line spacing. However, the abstract overflows with the dead citations for some reason. Using the tmac.s from the website linked in the article yields reasonable results with only path modifications (to fix up the includes with .so).
tmac.s not needing actual changes was very nice, but hunting down the issues with refer might’ve left me with much more work on my hands. I respect the work that went into the heirloom doctools, though I do wonder when the last commit even happened.
Weather update: I went back to this. I checked what caused the issues in heirloom troff. My installation directory was fairly lengthy ($HOME/local/heirloom-doctools), which ended up causing callhunt() to be called with /home/xorhash/local/heirloom-doctools/lib/reftools/papers/Ind and other excessively long paths, which caused the “abuff not big enough” errors. Things work after shortening the installation path:
refer -p Rv7man.fixed -e p* | tbl | troff -ms | dpost > x.ps && ps2pdf x.ps LEARN-Computer-Aided_Instruction_on_UNIX-heirloom-troff.pdf && rm x.ps
This is the Result:
And people thought my Style & Diction adventure was crazy..!
Great job, it’s interesting to see the actual pipeline for creating the typeset documents from the late 1970’s although I’d rather use MS Word 1.1 on OS/2 in CGA than go through that kind of hell.
I thought v7 man pages used -man, not -ms. But maybe I’m remembering it wrong.
v7 man pages indeed did use -man. However, volume 2 is a set of documents in -ms. See https://s3.amazonaws.com/plan9-bell-labs/7thEdMan/bswv7.html and http://tuhs.superglobalmegacorp.com/PDP-11/Trees/V7/usr/doc/run
> LaTeX is the answer.
> Not troff.
> Never troff.
> Not even once.
Update: Once I’ve become used to it, I feel more productive writing troff with -ms than (La)TeX. Do give troff a fair shot.
Yes! Give it a fair shot! I have run it for 40 years, and still love it. I had to write my own set of macros. But… it still blows WYSIWYG out of the water.