(This is a guest post by xorhash.)
Introduction
I’ve been on a trip down memory lane lately, digging around old manuals of the UNIX® operating system from before BSD.† In doing so, I’ve come across the sources for the 7th Edition manuals. I wanted to show one part of volume 2A to other people, but didn’t want to make them download the entire 336 pages of volume 2A for the part in question. The part I wanted to extract was “LEARN — Computer-Aided Instruction on UNIX” (starting at p. 107 in the volume 2A PDF file).
A normal person would, I presume, try to split the PDF file. That is straightforward and produces the expected results. I believe I needn’t state that you wouldn’t be reading this if I solved this problem like any sane person would. Instead, I opted to rebuild the PDF from the troff sources provided at the link above.
I am not a very clever man, and thus I completely disregarded the generation procedure that was already spelled out. However, it wasn’t exactly specific anyway, so I didn’t miss out on much.
Getting the sources
So I knew what I needed to do: Get the troff sources. I asked that the Heavens have mercy on my poor soul if this requires a lot of adjustment for 2017 text processing tools. However, a man must do what a man must do. The file in question was called “vol2/learn.bun”. I had no idea what a bun file is, hoped it wasn’t related to steamed buns and clicked it. As it turns out, it’s just what we would call a self-extracting archive today. The shell commands are not very weird, so the extraction process actually worked out just fine. Now I had files “p0” through “p7”. Except what happened to “p1”, the world will never know.
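For the curious, a bundle of this kind can be imitated in a few lines of shell. This is only a sketch: demo.bun and the two files inside it are made up for illustration, and the real learn.bun may well differ in its details.

```shell
# Sketch only: demo.bun is a hypothetical stand-in for learn.bun.
workdir=$(mktemp -d)
cd "$workdir"

# A bundle is just a shell script that recreates each file from a here-document.
cat > demo.bun <<'BUNDLE'
echo p0
cat > p0 <<'END_OF_p0'
.TL
Demo title
END_OF_p0
echo p2
cat > p2 <<'END_OF_p2'
.PP
Demo paragraph
END_OF_p2
BUNDLE

# "Extracting" means running the script where the files should land.
sh demo.bun
ls
```

Running it in a scratch directory, as above, keeps whatever the script writes away from the rest of the system.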
First Steps
I’ve dabbled in man pages before, but that was mostly mandoc, not actual troff.
Accordingly, the first attempt at getting something going was as naive as it could get:
$ groff -Tpdf p* | zathura -
It led to, shall we say, varying results.
Clearly, I was doing something very fundamentally wrong. Conveniently, volume 2A also had a lot of troff documentation. Apparently I was supposed to pass -ms and first run tbl(1) over the troff source before actually giving it to groff. That sounded like a good idea, but the results were still somewhat off:
Allow me to express my doubts that this text was written in 2017. If you compare the output with the known-good PDF, you’ll also notice that, somehow, “Bell Laboratories, Murray Hill, New Jersey 07974” turned into “CAI”. Unfortunate.
Back to Square One and Pick Up the Breadcrumbs
Continuing to read the page I got the learn.bun from, I also spied a section called “Macros and References”. That sounds relevant to my interests. tmac.s, which after studying groff(1) seems to be what would get used with -ms, references some files in /usr/lib/tmac. I was not in the mood to let this flood over into my system, so I had to make minor adjustments and turn it into relative paths. I also renamed tmac.s to tmac.os to avoid colliding with the one provided by groff, making the new invocation:
$ tbl p* | groff -M./macros -mos -Tpdf | zathura -
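The path adjustment and the rename can be sketched with sed. The two .so lines below are stand-ins I made up for illustration rather than the exact contents of tmac.s, and the sketch assumes the included files sit in ./macros relative to wherever groff is run from.

```shell
# Sketch only: the .so lines are illustrative stand-ins for what tmac.s contains.
workdir=$(mktemp -d)
mkdir "$workdir/macros"
printf '%s\n' \
    '.so /usr/lib/tmac/tmac.scover' \
    '.so /usr/lib/tmac/tmac.srefs' > "$workdir/macros/tmac.s"

# Point the includes at ./macros instead of the system directory,
# and store the result under the new name so that -mos picks it up.
sed 's|/usr/lib/tmac/|./macros/|g' "$workdir/macros/tmac.s" > "$workdir/macros/tmac.os"
rm "$workdir/macros/tmac.s"

cat "$workdir/macros/tmac.os"
```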
Now we’re getting somewhere:
It’s better than the previous attempts. But there are also some warnings and problems that need cleaning up:
- There’s a note that Bell Laboratories holds the UNIX® trademark, which is no longer true.†
- Now, this most certainly was not written in December 21, 19117, either.
- tmac.os:806: warning: numeric expression expected (got `\')
- Every time the .UX macro was requested, I got:
  warning: macro `ev1' not defined (possibly missing space after `ev')
  environment stack underflow
Point 1 was easy to address: it’s a simple text change. Point 2 was caused by spurious dots in front of a call to .ND. However, the actual volume 2A PDF said a different date than in the file, so I adjusted that to match (June 18, 1976 to January 30, 1979).
And Down the Slippery Slope
As for points 3 and 4… Let’s just say groff/troff macros are definitely not meant to be written or read by humans, and it’s a feat comparable to magic that someone wrote this set of troff macros. Line 806 is .ch FO \\n(YYu. Supposedly, that changes the location of a page trap when the given macro is invoked. The second argument is meant to be a distance, which explains why groff is complaining. I tried to check what groff does and was left none the wiser. FO seems related to the page footer; I seemed to get away with just deleting that line, though.
Finally, point 4. Apparently, .ev1 was used multiple times in tmac.os. This looked like it should’ve been .ev 1 instead. Changing those, lo and behold, .UX stopped behaving funky for the most part. Yet for some reason, I’d still get multiple footnotes about the ownership of the UNIX® trademark.† tmac.os sets a troff register (GA) when the .UX macro is first encountered so that the footnote is only made once. The footnote is being made twice. Something does not add up here. .AI (author’s institution) resets GA, but the first .UX comes after .AI, so that’s not the problem. Removing the .AB/.AE macros from page 1 caused only one footnote to be made. Thus, I infer it’s actually intended behavior that the footnote is made once for the abstract and once for the main body. Checking with the volume 2A PDF again, I realized that point 4 was, in fact, fixed just by the ev1 changes and I was just chasing a bug that does not exist. I really should’ve checked the PDF twice.
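The ev1 change itself is mechanical enough to hand to sed. This is a sketch under assumptions: the macro body below is a made-up miniature rather than the real .UX definition, and it presumes every offending request starts a line as .ev followed directly by a digit.

```shell
# Sketch only: a made-up miniature macro, not the real tmac.os.
workdir=$(mktemp -d)
printf '%s\n' '.de UX' '.ev1' 'UNIX' '.ev' '..' > "$workdir/tmac.os"

# Insert the missing space between the ev request and its numeric argument.
sed 's/^\.ev\([0-9]\)/.ev \1/' "$workdir/tmac.os" > "$workdir/tmac.os.fixed"

# Show what changed; diff exits non-zero when the files differ, hence the || true.
diff "$workdir/tmac.os" "$workdir/tmac.os.fixed" || true
```

The bare .ev that pops the environment again is left alone, since it has no digit glued to it.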
The abstract finally looks okay.
Done! Wait, No, Almost
Okay, we’re done, we can go home, right? Almost, one last thing to do: On the last page, there’s something really important missing: the bibliography. Instead, there’s just “$LIST$” there. We can’t just turn Brian W. Kernighan and Michael E. Lesk into plagiarists!
Back to the troff documentation in volume 2A, there’s a match for “$LIST$” on p. 183. Apparently I need a reference file and have to preprocess the sources with refer(1). That sounds simple enough. Fortunately, I got the reference file along with the macros above, so I didn’t have to look for that separately.
$ refer -pRv7man -e p* | tbl | groff -M./macros -mos -Tpdf | zathura -
Of course. Why would it work? That’d have been too much to ask for.
At least I get some nice hints:
refer:p2:148: no matches for `skinner teaching 1961'
refer:p3:114: no matches for `kernighan editor tutorial 1974'
The troff documentation conveniently explains the format for the reference file, so I could just add these two entries to Rv7man and be done with it. Thankfully, the pre-compiled PDF of the volume 2A manual had the information necessary to compile the bibliography entries with.
%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
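To see why those two citations went unmatched before and match now, here is a rough approximation of a keyword lookup. It is not how refer(1) actually searches; lookup is a hypothetical helper that simply prints the first line of every blank-line-separated record containing all the given keywords.

```shell
# Sketch only: lookup is a made-up helper, not refer's real matching algorithm.
workdir=$(mktemp -d)
cat > "$workdir/Rv7man" <<'EOF'
%T Why We Need Teaching Machines
%A B. F. Skinner
%J Harvard Educational Review
%V 31
%P 377-398
%D 1961

%T A Tutorial Introduction to the Unix Editor ed
%A B. W. Kernighan
%D 1974
EOF

lookup() {
    # RS = "" puts awk in paragraph mode: one record per blank-line-separated entry.
    awk -v keys="$*" '
        BEGIN { RS = ""; n = split(tolower(keys), k, " ") }
        {
            rec = tolower($0)
            # Skip this record as soon as one keyword is missing.
            for (i = 1; i <= n; i++)
                if (index(rec, k[i]) == 0) next
            split($0, lines, "\n")
            print lines[1]
        }
    ' "$workdir/Rv7man"
}

lookup skinner teaching 1961
lookup kernighan editor tutorial 1974
```

With the two entries missing from the file, both lookups would come back empty, which is the toy equivalent of the “no matches” hints above.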
And of course, here is the product of this whole ordeal.
Closing Remarks
The Heavens were feeling somewhat merciful, but only just enough that I could waste no more than a day on this project. They really wanted me to spend that day on it, though.
On a side note, “the missing learn references” aren’t available from the link that was provided. http://cm.bell-labs.com/cm/cs/who/bwk/learn.tar.gz is now down, though the web archive still has it. Needless to say, I didn’t read that.
I will never, ever touch troff/groff again. mandoc is good at what it does and I’ll stick to mandoc for writing man pages. But if I ever need to get something typeset nicely from plain text?
LaTeX is the answer.
Not troff.
Never troff.
Not even once.
†UNIX® is a registered trademark of The Open Group.