(This is a guest post by xorhash.)
1. Introduction
Did I say I’m done with UNIX Seventh Edition (V7)? How silly of me; of course I’m not. V7 is easy to study, after all.
Something that’s always bothered me about the stdio.h primitives fread() and fwrite() is the weak guarantees they give about what they actually do. Is a short read or write “normal” in the sense that I should normally expect it? While this says nothing about modern-day operating systems, a look at V7 may enlighten me about what the historical precedent is.
As an aside: It’s worth noting that the stdio.h functions are some of the few that require a header. It was common historical practice not to declare functions in headers; see crypt(3) for an example.
I will first display the man page, then ask the questions I want to answer, then look at the implementation and finally use that gained knowledge to answer the questions.
2. Into the Man Page
The man page for fread() and fwrite() is rather terse. Modern-day man pages for those functions are equally terse, though, so this is not exactly a novelty of age. Here’s what it reads:
NAME
fread, fwrite – buffered binary input/output
SYNOPSIS
#include <stdio.h>
fread(ptr, sizeof(*ptr), nitems, stream)
FILE *stream;
fwrite(ptr, sizeof(*ptr), nitems, stream)
FILE *stream;
DESCRIPTION
Fread reads, into a block beginning at ptr, nitems of data of the type of *ptr from the named input stream. It returns the number of items actually read.
Fwrite appends at most nitems of data of the type of *ptr beginning at ptr to the named output stream. It returns the number of items actually written.
SEE ALSO
read(2), write(2), fopen(3), getc(3), putc(3), gets(3), puts(3), printf(3), scanf(3)
DIAGNOSTICS
Fread and fwrite return 0 upon end of file or error.
So there are the following edge cases that are interesting:
- In fread(): If sizeof(*ptr) is greater than the entire file, what happens?
- If sizeof(*ptr) * nitems overflows, what happens?
- Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
- Is the “number of items actually written” guaranteed to have written every item in its entirety?
- What qualifies as error?
3. A Look at fread()
Note: All file paths for source code are relative to /usr/src/libc/stdio/ unless noted otherwise. You can read along at the TUHS website.
rdwr.c implements fread(). fread() is simple enough; it’s just a nested loop. The outer loop runs nitems times. On each pass, it sets the number of bytes to read (sizeof(*ptr)) and runs the inner loop. The inner loop calls getc() on the input FILE *stream and writes each byte to *ptr until either getc() returns a value less than 0 or all bytes have been read.
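The nested loop described above can be sketched in modern C roughly as follows. This is a paraphrase of the structure, not the V7 source text: the names sketch_fread and demo_short_read are made up for illustration, and the guard against a zero item size is my own addition so the sketch is safe to run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name; paraphrases the loop structure described in the text. */
unsigned sketch_fread(char *ptr, unsigned size, unsigned nitems, FILE *stream)
{
    unsigned ndone = 0;

    if (size == 0)                      /* guard added for this sketch */
        return 0;
    for (; ndone < nitems; ndone++) {   /* outer loop: one pass per item */
        unsigned left = size;
        do {                            /* inner loop: one getc() per byte */
            int c = getc(stream);
            if (c < 0)
                return ndone;           /* the partial item is not counted */
            *ptr++ = (char)c;
        } while (--left);
    }
    return ndone;
}

/* Demo helper: reads a 10-byte temporary file in items of item_size bytes. */
unsigned demo_short_read(unsigned item_size, char *buf, unsigned buf_len)
{
    FILE *f = tmpfile();
    unsigned n;

    assert(f != NULL && item_size != 0);
    fputs("0123456789", f);
    rewind(f);
    n = sketch_fread(buf, item_size, buf_len / item_size, f);
    fclose(f);
    return n;
}
```

With a 16-byte buffer, asking for 4-byte items yields 2 complete items even though all 10 bytes land in the buffer; asking for a single 16-byte item yields 0.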
/usr/include/stdio.h implements getc(FILE *p) as a C preprocessor macro. If there is still data in the buffer, it returns the next character and advances the buffer by one. Interestingly, *(p)->_ptr++&0377 is used to return the character, despite _ptr being a char *. I’m not sure why that &0377 (&0xFF) is there. If there is no data in the buffer, it instead returns _filbuf(p).
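One possible reading of the mask, offered as an assumption rather than anything the V7 source states: if plain char is signed on the target (as I understand it to be on the PDP-11), a byte with all bits set would be sign-extended to -1 when widened to int, which is indistinguishable from EOF. The sketch below uses signed char explicitly so it behaves the same everywhere; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* signed char stands in for a plain char that sign-extends (an assumption). */
int fetch_plain(const signed char *p)
{
    return *p;            /* widened to int with sign extension */
}

int fetch_masked(const signed char *p)
{
    return *p & 0377;     /* keeps only the low 8 bits: 0..255 */
}

/* Self-checking demo for a byte whose bits are all set (0377 octal). */
void demo_mask(void)
{
    signed char byte = -1;

    assert(fetch_plain(&byte) == -1);    /* collides with V7's EOF of -1 */
    assert(fetch_masked(&byte) == 255);  /* distinguishable from EOF */
}
```

Under that assumption, the mask is what lets a caller tell a data byte of 0377 apart from end-of-file.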
filbuf.c implements _filbuf(). This function is a lot more complex than the other ones until now. It begins with a check for the _IORW flag and, if set, sets the _IOREAD flag as well. It then checks if _IOREAD is not set or if _IOSTRG is set and returns EOF (defined as -1 in stdio.h) if so. These all seem rather inconsequential to me. I can’t make heads or tails of _IOSTRG, but it seems irrelevant here; _IOSTRG is only ever set internally in sprintf and sscanf for temporary internal FILE objects. After those two flag checks, _filbuf() allocates a buffer into iop->_base, which seems to be the base pointer of the buffer. If flag _IONBF is set, which happens when setbuf() is used to switch to unbuffered I/O, a temporary, static buffer is used instead. Then read() is called, requesting either 1 byte if unbuffered I/O is requested or BUFSIZ bytes otherwise. If read() returned 0, the FILE is flagged as end-of-file and EOF is returned by _filbuf(). If read() returned <0, the FILE is flagged as error and EOF is returned by _filbuf(). Otherwise, the first character that has been read is returned by _filbuf() and the buffer pointer is incremented by one.
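The refill logic can be sketched on a POSIX system along these lines. Everything prefixed sk_ or SK_ is hypothetical (the struct, the flag values, the function names); the sketch uses a fixed in-struct buffer and leaves out the _IORW, _IOREAD and _IOSTRG checks as well as the buffer allocation, so it only models the read() call and the three outcomes described above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define SK_BUFSIZ 512            /* V7's BUFSIZ value, per the text */
#define SK_EOF    01             /* hypothetical flag values */
#define SK_ERR    02
#define SK_NBF    04

struct sk_file {                 /* hypothetical stand-in for FILE */
    int fd;
    int flags;
    int cnt;                     /* unread bytes left in buf */
    unsigned char *ptr;          /* next unread byte */
    unsigned char buf[SK_BUFSIZ];
};

/* Refill the buffer; return the first new byte, or EOF on EOF/error. */
int sk_filbuf(struct sk_file *f)
{
    size_t want = (f->flags & SK_NBF) ? 1 : SK_BUFSIZ;
    ssize_t got = read(f->fd, f->buf, want);

    if (got <= 0) {
        f->flags |= (got == 0) ? SK_EOF : SK_ERR;
        f->cnt = 0;
        return EOF;
    }
    f->ptr = f->buf;
    f->cnt = (int)got - 1;       /* one byte is handed out right away */
    return *f->ptr++;
}

/* Function version of the getc() idea: buffer first, refill when empty. */
int sk_getc(struct sk_file *f)
{
    if (f->cnt > 0) {
        f->cnt--;
        return *f->ptr++;
    }
    return sk_filbuf(f);
}

/* Demo helper: counts bytes delivered before EOF from a pipe holding "hi". */
int sk_demo_pipe(void)
{
    int fds[2], n = 0, rc;
    ssize_t w;
    struct sk_file f = { 0 };

    rc = pipe(fds);
    assert(rc == 0);
    w = write(fds[1], "hi", 2);
    assert(w == 2);
    close(fds[1]);               /* reader sees EOF after the two bytes */
    f.fd = fds[0];
    while (sk_getc(&f) != EOF)
        n++;
    close(fds[0]);
    return (f.flags & SK_EOF) ? n : -1;
}
```

Because the buffer holds unsigned char here, no mask is needed when returning a byte; that is a simplification of this sketch, not a claim about the V7 macro.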
According to its man page, read() only returns 0 on end-of-file. It can also return -1 on “many conditions”, namely “physical I/O errors, bad buffer address, preposterous nbytes, file descriptor not that of an input file”.
As an aside, BUFSIZ still exists today. ISO C11 § 7.21.2 no. 9 dictates that BUFSIZ must be at least 256. V7 defines it as 512 in stdio.h. One is inclined to note that on V7, a filesystem block was 512 bytes long, so this was presumably chosen for efficient I/O buffering.
4. A Look at fwrite()
rdwr.c also implements fwrite(). fwrite() is effectively the same as fread(), except the inner loop uses putc(). After every inner loop, a call to ferror() is made. If there was indeed an error, the outer loop is stopped.
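The write side of that loop might look like the following in modern C. As with the read side, this is a paraphrase of the described structure rather than the V7 source; sketch_fwrite and demo_roundtrip are made-up names, and the zero-size guard is my own addition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name; paraphrases the loop structure described in the text. */
unsigned sketch_fwrite(const char *ptr, unsigned size, unsigned nitems,
                       FILE *stream)
{
    unsigned ndone = 0;

    if (size == 0)                      /* guard added for this sketch */
        return 0;
    for (; ndone < nitems; ndone++) {   /* outer loop: one pass per item */
        unsigned left = size;
        do {                            /* inner loop: one putc() per byte */
            putc(*ptr++, stream);
        } while (--left);
        if (ferror(stream))             /* checked once per item */
            break;                      /* the failed item is not counted */
    }
    return ndone;
}

/* Demo helper: writes three 4-byte items and reads the bytes back. */
int demo_roundtrip(void)
{
    char back[13] = { 0 };
    FILE *f = tmpfile();
    unsigned n;
    size_t got;

    assert(f != NULL);
    n = sketch_fwrite("aaaabbbbcccc", 4, 3, f);
    rewind(f);
    got = fread(back, 1, 12, f);
    fclose(f);
    return n == 3 && got == 12 && strcmp(back, "aaaabbbbcccc") == 0;
}
```

Note that the error check sits after the inner loop, so a failure in the middle of an item still lets the remaining putc() calls for that item run before the outer loop stops.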
/usr/include/stdio.h implements putc(int x, FILE *p) as a C preprocessor macro. If there is still room in the buffer, the write happens into the buffer. Otherwise, _flsbuf() is called.
flsbuf.c implements _flsbuf(int c, FILE *iop). This function, too, is more complex than the ones until now, but becomes more obvious after reading _filbuf(). It starts with a check if _IORW is set and if so, it’ll set _IOWRT and clear the EOF flag. Then it splits into two major branches: the _IONBF branch without buffering, which is a straight call to write(), and the buffered branch, which allocates a buffer if none exists already or otherwise calls write() if the buffer is full. If write() returned less than expected, the error flag is set and EOF is returned. Otherwise, it returns the character that was written.
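The buffered branch can be sketched on a POSIX system as below. All sk_ and SK_ names are hypothetical; the sketch skips the _IORW handling, the unbuffered branch and the buffer allocation, and it simply discards the pending bytes on failure, which is a choice made for this illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define SK_BUFSIZ 512            /* V7's BUFSIZ value, per the text */
#define SK_ERR    02             /* hypothetical flag value */

struct sk_out {                  /* hypothetical stand-in for FILE */
    int fd;
    int flags;
    int used;                    /* bytes waiting in buf */
    unsigned char buf[SK_BUFSIZ];
};

/* Called when buf is full: write it out, then store c in the empty buffer. */
int sk_flsbuf(int c, struct sk_out *f)
{
    ssize_t wrote = write(f->fd, f->buf, (size_t)f->used);

    if (wrote != (ssize_t)f->used) {   /* short or failed write */
        f->flags |= SK_ERR;
        f->used = 0;
        return EOF;
    }
    f->used = 0;
    f->buf[f->used++] = (unsigned char)c;
    return c & 0377;
}

/* Function version of the putc() idea: buffer first, flush when full. */
int sk_putc(int c, struct sk_out *f)
{
    if (f->used < SK_BUFSIZ) {
        f->buf[f->used++] = (unsigned char)c;
        return c & 0377;
    }
    return sk_flsbuf(c, f);
}

/* Demo helper: an invalid descriptor only fails once the buffer is full. */
int sk_demo_bad_fd(void)
{
    struct sk_out f = { -1, 0, 0, { 0 } };
    int i;

    for (i = 0; i < SK_BUFSIZ; i++) {
        int r = sk_putc('x', &f);      /* absorbed by the buffer */
        assert(r == 'x');
    }
    return sk_putc('y', &f) == EOF && (f.flags & SK_ERR) != 0;
}
```

The demo shows why an error can surface long after the byte that caused it was handed over: the first 512 calls succeed without touching the descriptor at all.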
According to its man page, write() returns the number of characters (bytes) actually written; a count that differs from the number requested “should be regarded as an error”. With only a cursory glance over the code, this appears to happen for reasons similar to those for read(): either a physical I/O error or bad parameters.
5. Conclusions
In fread(): If sizeof(*ptr) is greater than the entire file, what happens?
On this under-read, fread() will end up reading the entire file into the memory at ptr and still return 0. The I/O happens byte-wise via getc(), filling up the memory at ptr until getc() returns EOF. However, getc() will not return EOF until a read() returns 0 on EOF or -1 on error. The partial data may still be meaningful to the caller.
If sizeof(*ptr) * nitems overflows, what happens?
No overflow can happen because there is no multiplication. Instead, two loops are used, which avoids the overflow issue entirely. (With strict filesystem size limits, it may be de facto impossible to read enough bytes for sizeof(*ptr) * nitems to overflow anyway. And of course, there’s no way you could have enough RAM on a PDP-11 for the result to actually fit into memory.)
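To make the difference concrete, here is a small sketch that contrasts a single 16-bit multiplication with the nested-loop approach. It assumes a 16-bit unsigned type, which I take to match the PDP-11's int width; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What one 16-bit multiplication would yield: the product modulo 65536. */
uint16_t product16(uint16_t size, uint16_t nitems)
{
    return (uint16_t)((uint32_t)size * nitems);
}

/* Bytes a nested loop visits. The total is kept in a wider type only so
 * the demo can report it; the loop counters themselves stay 16-bit. */
uint32_t bytes_visited(uint16_t size, uint16_t nitems)
{
    uint32_t total = 0;
    uint16_t item, left;

    for (item = 0; item < nitems; item++)
        for (left = size; left > 0; left--)
            total++;
    return total;
}

/* Self-checking demo: 256 items of 256 bytes each. */
void demo_overflow(void)
{
    assert(product16(256, 256) == 0);          /* wraps around to zero */
    assert(bytes_visited(256, 256) == 65536);  /* loops are unaffected */
}
```

A byte count computed up front would wrap to 0 for 256 items of 256 bytes, while the two loops never need to represent the product at all.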
Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
Partially: Both fread() and fwrite() short-circuit on error. This causes the number of items that have actually been read or written successfully to be returned. The only relevant error condition is filesystem I/O error. Due to the byte-wise I/O, it’s possible that there was a partial read or write for the last element, however. Therefore, it would be more accurate to say that the “number of items actually read/written” is guaranteed to be the number of non-partial items that can be read/written. A short read or short write is an abnormal condition.
Is the “number of items actually written” guaranteed to have written every item in its entirety?
No, it isn’t. A partial write is possible. If a series of structs is written and then read back again, however, this is not a problem: fread() and fwrite() only return the count of full items read or written. Therefore, the partial write will not cause a partial read issue. If a set of bytes is written, this is an issue: There will be incomplete data, possibly to be parsed by the program. It is therefore preferable to write (and especially read) arrays of structs rather than arrays of bytes. (From a modern-day perspective, this is horrendous design because this means data files are not portable across platforms.)
What qualifies as error?
Effectively, only a physical I/O error or a kernel bug. Short fread() or fwrite() return values are abnormal conditions. I’m not sure if there is the possibility that the process got a signal and the current read() or write() ends up transferring nothing before the EINTR; this seems to be more of a modern-day problem than something V7 concerned itself with.