(This is a guest post by xorhash.)
Did I say I’m done with UNIX Seventh Edition (V7)? How silly of me; of course I’m not. V7 is easy to study, after all.
Something that’s always bothered me about the fread() and fwrite() functions is their weak guarantees about what they actually do. Is a short read or write “normal” in the sense that I should normally expect it? While this says nothing about modern-day operating systems, a look at V7 may enlighten me about the historical precedent.
As an aside: It’s worth noting that the stdio.h functions are some of the few that require a header. It was common historical practice not to declare functions in headers; see crypt(3) for an example.
I will first display the man page, then ask the questions I want to answer, then look at the implementation and finally use that gained knowledge to answer the questions.
2. Into the Man Page
The man page for fread() and fwrite() is rather terse. Modern-day man pages for those functions are equally terse, though, so this is not exactly a novelty of age. Here’s what it says:
NAME
fread, fwrite – buffered binary input/output
SYNOPSIS
fread(ptr, sizeof(*ptr), nitems, stream)
fwrite(ptr, sizeof(*ptr), nitems, stream)
DESCRIPTION
Fread reads, into a block beginning at ptr, nitems of data of the type of *ptr from the named input stream. It returns the number of items actually read.
Fwrite appends at most nitems of data of the type of *ptr beginning at ptr to the named output stream. It returns the number of items actually written.
SEE ALSO
read(2), write(2), fopen(3), getc(3), putc(3), gets(3), puts(3), printf(3), scanf(3)
DIAGNOSTICS
Fread and fwrite return 0 upon end of file or error.
This raises the following interesting edge cases:
- If sizeof(*ptr) is greater than the entire file, what happens?
- If sizeof(*ptr) * nitems overflows, what happens?
- Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
- Is the “number of items actually written” guaranteed to have written every item in its entirety?
- What qualifies as error?
3. A Look at fread()
Note: All file paths for source code are relative to /usr/src/libc/stdio/ unless noted otherwise. You can read along at the TUHS website.
fread() is implemented in rdwr.c and is simple enough; it’s just a nested loop. The outer loop runs nitems times; each iteration sets the number of bytes to read (sizeof(*ptr)) and runs the inner loop. The inner loop calls getc() on the input FILE *stream and stores each byte at successive locations starting at ptr until either getc() returns a value less than 0 (in which case fread() returns the number of complete items read so far) or all bytes of the item have been read.
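To make the loop structure concrete, here is a minimal sketch of it in modern C. It is not the V7 source from rdwr.c: the name sketch_fread() is hypothetical, and size_t and unsigned char are modern conveniences. It only mirrors the behaviour described above, stopping as soon as getc() returns a negative value and counting whole items only.

```c
#include <stdio.h>

/*
 * Hypothetical sketch_fread(): a modern-C rendering of the nested loop
 * described above, not the V7 code. The outer loop counts whole items;
 * the inner loop copies one item byte by byte via getc().
 */
size_t sketch_fread(void *ptr, size_t size, size_t nitems, FILE *stream)
{
    unsigned char *dst = ptr;
    size_t ndone;

    if (size == 0)
        return 0;
    for (ndone = 0; ndone < nitems; ndone++) {
        size_t s = size;
        do {
            int c = getc(stream);
            if (c < 0)              /* EOF or error: report whole items only */
                return ndone;
            *dst++ = (unsigned char)c;
        } while (--s);
    }
    return ndone;
}
```

With a 10-byte file and 4-byte items, this returns 2 while the two bytes of the unfinished third item have already landed in the buffer; with a single 16-byte item, it returns 0 even though the whole file was copied.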
getc(), which takes a FILE *p, is defined in stdio.h as a C preprocessor macro. If there is still data in the buffer, it returns the next character and advances the buffer pointer by one. Interestingly, *(p)->_ptr++&0377 is used to return the character, despite _ptr being a char *. I’m not sure why that &0377 (&0xFF) is there. If there is no data in the buffer, it instead returns the result of _filbuf().
_filbuf() is a lot more complex than anything so far. It begins with a check for the _IORW flag and, if it is set, sets the _IOREAD flag as well. It then returns EOF (defined as -1 in stdio.h) if _IOREAD is not set or if _IOSTRG is set. These checks seem rather inconsequential to me. I can’t make heads or tails of _IOSTRG, but it seems irrelevant: it is only ever set internally by sprintf and sscanf for their temporary FILE objects. After those two flag checks, _filbuf() allocates a buffer into iop->_base, which seems to be the base pointer of the buffer. If the _IONBF flag is set, which happens when setbuf() is used to switch to unbuffered I/O, a temporary, static buffer is used instead. Then read() is called, requesting either 1 byte if unbuffered I/O is requested or BUFSIZ bytes otherwise. If read() returned 0, the FILE is flagged as end-of-file and _filbuf() returns EOF. If read() returned a value below 0, the FILE is flagged as error and _filbuf() returns EOF. Otherwise, _filbuf() returns the first character that was read and increments the buffer pointer by one.
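The split between the getc() fast path and the _filbuf() slow path can be sketched in modern C as follows. Everything here is a hypothetical simplification rather than the V7 code: struct sketch_file, sketch_init(), sketch_getc(), sketch_filbuf() and the plain eof/err fields are invented names, it assumes a POSIX read(), and the _IORW/_IOREAD/_IOSTRG checks, buffer allocation and unbuffered mode are left out. It does keep the & 0377 mask; one plausible reason for that mask (also raised in the comments) is that where char is signed, a byte with bit 7 set would otherwise come back as a negative int, which a caller testing for EOF or for c < 0 would misread.

```c
#define _POSIX_C_SOURCE 200809L   /* for read(), ssize_t and fileno() */
#include <stdio.h>                /* EOF */
#include <unistd.h>               /* read() */

#define SKETCH_BUFSIZ 512         /* V7's BUFSIZ value */

/* Hypothetical, stripped-down stream: just enough state for the fast path. */
struct sketch_file {
    char *ptr;                    /* next unread byte in buf */
    int cnt;                      /* bytes left in buf */
    int fd;                       /* underlying file descriptor */
    int eof, err;                 /* stand-ins for the EOF and error flags */
    char buf[SKETCH_BUFSIZ];
};

void sketch_init(struct sketch_file *p, int fd)
{
    p->ptr = p->buf;
    p->cnt = 0;
    p->fd = fd;
    p->eof = p->err = 0;
}

/* Slow path, loosely modelled on _filbuf(): refill the buffer with one read(). */
int sketch_filbuf(struct sketch_file *p)
{
    ssize_t n = read(p->fd, p->buf, sizeof p->buf);

    if (n == 0) {                 /* end of file */
        p->cnt = 0;
        p->eof = 1;
        return EOF;
    }
    if (n < 0) {                  /* read error */
        p->cnt = 0;
        p->err = 1;
        return EOF;
    }
    p->ptr = p->buf;
    p->cnt = (int)n - 1;          /* the first byte is handed out right away */
    return *p->ptr++ & 0377;      /* mask to 8 bits, as the V7 macro does */
}

/* Fast path, shaped like the getc() macro: next byte from the buffer, else refill. */
int sketch_getc(struct sketch_file *p)
{
    return --p->cnt >= 0 ? *p->ptr++ & 0377 : sketch_filbuf(p);
}
```

Reading a file containing 'a' followed by the byte 0xFF yields 'a', then 255 (not EOF), then EOF with the eof flag set.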
According to its man page, read() returns 0 only on end-of-file. It returns -1 on “many conditions”, namely “physical I/O errors, bad buffer address, preposterous nbytes, file descriptor not that of an input file”.
As an aside, BUFSIZ still exists today. ISO C11 § 7.21.2 no. 9 dictates that BUFSIZ must be at least 256. V7 defines it as 512 in stdio.h. It is worth noting that on V7, a filesystem block was 512 bytes long, so this value was presumably chosen for efficient I/O buffering.
4. A Look at fwrite()
rdwr.c also implements fwrite(). fwrite() is effectively the same as fread(), except that the inner loop uses putc(). After every inner loop, a call to ferror() is made. If there was indeed an error, the outer loop is stopped.
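A corresponding minimal sketch for fwrite() shows where the ferror() check sits. Again, sketch_fwrite() is a hypothetical name and this is not the V7 source. The check runs only after a whole item has been handed to putc(), so if an error strikes part-way through an item, that item may be partially written but is not counted.

```c
#include <stdio.h>

/*
 * Hypothetical sketch_fwrite(): a modern-C rendering of the loop described
 * above, not the V7 code. Each item is written byte by byte with putc();
 * ferror() is consulted only after a whole item has been handed over.
 */
size_t sketch_fwrite(const void *ptr, size_t size, size_t nitems, FILE *stream)
{
    const unsigned char *src = ptr;
    size_t ndone;

    if (size == 0)
        return 0;
    for (ndone = 0; ndone < nitems; ndone++) {
        size_t s = size;
        do {
            putc(*src++, stream);
        } while (--s);
        if (ferror(stream))         /* stop; the failing item is not counted */
            break;
    }
    return ndone;
}
```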
putc(), which takes the character and a FILE *p, is likewise defined in stdio.h as a C preprocessor macro. If there is still room in the buffer, the character is written into the buffer. Otherwise, _flsbuf() is called.
_flsbuf(), which takes the character and a FILE *iop, is also more complex than the functions so far, but becomes easier to follow after reading _filbuf(). It starts by checking whether _IORW is set and, if so, sets _IOWRT and clears the EOF flag. It then splits into two major branches: the _IONBF branch without buffering, which is a straight call to write(), and the buffered branch, which allocates a buffer if none exists yet and otherwise calls write() if the buffer is full. If write() returned less than expected, the error flag is set and EOF is returned. Otherwise, _flsbuf() returns the character that was written.
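The buffered flush path and its short-write check can be sketched with an injectable writer in place of write(2), which makes a short write easy to simulate. All names here (writer_fn, struct sketch_out, sketch_putc(), sketch_flsbuf()) are hypothetical, the 8-byte buffer is deliberately tiny, and the _IORW handling and unbuffered branch are omitted.

```c
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* EOF */

/* Hypothetical stand-in for write(2): returns bytes written, or -1. */
typedef long (*writer_fn)(const char *buf, size_t n);

/* Hypothetical, stripped-down output stream. */
struct sketch_out {
    char buf[8];          /* deliberately tiny so it fills quickly */
    size_t used;          /* bytes currently buffered */
    int err;              /* stand-in for the error flag */
    writer_fn write_out;  /* where a full buffer goes */
};

/*
 * Loosely modelled on the buffered branch of _flsbuf(): called when the
 * buffer is full. Flush it with one write; if fewer bytes than expected were
 * written, set the error flag and return EOF. Otherwise start a fresh buffer
 * with c and return it.
 */
int sketch_flsbuf(int c, struct sketch_out *o)
{
    long n = o->write_out(o->buf, o->used);

    if (n < 0 || (size_t)n != o->used) {
        o->err = 1;
        return EOF;
    }
    o->used = 0;
    o->buf[o->used++] = (char)c;
    return c & 0377;
}

/* Fast path, in the spirit of the putc() macro: buffer c if there is room. */
int sketch_putc(int c, struct sketch_out *o)
{
    if (o->used < sizeof o->buf) {
        o->buf[o->used++] = (char)c;
        return c & 0377;
    }
    return sketch_flsbuf(c, o);
}
```

With a writer that accepts every byte, the ninth sketch_putc() flushes and returns the character; with one that reports a single byte fewer, the same call sets err and returns EOF.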
According to its man page, write() returns the number of characters (bytes) actually written; a value that differs from the number requested “should be regarded as an error”. From a cursory glance over the code, this appears to happen for the same reasons as with read(): physical I/O errors or bad parameters.
5. Answering the Questions
If sizeof(*ptr) is greater than the entire file, what happens?
On this under-read, fread() will end up reading the entire file into the memory at ptr and still return 0. The I/O happens byte-wise via getc(), filling up the memory at ptr until getc() returns EOF, and getc() only returns EOF once a read() returns 0 on end-of-file or -1 on error. The bytes already copied to ptr may still be meaningful to the caller, even though the return value says nothing was read.
If sizeof(*ptr) * nitems overflows, what happens?
No overflow can happen because there is no multiplication. Instead, two nested loops are used, which avoids the overflow issue entirely. (Depending on filesystem limits, it may be impossible in practice to read enough bytes for sizeof(*ptr) * nitems to overflow anyway. And of course, there’s no way you could have enough RAM on a PDP-11 for the result to actually fit into memory.)
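For contrast, an implementation that computes the total byte count up front has to guard the multiplication. A minimal check, using the hypothetical name mul_overflows(), could look like this:

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/*
 * Hypothetical helper: returns 1 if size * nitems would overflow size_t,
 * 0 otherwise. A zero size can never overflow.
 */
int mul_overflows(size_t size, size_t nitems)
{
    return size != 0 && nitems > SIZE_MAX / size;
}
```

Dividing SIZE_MAX by size instead of multiplying first keeps the check itself from overflowing.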
Is the “number of items actually read/written” guaranteed to be the number of items that can be read/written (until either EOF or I/O error)?
Both fread() and fwrite() short-circuit on error, so the number of items actually read or written successfully is what gets returned. The only relevant error condition is a filesystem I/O error. Due to the byte-wise I/O, however, the last element may have been partially read or written. It would therefore be more accurate to say that the “number of items actually read/written” is guaranteed to be the number of complete items that could be read/written. A short read or short write is an abnormal condition.
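On a modern C library, a caller can tell why a count came up short by checking ferror() (the same check fwrite() uses above) and otherwise assuming end-of-file. The wrapper below, read_items(), and its result codes are hypothetical, not part of any library. Note that ISO C states that the value of a partially read element is indeterminate, so the V7 behaviour of leaving those bytes in place should not be relied upon today.

```c
#include <stdio.h>

/* Hypothetical result codes for a read that may come up short. */
enum read_status { READ_FULL, READ_SHORT_EOF, READ_SHORT_ERROR };

/*
 * Hypothetical read_items(): call fread() and report why it stopped.
 * *got receives the number of whole items read. Assumes size > 0.
 */
enum read_status read_items(void *ptr, size_t size, size_t nitems,
                            FILE *stream, size_t *got)
{
    *got = fread(ptr, size, nitems, stream);
    if (*got == nitems)
        return READ_FULL;
    if (ferror(stream))
        return READ_SHORT_ERROR;   /* an I/O error cut the read short */
    return READ_SHORT_EOF;         /* otherwise end-of-file was reached */
}
```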
Is the “number of items actually written” guaranteed to have written every item in its entirety?
No, it isn’t. A partial write is possible. If a series of structs is written and later read back, however, this is not a problem: fread() and fwrite() only return the count of complete items read or written, so the partial write will not cause a partial-read issue. If a set of bytes is written, this is an issue: the file will contain incomplete data, which the program may later try to parse. It is therefore preferable to write (and especially read) arrays of structs rather than arrays of bytes. (From a modern-day perspective, this is horrendous design because it means data files are not portable across platforms.)
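One common way around that portability problem, which V7 did not provide, is to serialize each field byte by byte in a fixed order instead of fwrite()-ing a struct’s in-memory representation. A minimal sketch for a 32-bit field, with the hypothetical helper names put_u32le() and get_u32le():

```c
#include <stdint.h>   /* uint32_t */

/*
 * Hypothetical helpers: store a 32-bit value as four bytes in little-endian
 * order, independent of the host's byte order and struct padding.
 */
void put_u32le(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Reassemble the value regardless of which host wrote the bytes. */
uint32_t get_u32le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

The trade-off is that a file then consists of byte-sized items, so the partial-write concern above applies and the reader has to validate lengths itself.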
What qualifies as error?
Effectively, only a physical I/O error or a kernel bug. Short fread() and fwrite() return values are abnormal conditions. I’m not sure whether a process could receive a signal that interrupts the current write() before it writes anything, making it fail with EINTR; this seems to be more of a modern-day problem than something V7 concerned itself with.
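On a modern POSIX system, code that wants every byte written usually wraps write() in a loop that resumes after short writes and retries calls that failed with EINTR. A minimal sketch, with the hypothetical name write_all():

```c
#define _POSIX_C_SOURCE 200809L   /* for write(), ssize_t */
#include <errno.h>                /* errno, EINTR */
#include <stddef.h>               /* size_t */
#include <unistd.h>               /* write() */

/*
 * Hypothetical write_all(): keep calling write() until all n bytes are out.
 * A short write continues with the remainder; a call interrupted by a signal
 * before writing anything (-1 with errno == EINTR) is simply retried.
 * Returns 0 on success, -1 on any other error.
 */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);

        if (w < 0) {
            if (errno == EINTR)
                continue;         /* interrupted before anything was written: retry */
            return -1;
        }
        if (w == 0)
            return -1;            /* no progress; give up rather than spin */
        p += w;
        n -= (size_t)w;
    }
    return 0;
}
```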
“I’m not sure why that &0377 (&0xFF) is there.”
Presumably in case char is signed — without masking to 8 bits, signed chars with bit 7 set would be extended to negative ints, which would be detected as EOF.
neozeed, thanks so much for the post. Really, thank you! Keep writing.
Thanks, but this one post was actually from xorhash.
I’ll make the next one “(This is a guest post by xorhash.)” be marquee, blinking, font-size 300%, very bold.
That sound good?