The Evolution of C: Heresy and Prophecy

by Bill Tuthill

from the January 1985 issue of Unix Review magazine

C is descended from B, which was descended from BCPL. BCPL (Basic Combined Programming Language) was developed in 1967 by Martin Richards. B was an interpretive language written in 1970 by Ken Thompson (1) after he abandoned a Fortran implementation for the PDP-7.

BCPL and B were typeless languages, which may account for the type permissiveness of C. They restricted their scope to machine words and were rather low level. However, they provided structured programming constructs similar to those in Algol. BCPL, B, and C all provide pointers and address arithmetic. All three pass function parameters by value, rather than by address, but permit passing by address if desired.

The first real C compiler was completed in 1972, at which time the only supported types were char, int, float and double. After the addition of structures in 1973, the UNIX kernel was successfully recoded into C, which helped rationalize and organize the operating system. (2) The C language continued to evolve, largely because of porting efforts, and was finally codified in 1978. (3) By then, the type adjectives unsigned, short, and long had been added, in addition to the union aggregate type and typedef. More importantly, an efficient and portable Standard I/O Library had been developed.

C is gradually losing its type flexibility and becoming more and more like Pascal, the language of wimps and quiche-eaters. In 1980, two new types were added, void and enum. Both were derived from Algol 68 and Pascal. The first is used primarily to make C code harder to read, but easier for lint to digest. The second might actually be useful. C routines that return a value are comparable to Pascal functions, whereas functions declared as void are comparable to Pascal procedures. At this point, it may be a good idea to avoid void and enum because most C compilers on cheap micros don’t yet support them.

This is the current state of most C compilers. The C compilers on System V and the C compiler on 4.2 BSD are almost identical. The Berkeley compiler was the first to offer infinite-length variable names in order to support a companion Pascal compiler. The System V.2 compiler now offers the same feature. Despite their advantages, infinite-length variable names can be expensive to implement and can cause portability problems when software is moved to systems with older compilers. Dennis Ritchie is to be commended for shepherding C programmers onto a narrow path. Despite the proliferation of UNIX versions, there is but one C.

ANSI STANDARD C

A sign that C has become a major commercial language can be seen in the current effort to standardize it. The American National Standards Institute (ANSI) has formed committee X3J11 to handle the task. Their draft standard has not yet been approved. Nonetheless, I believe this C standard represents the most significant step forward for the language since Kernighan and Ritchie’s white book appeared.

The standards committee is divided into three subcommittees: environment, libraries, and language. (4) Programming environments are a hazy area. Most likely, the main(argc, argv) argument passing convention will be required on all operating systems. However, the function of the UNIX environment cannot be duplicated on many other operating systems, so the third parameter envp (environment pointer) will probably be dropped. European character sets are still under discussion.

The library routines in section 3 of the UNIX Programmer’s Manual (except for Bessel functions) will be part of the standard. However, system calls in section 2 will be dropped because they cannot always be duplicated on non-UNIX systems. The exception to this is signal(2), which is necessary to make C programs re-entrant.

The language subcommittee started with the System V.2 C Reference Manual. It should be noted, though, that there have been three major changes since the C Reference Manual was published with Kernighan and Ritchie’s The C Programming Language.

First, identifiers are significant to 31 characters, rather than 8 (as on Version 7), or infinite length (as on Berkeley UNIX and System V.2). Originally the committee was going to limit external names to 6 characters without case distinction, but public outcry was so strong that this will be left as an implementation detail.

The second change is that structure and union assignment is possible, and that structures and unions may be passed as parameters. Member names are local to structures and unions, instead of being global.
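As a sketch of the new rules (the struct and function names here are invented for illustration), structure assignment and by-value parameter passing look like this:

```c
#include <assert.h>

struct point { int x, y; };

/* Structures may now be passed to and returned from functions
 * by value. */
struct point add_points(struct point a, struct point b)
{
    struct point sum;
    sum.x = a.x + b.x;
    sum.y = a.y + b.y;
    return sum;
}

/* Whole-structure assignment copies every member at once. */
struct point copy_point(struct point p)
{
    struct point q;
    q = p;
    return q;
}
```

Because the structure is passed by value, the callee works on a copy; to modify the caller's structure, a pointer must still be passed.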

Finally, the void and enum types have been added. A function returning no value actually returns the type void, and programmers can throw away unwanted values by casting to void. For example, you can throw away the value returned by fclose() with the expression:

(void)fclose(fp);

The enum type allows you to replace ugly preprocessor code like this:

#define DEV202      1
#define DEVAPS5     2
#define DEV8600     3
#define DEVIMAGEN   4
#define DEVQMS      5
int dev;

with the more streamlined:

enum devtype { DEV202, DEVAPS5, DEV8600,
    DEVIMAGEN, DEVQMS } dev;

Variables of type enum are treated as second-class integers — in Steve Johnson’s Portable C Compiler, for example, arithmetic comparison (except equality) is illegal. The standard may change this, however, particularly because enum comparisons are allowed in Ritchie’s C compiler.
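To sketch the distinction, equality tests on enum variables are accepted by both compilers, while ordering comparisons are the contested case (is_qms() is an invented helper):

```c
#include <assert.h>

enum devtype { DEV202, DEVAPS5, DEV8600, DEVIMAGEN, DEVQMS };

/* Equality comparison of enums is legal everywhere; an ordering
 * comparison such as dev < DEVQMS may be rejected by the
 * Portable C Compiler. */
int is_qms(enum devtype dev)
{
    return dev == DEVQMS;
}
```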

The committee has introduced many changes above and beyond the System V.2 standard. Arguments to functions may be declared, for instance, if the programmer wants the compiler to check them. For example, you could say:

extern int fread(char *, int, int, FILE *);

In the event of a type mismatch, the same conversions as for the assignment operator apply. This means that NULL pointer arguments will no longer have to be cast to the appropriate type! Variable-argument functions can be declared like so:

extern int printf(char *, ...);

The convention for declaring functions that take no parameters is:

extern int rand(void);
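Putting the new declaration styles together, here is a minimal sketch (triple() and sum_n() are invented names, not functions from the draft standard):

```c
#include <assert.h>
#include <stdarg.h>

/* With the argument type declared, a caller passing an int where
 * long is expected gets an automatic conversion, as for assignment. */
long triple(long n)
{
    return 3 * n;
}

/* A variable-argument function: the fixed argument gives the count
 * of ints that follow. */
int sum_n(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as triple(7) passes an int, and the declared parameter type converts it to long without an explicit cast.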

The new data type const marks variables as readonly, with run-time assignment forbidden. This used to be done less reliably, and without placing an entry in the symbol table, by preprocessor definitions. The type const will be useful for data placed in ROM on special hardware. Also, it makes the :rofix kludge obsolete (this was a way to move data to text space by changing .data assembly code to .text).
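A minimal sketch of const in use (the names are invented):

```c
#include <assert.h>

/* A const object replaces a #define'd constant, and unlike the
 * macro it appears in the symbol table for debuggers. */
const int max_devices = 5;

/* A pointer to const may inspect but not modify its target;
 * an assignment through s would be rejected at compile time. */
int first_byte(const char *s)
{
    return s[0];
}
```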

If all operands in an expression are of type float, the compiler is allowed (but not required) to evaluate the expression in single, rather than double, precision. Casts may be used to force double precision evaluation. Numeric constants are treated as double unless explicitly cast to float. If function arguments are declared, passing a float to a function expecting a double will be harmless.
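For example, a cast on either operand is enough to force double-precision evaluation (scaled() is an invented name):

```c
#include <assert.h>

/* With two float operands the compiler may evaluate in single
 * precision; the cast below forces the multiply to be done in
 * double precision. */
double scaled(float a, float b)
{
    return (double)a * b;
}
```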

The preprocessor is part of the language definition. Its syntax has been extended with #elif so that if-else blocks can be coded more easily. Space before the sharp ( # ) will be permitted.
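As a sketch, #elif flattens what would otherwise be nested conditionals, and the indented directives below exercise the space-before-sharp rule (DEVICE and DEVICE_NAME are invented macros):

```c
#include <assert.h>
#include <string.h>

#define DEVICE 3

#if DEVICE == 1
    #define DEVICE_NAME "dev202"
#elif DEVICE == 3
    #define DEVICE_NAME "dev8600"
#else
    #define DEVICE_NAME "unknown"
#endif

/* Returns the name selected at compile time. */
const char *device_name(void)
{
    return DEVICE_NAME;
}
```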

Two string constants (not variables) next to each other in the source code are considered concatenated. This makes it easier to continue strings across line boundaries.
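A minimal example (the message text is invented):

```c
#include <assert.h>
#include <string.h>

/* The two adjacent literals below are concatenated at compile
 * time into a single string constant. */
static const char *message =
    "This message is built from "
    "two adjacent string constants.";
```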

The types unsigned char, unsigned short, and unsigned long are part of the specification. Johnson’s Portable C Compiler has always accepted these types, but they were not defined in the C Reference Manual. Plain char may be either signed or unsigned depending on the implementation.

Promiscuous pointer assignments are considered illegal (on most machines they now just generate a warning message). You must use casts when mixing pointer types or mixing integers with pointers. A new kind of pointer, void *, cannot be dereferenced but can be assigned to any other type of pointer without a cast. Before, char * was the universal pointer that could point to anything. The earlier declaration of fread() should really be:

extern int fread(void *, int, int, FILE *);

The compiler will make the appropriate pointer conversions. The new storage class volatile (the name is tentative) means that the compiler should not optimize references to data bearing this label. This will make it possible to write better optimizing C compilers.
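A hedged sketch of the intent (status_flag is an invented stand-in for a device register or a flag set by a signal handler):

```c
#include <assert.h>

/* Marking the flag volatile tells the compiler that every read
 * and write must actually occur; it may not cache the value in
 * a register or delete apparently redundant accesses. */
volatile int status_flag = 0;

void raise_flag(void)
{
    status_flag = 1;
}
```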

The selection expression of a switch can be of any integer type, including long or short. The unary plus operator (analogous with the unary minus operator) does nothing. This is for consistency with the library routines atoi() and atof().

When a union is initialized, the type of the initializer is the type of the first member in the union. This may not be ideal, but it is simple.
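A minimal sketch (union value is an invented type):

```c
#include <assert.h>

union value {
    int i;      /* first member: its type governs initialization */
    float f;
};

/* The initializer 42 is taken as an int, initializing v.i. */
union value v = { 42 };
```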

Hexadecimal string escapes have been added. To put an ESC in a string, you say:

"Here’s a real ESC: \x1b"

Previously you had to use octal escapes. This is an admission that hexadecimal is more common in the computer world than octal.

Some things will disappear. The keywords entry, asm, and fortran will not be part of the standard, though the last two may be recognized as valid extensions. The keywords long float will no longer be a synonym for double. The octal digits 08 and 09, which used to be interpreted as 10 and 11, will no longer be valid. It will be illegal to take the address of a register variable.

The names of arguments to functions will not be permitted to clash with the names of automatic variables. This code will be illegal:

function(arg)
int arg;
{
    int arg;
}

Some existing compilers interpret this as nested scope, where the inner declaration hides the outer one. No good programmer would write code like that anyway.

The chair of the draft standard language subcommittee is Larry Rosler of AT&T Bell Laboratories, the author of an interesting article on C evolution. (5) Comments should be addressed to him.

FUTURE DIRECTIONS

Beginning C programmers complain that error messages are confusing, and often have difficulties with pointers. Advanced C programmers usually grow fond of the language, but recognize its shortcomings and limitations. The preprocessor is primitive and has different syntax than the rest of the language. The aesthetics of the case statement are horrible, and bitwise operators should bind more tightly than they do. Most C compilers are painfully slow, as is the UNIX loader.

It’s easy to imagine a much better language, but there are only a few serious contenders — among them LISP, Smalltalk, Modula-2, MainSail, and Ada. The first two require specialized environments, the next two don’t appear to be a great step forward, and Ada isn’t real yet. In computing, we have to live with what we have.

Always an evolving language, C may develop to keep pace with user needs and compiler technology. Bjarne Stroustrup has implemented a new language (or extension of C, depending on your perspective) that he has cleverly called C++. (6) Although the jury is still out, C++ may well supplant C sometime in the future. The most interesting features of the C++ language are:

• User-definable types with operators that apply. Complex and BCD data types could be created this way.

• Derived classes that allow object-oriented programming, which is a great advantage when doing graphics.

• Data abstraction facilities using classes, which provide data hiding, structure initialization, and dynamic typing (all optional).

• Argument checking and coercion (overridable) for all functions, making it unnecessary for the programmer to cast arguments that are not of the proper type.

• Function and operator overloading, making it unnecessary to have separate functions for floating point and integer arguments.

• A high degree of compatibility with C. The C++ compiler can optionally compile C if required.

Note that all but the first three features are provided in the ANSI draft standard. A big advantage of C++ is that programs can define new data types as required. (7) For commercial data processing, BCD arithmetic can be defined, and for mathematical computing, complex arithmetic can be defined. This is much better than modifying the compiler. Complex arithmetic would be simpler to define than BCD arithmetic. The latter would require overloading C++ operators to make them call functions that would have to be implemented in assembly code, using machine-dependent BCD instructions.

The principal advantage of C has always been that it doesn’t try to do everything itself — libraries can be written to do whatever is required. The complex and BCD data types are not needed by everybody using the language, so they are most appropriately relegated to a library.


1 D.M. Ritchie, S.C. Johnson, M.E. Lesk, and B.W. Kernighan, “The C Programming Language,” Bell System Technical Journal, vol. 57, no. 6, pp. 1991-2019, 1978.

2 D.M. Ritchie, “The Evolution of the UNIX Time-sharing System,” AT&T Bell Laboratories Technical Journal, vol. 63, no. 8.2, pp. 1577-1594, 1984.

3 B.W. Kernighan and D.M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, NJ, 1978.

4 The discussion of the ANSI draft standard is derived mostly from a Usenet article submitted by Henry Spencer of the University of Toronto.

5 L. Rosler, “The Evolution of C — Past and Future,” AT&T Bell Laboratories Technical Journal, vol. 63, no. 8.2, pp. 1685-1700, 1984.

6 B. Stroustrup, “C++ Reference Manual,” Computing Science Technical Report, no. 108, AT&T Bell Laboratories, Murray Hill, NJ, January 1984.

7 B. Stroustrup, “Data Abstraction in C,” AT&T Bell Laboratories Technical Journal, vol. 63, no. 8.2, pp. 1701-1732, 1984.


Bill Tuthill was a leading UNIX and C consultant at UC Berkeley for four years prior to becoming a systems software analyst at Imagen Corporation. He enjoys a solid reputation in the UNIX community earned as part of the Berkeley team that enhanced Version 7 (BSD 4.0, 4.1, and 4.2).