All is not lost, though. Self II can be recoded to eliminate the
dependencies on specific character codes (that solution, called Self III, can
be found in Appendix A).
Once that is out of the way, is there anything left that might be unsatisfying?
One important element remains: the printf function.
The committee responsible for defining the C language includes printf as a
part of the standard C library.
Note that printf is not part of the C language proper: C on its own provides
no mechanism for output whatsoever! We are only justified in using printf
(and putchar, for that matter) because they are recognized as standard C
functions.
But what if C also had a standard function called print_a_prog, which was
defined as follows:

    void print_a_prog() {
        printf( "extern void print_a_prog();\n" );
        printf( "main(){print_a_prog();}\n" );
    }

The implementation of this function would be hidden away in a library. The
programmer need only declare the function's existence and then use it. In this
case it becomes trivial to write a self-documenting program:

Cheat I

    extern void print_a_prog();
    main(){print_a_prog();}

This program is not very satisfying, because it seems as if we are making use
of a function tailor-made to our needs, one which just happened to be in the
library. Perhaps one should feel compelled by one's conscience to
require that Cheat I also output the definition of print_a_prog. That means
that the print_a_prog function has to be self-documenting, which naturally
begs the question of writing a self-documenting program.
Furthermore, we certainly cannot require that Self II output the
definition of printf: we do not know what it is! And even if the
definition of printf were known to us, including it in the output of our
program would be a much more difficult task. So
it looks as if we must be satisfied with allowing standard library functions.
Even in this case, at what level should the program duplicate itself? Over its lifetime, a computer program exists in many different forms. Depending on what form we identify as the program's 'itself', we obtain different solutions to the original problem. For instance, we can identify a computer program with the file that contains its source code. In this case, we obtain the following:
Cheat II

    #include <stdio.h>
    main(){
        int c;
        FILE *f;
        f = fopen( __FILE__, "r" );
        c = fgetc( f );
        while( c != EOF ) {
            putchar( c );
            c = fgetc( f );
        }
        fclose( f );
    }

This program literally opens its source file, reads each character in it, and
echoes that character as output. The self-reference occurs in the
__FILE__ macro, which gets replaced at compile time by the name
of the file which contains the given program.
This example is unsatisfying because it relies on its source code still existing at a fixed location when the program runs. It seems as if a self-documenting program should exist independently of the files that created it. So, intuitively, we are making the identification between a program and itself at the wrong level. Our definition of a self-documenting program should be revised to include the clause that the program be able to run in isolation from its source.
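The dependence on the source file can be made explicit with a small sketch. The helper names echo_file and compiled_source_path below are hypothetical, introduced only for illustration; they split Cheat II into "which path was baked in at compile time" and "copy that file to the output", and they report failure instead of crashing when the file is no longer there.

```c
#include <stdio.h>

/* Hypothetical helper: copy the file at `path` to `out`, one character
   at a time. Returns 0 on success, -1 if the file cannot be opened. */
int echo_file(const char *path, FILE *out)
{
    FILE *f = fopen(path, "r");
    int c;

    if (f == NULL)
        return -1;   /* the source was moved, renamed, or deleted */
    while ((c = fgetc(f)) != EOF)
        fputc(c, out);
    fclose(f);
    return 0;
}

/* Hypothetical helper: the file name the compiler substituted for
   __FILE__ when this translation unit was compiled. */
const char *compiled_source_path(void)
{
    return __FILE__;
}
```

Calling echo_file(compiled_source_path(), stdout) from main behaves like Cheat II only while the source file can still be found under the name the compiler recorded. Once the executable is copied elsewhere without its source, the call returns -1 and nothing is printed.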
We could take the opposite viewpoint, and identify a program with the pattern of bits that make up its representation in the computer's memory as it runs. A high-level description of the resulting program would look like this:
Cheat III

    main(){
        Find myself in the computer's memory.
        Output all the bits that make me up.
    }

In some sense, this program is indeed self-documenting. But this result, too,
is unsatisfying, because the output is more or less meaningless. The
representation of the program in memory changes from compiler to compiler, from
computer to computer, even between executions! The output from this program breaks
the chain of self-documentation, because it cannot itself be interpreted as
a self-documenting program, which means it is not identical to the program that
generated it. This suggests another addition to the definition of a self-documenting
program. Not only should the program produce itself as output, but the output should
be a self-documenting program!
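To see why a raw memory dump such as Cheat III's is hard to interpret, consider the following sketch. Standard C gives a program no portable way to locate its own code in memory, so the sketch dumps a stand-in: the bytes of an ordinary data object. The helper name dump_bytes is hypothetical and is not part of any library.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the n bytes starting at p into buf as
   lowercase hex digits. buf must have room for 2*n + 1 characters. */
void dump_bytes(const void *p, size_t n, char *buf)
{
    const unsigned char *bytes = p;
    size_t i;

    for (i = 0; i < n; i++)
        sprintf(buf + 2 * i, "%02x", (unsigned)bytes[i]);
    buf[2 * n] = '\0';   /* also terminates the string when n is 0 */
}
```

Dumping an unsigned int holding the value 1 already gives different results on different machines: with a 4-byte int, a little-endian machine stores the bytes 01 00 00 00 and a big-endian machine stores 00 00 00 01. The bytes of a running program vary in the same way, and in many more besides.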
The implementation language presents even more difficulties. So far, we have simply been using C for all our attempts at self-documentation, without addressing the possibility of such programs in other languages. The problem of defining self-documenting programs becomes overwhelming when applied to more than one language. For instance, suppose there were a language called CAT which had very simple semantics: given a file containing source code, CAT generates an executable which outputs the source code. In this case, any program will suffice! They all print out their own source. (The name CAT comes from the UNIX command 'cat', which echoes its input.) This example suggests that we revise our definition yet again, to specify that the implementation language be sufficiently complex. Of course, 'sufficiently complex' is another definition altogether.
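The CAT thought experiment is small enough to sketch in C. CAT is imaginary, so everything here is an assumption: the function name cat_run is hypothetical, and "running" a CAT program is modelled simply as copying its source text into an output buffer.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical model of CAT: "run" the program whose text is `source`
   and store its output in `out`, which holds `cap` characters.
   Returns 0 on success, -1 if the output does not fit in `out`. */
int cat_run(const char *source, char *out, size_t cap)
{
    size_t len = strlen(source);

    if (len + 1 > cap)
        return -1;
    memcpy(out, source, len + 1);   /* the output is the source itself */
    return 0;
}
```

Whatever text is passed as source, the output compares equal to it, so under this model every CAT program reproduces itself and the property carries no information about the program.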
So, after reviewing the different ways a program can cheat the initial definition, a new, stronger definition emerges: