7. Self-Documenting Programs That Cheat

Hi! A lot of people seem to locate this page by doing a Web search on "cheat programs". If you're one of those people, welcome. I don't think this is the kind of information you had in mind. Nevertheless, I welcome you to check out my home page.

Standards and Libraries

Based on the previous section, it would seem that the problem has been solved. We now have a program which is truly self-documenting. It clearly satisfies the definition of a self-documenting program given in section 2. But is Self II a satisfying solution, definition aside? For one thing, it relies quite heavily on ASCII to map 34 and 10 to double quote and newline, respectively. ASCII is not part of the C language per se. The mapping of codes to characters is specific to each machine on which the program is compiled and executed. Although ASCII is an accepted worldwide standard, it is not the only mapping that exists. Some computers use another code, called EBCDIC. Still others are adopting newer standards like Unicode or JIS which allow for characters which do not exist in ASCII, like kanji ideograms and hiragana characters. Of course, each of these standards provides some code for double quote and another for newline, but wherever those codes differ from ASCII's, as they do in EBCDIC, the program has to be recoded. So Self II is not truly universal.

All is not lost, though. Self II can be recoded to eliminate the dependencies on specific character codes (that solution, called Self III, can be found in Appendix A). Once that is out of the way, is there anything left that might be unsatisfying? One important element remains: the printf function.

Printf is included by the committee responsible for defining the C language as part of the standard C library. Note that printf is not part of the core language itself - the language proper provides no statement or operator for output whatsoever! We are only justified in using printf (and putchar, for that matter) based on its recognition as a standard C function.

But what if C also had a standard function called print_a_prog, which was defined as follows:

    void print_a_prog() {
        printf( "extern void print_a_prog();\n" );
        printf( "main(){print_a_prog();}\n" );
    }

The implementation of this function would be hidden away in a library. The programmer need only declare the function's existence and then use it. In this case it becomes trivial to write a self-documenting program:

Cheat I

    extern void print_a_prog();
    main(){print_a_prog();}

This program is not very satisfying, because it seems as if we are making use of a function tailor-made to our needs, one which just happened to be in the library. Perhaps one should feel compelled by one's conscience to require that Cheat I also output the definition of print_a_prog. That means that the print_a_prog function has to be self-documenting, which brings us right back to the problem of writing a self-documenting program. Furthermore, we certainly cannot require that Self II output the definition of printf - we do not know what it is! And even if the definition of printf were known to us, it would be a much more difficult task to include it in the output of our program. So it looks as if we must be satisfied with allowing standard library functions.

What is 'itself' to a program?

Cheat I implicitly raises another important question concerning self-documenting programs. Our definition states that such a program must be able to produce itself as output. But what exactly is meant by 'itself'? Obviously 'itself' is not meant in the same sense as it would were it applied to something like a chair. The only thing that can fill the role of 'itself' for a given chair is that chair and that chair alone. A program can never hope to truly produce 'itself' as output, and must instead settle on an exact duplicate of itself.

Even in this case, at what level should the program duplicate itself? Over its lifetime, a computer program exists in many different forms. Depending on what form we identify as the program's 'itself', we obtain different solutions to the original problem. For instance, we can identify a computer program with the file that contains its source code. In this case, we obtain the following:

Cheat II

    #include <stdio.h>

    int main( void ) {
        int c;
        FILE *f;

        /* __FILE__ expands to the name of this source file */
        f = fopen( __FILE__, "r" );
        if( f == NULL ) return 1;   /* source file not found */
        c = fgetc( f );
        while( c != EOF ) {
            putchar( c );
            c = fgetc( f );
        }
        fclose( f );
        return 0;
    }

This program literally opens its source file, reads each character in it, and echoes that character as output. The self-reference occurs in the __FILE__ macro, which gets replaced at compile time by the name of the file which contains the given program.

This example is unsatisfying because it relies on the existence of its source code in a fixed location. It seems as if a self-documenting program should exist independently of the files that created it. So, intuitively, we are making the identification between a program and itself at the wrong level. Our definition of a self-documenting program should be revised to include the clause that the program be able to run in isolation from its source.

We could take the opposite viewpoint, and identify a program with the pattern of bits that make up its representation in the computer's memory as it runs. A high level description of the resulting program would look like this:

Cheat III

        Find myself in the computer's memory.
        Output all the bits that make me up.

In some sense, this program is indeed self-documenting. But this result, too, is unsatisfying, because the output is more or less meaningless. The representation of the program in memory changes from compiler to compiler, from computer to computer, even between executions! The output from this program breaks the chain of self-documentation, because it cannot be interpreted in any way as a self-documenting program, which suggests it is not identical to the program that generated it. This points to another addition to the definition of a self-documenting program. Not only should the program produce itself as output, but the output should be a self-documenting program!

The implementation language presents even more difficulties. So far, we have simply been using C for all our attempts at self-documentation, without addressing the possibility of such programs in other languages. The problem of defining self-documenting programs becomes overwhelming when applied to more than one language. For instance, suppose there were a language called CAT which had very simple semantics: given a file containing source code, CAT generates an executable which outputs the source code. In this case, any program will suffice! They all print out their own source. (The name CAT comes from the UNIX command 'cat', which echoes its input.) This example suggests that we revise our definition yet again, to specify that the implementation language be sufficiently complex. Of course, 'sufficiently complex' is another definition altogether.

So, after reviewing the different ways a program can cheat the initial definition, a new, stronger, definition emerges:

A self-documenting program is a program written in a sufficiently complex language which, when run in isolation from its source, produces output which can be identified with itself, and which is also a self-documenting program.
