5. A Sophisticated Attempt That Fails

We now turn to a first attempt at a self-documenting program that comes very close to a solution. The idea behind this example is correct; only the fine details of the C language stand in its way.

Making use of the ideas from the Name That Program example, we obtain the following program:

Failure I

    char*f="main(){printf(f);printf(f);}";main(){printf(f);printf(f);}

The output from this program is:

    main(){printf(f);printf(f);}main(){printf(f);printf(f);}

which is very nearly correct. All that is missing is the code that defines the string f. The intuitive way to put the finishing touches on this program is to rewrite it as follows:

Failure II

    char*f="main(){printf("char*f="");printf(f);printf("";\n");printf(f);}";
    main(){printf("char*f="");printf(f);printf("";\n");printf(f);}

That is, we add the code needed to print the text that delimits the mention of the program, and we reformat the output so that it runs over two lines. In theory, there is nothing wrong with this solution. In practice, it does not even compile! As it turns out, the problem reduces to the way the C language handles the use/mention distinction internally.

In C, certain characters are treated differently depending on whether they are being used (as part of a programming instruction) or merely mentioned (as part of a string literal). First, the double quote character " cannot be inserted directly into a string, because C uses it to delimit strings; left to itself, the compiler has no way of knowing whether a given double quote was meant as a use or a mention. C resolves this conflict by forcing the programmer to state which sense of " is desired: to mention a double quote within a string, you must escape it by preceding it with a backslash. C understands the backslash as a special means of mentioning characters in any context. (Incidentally, to mention the backslash character itself, one places a second backslash in front of it.) Second, one cannot insert a newline character directly into a string. That, too, must be escaped, and appears in strings as \n.
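To make the rule concrete, here is a minimal sketch of how a C compiler might decide where a string literal ends. It is a simplified model, not the lexer of any real compiler, and the names closing_quote_index and check_closing_quote are our own hypothetical helpers. The self-check feeds it the opening of the string from Failure II, first as written there and then with the quotes escaped.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of how a C lexer finds the end of a string literal.
   src must begin with the opening double quote. Returns the index of the
   closing quote, or -1 if the text or the line ends first. */
long closing_quote_index(const char *src) {
    if (src[0] != '"') return -1;
    for (size_t i = 1; src[i] != '\0' && src[i] != '\n'; i++) {
        if (src[i] == '\\' && src[i + 1] != '\0') {
            i++;             /* skip the escaped character: it is a mention */
        } else if (src[i] == '"') {
            return (long)i;  /* an unescaped quote is a use: the literal ends */
        }
    }
    return -1;
}

/* Self-check on the opening of the string in Failure II. */
void check_closing_quote(void) {
    /* As written in Failure II:   "main(){printf("char*f=
       The literal already ends at the quote after printf( (index 15). */
    assert(closing_quote_index("\"main(){printf(\"char*f=") == 15);

    /* With the inner quotes escaped:   "main(){printf(\"char*f=\"\");"
       The escaped quotes are skipped, so the literal runs to the last quote. */
    assert(closing_quote_index("\"main(){printf(\\\"char*f=\\\"\\\");\"") == 30);
}
```

In the unescaped case the literal is cut short to "main(){printf(" and the compiler is left facing the bare text char*f= where it expects more program, which is why Failure II is rejected.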

Based on this discussion, one might think that the next level of refinement would simply be to escape all the double quotes and newlines inside strings. If we were to perform this transformation on Failure II above, we would obtain a new program, Failure III (the code for Failure III can be found in Appendix A). Unfortunately, when Failure III is compiled and run, its output is not itself, but Failure II!

The reason for this is that escaping a character in C introduces no new information in the final program; it is a lexical convenience, a way for the compiler to know how to treat special characters in its input. The backslashes are thrown away internally, so Failure III outputs a copy of itself without the escapes, which is precisely Failure II.
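We can watch the backslashes disappear with a small sketch. The string f below is our own reconstruction of the string in Failure III, obtained by applying the escaping just described; the listing in Appendix A may differ in detail. The helpers failure_three_output and check_escapes_are_gone are hypothetical names, and snprintf with %s stands in for the bare printf(f) calls so that the text can be inspected instead of printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed reconstruction of Failure III's string: every double quote inside
   the literal is escaped, and the backslash of \n is doubled. */
static const char *f =
    "main(){printf(\"char*f=\"\");printf(f);printf(\"\";\\n\");printf(f);}";

/* Writes into buf what the reconstructed Failure III would print:
   char*f=" followed by f, then "; and a newline, then f again.
   Returns the length of that text, as reported by snprintf. */
int failure_three_output(char *buf, size_t size) {
    return snprintf(buf, size, "char*f=\"%s\";\n%s", f, f);
}

/* Self-check: the escapes written in the source do not survive. */
void check_escapes_are_gone(void) {
    char out[256];
    int n = failure_three_output(out, sizeof out);
    assert(n > 0 && (size_t)n < sizeof out);

    /* No backslash-quote pair appears anywhere in the output... */
    assert(strstr(out, "\\\"") == NULL);

    /* ...and the output opens exactly as Failure II does, bare quotes and all. */
    assert(strstr(out, "char*f=\"main(){printf(\"char*f=\"\");") == out);
}
```

The only backslash left in the output is the one in \n, and it survives only because it was doubled in the source; each backslash-quote pair has collapsed into a bare quote.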

