The Vile Zero Errors from Hell

(Don't blame me for that title. David Glasser came up with the phrase.)

We shall discuss a software hazard which occurs in many Inform games.


The Nature of the Hazard

The Z-machine has several opcodes which deal with objects. For example, @get_parent finds the parent of a given object (the object which contains it). @get_prop looks up a given (common) property in a given object. There are others.

Each of these opcodes operates on an object reference. In the Z-machine, an object reference is an integer; objects are numbered consecutively starting with 1. The opcode takes the integer and uses it as an index into the object table.

This leaves open the question: what is object zero? In the Z-machine, there is none. It is used as a NULL pointer, a reference meaning "no object". For example, the @get_parent opcode returns zero if it determines that an object is not contained in anything.

Very well; but this leaves open the question, what happens if you give a zero reference to the @get_parent opcode? What is the parent of "no object"?

The short answer is: asking that question is a software error. The answer is not predictable. If your program does it, your program contains a bug.

How To Commit This Error

In Inform code, the most obvious way to cause a zero-opcode error is this:
    val = parent(0);
Of course, you're not going to write that line of code. It's much more likely to occur this way:
    val = parent(obj);
...if the value of obj happens to be zero.

Here are other Inform code samples that can cause opcode errors:

val = parent(obj);
val = child(obj);
val = sibling(obj);
give obj attribute;
if (obj has attribute) ...
val = obj.property;
val = obj.&property;
val = obj.#property;
val = obj.property();
obj.property = val;
move obj to obj2;
remove obj;
if (obj in obj2) ...
All of these are illegal if obj (or obj2) is equal to zero. (Or nothing, which is an Inform alias for zero.)

How To Not Commit This Error

Don't do those things when it's illegal.

In general -- take care. If a variable or property might be zero, test that case before you do any of the above.

Remember that parent(obj) or child(obj) might be zero, even when obj is a valid object. You can't do this:

give parent(obj) general;
...unless you know that obj is nonzero and obj is contained in something else.
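
The same double test can be modelled outside Inform. The C sketch below is only an illustration: the three-object world, the parent_of and general_flag arrays, and every function name are invented for the example. It shows the two checks that have to pass before the equivalent of "give parent(obj) general" is safe.

```c
#include <stdbool.h>

#define NOTHING 0          /* object zero: "no object" */
#define NUM_OBJECTS 3      /* hypothetical tiny world: objects 1..3 */

/* parent_of[n] is the container of object n; index 0 is unused padding.
   Object 1 is in nothing, object 2 is in object 1, object 3 is in object 2. */
static const int parent_of[NUM_OBJECTS + 1] = { 0, NOTHING, 1, 2 };

/* general_flag[n] stands in for the "general" attribute of object n. */
static bool general_flag[NUM_OBJECTS + 1];

static bool is_valid_object(int obj)
{
    return obj >= 1 && obj <= NUM_OBJECTS;
}

bool has_general(int obj)
{
    return is_valid_object(obj) && general_flag[obj];
}

/* Guarded equivalent of "give parent(obj) general;".
   Returns true if the attribute was set, false if the call was skipped. */
bool give_parent_general(int obj)
{
    if (!is_valid_object(obj))
        return false;               /* obj itself is zero or out of range */
    int holder = parent_of[obj];
    if (holder == NOTHING)
        return false;               /* obj is not contained in anything */
    general_flag[holder] = true;
    return true;
}
```

In this model give_parent_general(0) and give_parent_general(1) both decline to act, the first because obj is zero and the second because its parent is zero.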

If you have a grammar line

Verb 'wob'
    * noun -> Wob;
...then whenever you're in WobSub, or any before/after clause for Wob, second will be zero. (But noun will be a valid object.) If you have

Verb 'wob'
    * noun -> Wob
    * noun 'on' noun -> Wob;
...then second might be zero, and you must test for that possibility.

Similarly, if you call <<Wob thing>> then second will be zero, even if the grammar lists two nouns.

The Library Bug

A couple of these bugs crept into the standard Inform library, and they are still present in library version 6/7. (They are being fixed in library 6/8.)

The most notorious is in the following code. (From Parserm, library version 6/7; I don't know how early the bug was introduced.)

[ HasLightSource i j ad;
    if (i==0) rfalse;
    if (i has light) rtrue;
    if (i has enterable || IsSeeThrough(i)==1)
    {    objectloop (i in i)
            if (HasLightSource(i)==1) rtrue;
    }
    ad = i.&add_to_scope;
    ! ...function continues...
];
The Inform statement
    objectloop (i in i)
is legal Inform code, but it's not what the library wants to do. It loops i through the contents of what i originally pointed to, leaving i equal to zero at the end of the loop. Then the statement
    ad = i.&add_to_scope;
is an opcode error.

That statement, and the rest of the function, assumes that i remains unchanged after the loop. So the fix is to change those lines to

    {    objectloop (j in i)
            if (HasLightSource(j)==1) rtrue;
    }
If you are an Inform developer, you should make this change in your 6/7 libraries immediately. It also applies to earlier library versions, although I don't know how early. It may go back as far as the Inform 5 libraries (possibly in a different form.)

This bug is triggered when the player is in a container or supporter.
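
The effect of reusing the loop variable can be seen in a small C model. This is a sketch under assumptions, not the library's code: it supposes that the loop walks the container's children through child and sibling links, and the four-object tree and the function names are made up for the illustration.

```c
#define NOTHING 0

/* Hypothetical tree: object 1 holds objects 2 and 3; object 4 is elsewhere.
   Index 0 is padding so that object numbers can be used as indexes. */
static const int child_of[5]   = { 0, 2, NOTHING, NOTHING, NOTHING };
static const int sibling_of[5] = { 0, NOTHING, 3, NOTHING, NOTHING };

/* Buggy shape: the loop reuses i, in the manner of "objectloop (i in i)".
   Returns the value i holds once the loop has finished. */
int i_after_buggy_loop(int i)
{
    for (i = child_of[i]; i != NOTHING; i = sibling_of[i]) {
        /* ...examine each child... */
    }
    return i;   /* the loop only ends when i has become NOTHING */
}

/* Fixed shape: a separate variable j walks the children, so i survives. */
int i_after_fixed_loop(int i)
{
    int j;
    for (j = child_of[i]; j != NOTHING; j = sibling_of[j]) {
        /* ...examine each child... */
    }
    return i;   /* still the original container */
}
```

Any code after the buggy loop that still treats i as the container is working with zero instead.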

Another library 6/7 bug occurs if you type "say to me". I do not have a patch for this one at this time.

Recent Inform Changes

Graham Nelson is currently testing Inform 6.20 and Inform library 6/8. They are available for alpha-testing. (At the IF Archive.)

Library 6/8 fixes all the known opcode zero errors. Inform 6.20 provides a "-S" switch, which compiles in lots of strict error-checking. A program compiled with "-S" will print a warning when it encounters an opcode error, no matter what interpreter you run it on. It will also catch various other mistakes such as broken objectloops, overrun array bounds, and so on.

This all makes the problem of writing correct code a lot easier.

Are These Really Errors?

The obvious question is, if these are errors, how come they don't usually cause problems?

Let us consider what happens when ZIP, the original unmodified freeware Z-machine interpreter, executes @get_parent 0.

The interpreter core calls the following function. (I'm simplifying the code a bit, but this is what it does in V5 games.)

void load_parent_object(zword_t obj)
{
    zword_t address = get_object_address(obj);
    zword_t parent = read_object(address, PARENT);
    store_operand(parent);
}
This is straightforward. First it finds the Z-machine address of the given object's entry in the object table. Then it looks up the appropriate field in that entry. Then it puts that field value in the usual Z-machine place for returning a value to the caller.

When the game executes @get_parent 0, this function is called with obj equal to zero. So it calls get_object_address(0).

zword_t get_object_address(zword_t obj)
{
    zword_t offset;
    offset = h_objects_offset + ((MAX_PROPERTIES - 1) * 2) 
        + ((obj - 1) * SIZE);
    return offset;
}
h_objects_offset is the Z-machine address of the "object table". Actually, this table starts with all the default values of the common properties. That's the first 63 words of the table -- property defaults 1 through 63. Then comes object 1, then object 2, and so on. In each object entry, there are three words of attributes, then the parent, sibling, and child values, then the pointer to the property table.

As you see, the offset calculation first skips the 63 property default words (each of which is two bytes long; that's the "* 2".) Then it multiplies (obj-1) by SIZE, which is the size of an object entry, and adds that in.

Obviously, when obj is zero, this returns an address before the first object entry. In fact, this address is inside the property-default table. SIZE is 14 for V5 games, so the address falls 14 bytes (seven words) before the end of the 63-word table: it's the address of property value 57. And that's what gets returned.

Back in load_parent_object(), the next call is read_object(address, PARENT). This just pulls out the fourth word from the given address. (Three words of attributes and then the parent, remember.) So what it's actually found is the default for property 60.

Now, if you don't have 60 common properties in your game, this will be zero. If you do, you'll get a surprise.
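
The arithmetic can be checked with a few lines of C. This is a worked example, not ZIP's source: the constants follow the V5 figures given above (63 default words, 14-byte entries, parent field after three attribute words), offsets are measured from the start of the object table, and PARENT_FIELD_OFFSET and property_at are names invented for the sketch.

```c
enum {
    MAX_PROPERTIES = 64,       /* V5: property defaults are numbered 1..63 */
    SIZE = 14,                 /* bytes per V5 object entry */
    PARENT_FIELD_OFFSET = 6    /* assumed: parent follows three attribute words */
};

/* Byte offset of an object's entry, relative to the start of the object table. */
int object_offset(int obj)
{
    return (MAX_PROPERTIES - 1) * 2 + (obj - 1) * SIZE;
}

/* Which property default (numbered from 1) occupies the word at this offset?
   Only meaningful for offsets inside the default table (0..125). */
int property_at(int byte_offset)
{
    return byte_offset / 2 + 1;
}
```

For object 0 the entry offset comes out as 126 - 14 = 112, which is the word for property 57; adding the 6-byte parent offset gives 118, the word for property 60.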

If you try to move object zero, the interpreter will try to change that field; so the default for property 60 will change. Surprise.
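
A toy model makes the surprise concrete. The sketch below is an assumption-laden stand-in for an unchecked interpreter, not code from ZIP or Frotz: it keeps a miniature V5-style object table in a byte array, stores words big-endian as the Z-machine does, and only models the write to the parent field (a real move touches sibling and child links too). All names are invented for the example.

```c
#include <stdint.h>

enum { DEFAULTS_BYTES = 63 * 2, ENTRY_SIZE = 14, PARENT_OFFSET = 6 };

/* Toy object table: 63 default words, then room for two object entries. */
static uint8_t table[DEFAULTS_BYTES + 2 * ENTRY_SIZE];

static uint16_t read_word(int addr)
{
    return (uint16_t)((table[addr] << 8) | table[addr + 1]);
}

static void write_word(int addr, uint16_t value)
{
    table[addr] = (uint8_t)(value >> 8);
    table[addr + 1] = (uint8_t)(value & 0xFF);
}

/* No range check, just like the unmodified interpreter. */
static int entry_address(int obj)
{
    return DEFAULTS_BYTES + (obj - 1) * ENTRY_SIZE;
}

uint16_t unchecked_get_parent(int obj)
{
    return read_word(entry_address(obj) + PARENT_OFFSET);
}

void unchecked_set_parent(int obj, uint16_t parent)
{
    write_word(entry_address(obj) + PARENT_OFFSET, parent);
}

uint16_t property_default(int prop)
{
    return read_word((prop - 1) * 2);
}

void set_property_default(int prop, uint16_t value)
{
    write_word((prop - 1) * 2, value);
}
```

If the default for property 60 is set to some value, unchecked_get_parent(0) reads that value back; and unchecked_set_parent(0, 7) quietly rewrites the default for property 60 to 7.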

The standard Frotz code (Frotz version 2.32) behaves essentially the same way.

Why Don't These Interpreters Check?

It seems like sloppy programming. However -- recall history. ZIP was written before Inform. It was meant to play Infocom game files; and Infocom games don't contain opcodes like @get_parent 0. (See below.)

When the task is executing game files that are known to be correct, error-checking is a luxury. Since programmers are lazy, it tends not to happen.

When Inform came into common use, authors started writing games and testing them with ZIP. Since programmers are lazy, opcode errors began slipping in. Since these errors very often have no visible effect -- @get_parent 0 usually returns 0, which is what you might expect -- they were not corrected.

This got particularly bad, of course, when that library bug crept in. At this point, almost all Inform 6 games, and many Inform 5 games, contain opcode errors.

Why Don't We Declare This Legal?

We could decide that @get_parent 0 should be 0. And similar decisions for the other object opcodes.

But that presents many problems. Many interpreters would have to be updated; and every interpreter not updated would necessarily be declared wrong. Not every interpreter is still maintained. For example, Infocom's interpreters don't check for these errors, and they'll certainly never be updated.

By declaring these errors legal, we would be deciding that all future games would not run reliably on Infocom's interpreters, and other old interpreters.

We also note that Infocom game files are almost completely free of these errors. The rare exceptions seem to be bugs. This is very strong evidence that Infocom thought of such statements as errors, and took care to avoid them. I would lay money that they used debugging interpreters in-house, which flagged them as errors.

(For an example of such an Infocom bug, type "go me" in Zork 1. This seems to be sporadic, but it happens if you try it on the first move.)

By any standard, declaring these statements legal would be an extension of the Z-machine -- with all the organizational and practical problems that entails.

So What Now?

Since changing the interpreters is hard, we must change the games. This seems counterintuitive, since there are more games than interpreters. But there are a fixed number of games already in existence. If we avoid writing buggy games in the future (which will be much easier with Inform 6.20), then that leaves a fixed set of old games which are unsafe.

Furthermore, we can fix interpreters to play unsafe games -- as long as we don't legitimize that unsafeness.

After pacing back and forth a lot, I decided on the following course for my own interpreters, MaxZip and XZip:

There is a user preference for error-checking. This can be set to four levels:

Note that none of these are how unmodified ZIP or Frotz behaves.

The default error-checking level is 1. Since errors only appear once per play session, they should not be intrusive; but neither should they fall out of sight and mind.

The player can set any error-checking level he wants, of course. I recommend 0 only when playing games that are no longer supported. If the author is accepting bug reports -- hopefully, most authors do -- then level 1 is reasonable. If you are developing or beta-testing a game, level 2 is absolutely mandatory.
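
As a rough picture of what a checking interpreter does, here is a C sketch of a guarded @get_parent. It is not taken from MaxZip, XZip, or either patch, and it assumes only what the text above implies: level 0 stays silent, level 1 reports an error once per session, and higher levels report every occurrence. (The sketch treats all levels above 1 alike and tracks a single kind of error.) Returning 0 for object zero, the two-object tree, and all the names are likewise assumptions made for the example.

```c
#include <stdio.h>
#include <stdbool.h>

#define NUM_OBJECTS 2

/* Hypothetical tree: object 2 is inside object 1; index 0 is padding. */
static const int parent_of[NUM_OBJECTS + 1] = { 0, 0, 1 };

static int strictness = 1;            /* assumed user preference */
static bool already_reported = false; /* has this session seen a report yet? */
static int reports_printed = 0;

void set_strictness(int level) { strictness = level; }
int report_count(void) { return reports_printed; }

static void report_zero_error(const char *opcode)
{
    if (strictness == 0)
        return;                       /* silent */
    if (strictness == 1 && already_reported)
        return;                       /* only the first occurrence is shown */
    already_reported = true;
    reports_printed++;
    fprintf(stderr, "Warning: %s called with object 0\n", opcode);
}

/* Never touches the object table when obj is zero. */
int checked_get_parent(int obj)
{
    if (obj == 0) {
        report_zero_error("@get_parent");
        return 0;                     /* assumed harmless result */
    }
    return parent_of[obj];
}
```

The point of the design is that the table lookup is skipped entirely for object zero at every level; the level only decides how loudly the interpreter complains.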

The following patch will apply the same changes to the original ZIP source. (The command-line switch is "-s LEVEL".)

http://www.ifarchive.org/if-archive/infocom/interpreters/zip/zip_zstrict_patch

The following patch will apply the same changes to the Frotz source. Except that the command-line switch is "-Z LEVEL", since Frotz already uses the -s switch to set the random number seed value. (Thanks to Torbjörn Andersson.)

http://www.ifarchive.org/if-archive/infocom/interpreters/frotz/frotz_zstrict_patch


Last updated December 16, 1998.
