#!/usr/bin/env python

"""idebugmap.py: Analyze the memory usage of a compiled Inform game, or
compare the memory usage of two compiled games.

- Andrew Plotkin (erkyrath@eblong.com)
- Release 1, Aug 14 2011
- This script is in the public domain.

This is a very crude hack that parses the "gameinfo.dbg" debug file
generated during an Inform compile.

(If you have an I7 project called "Foo.inform", you can find this file
after a build in the location "Foo.inform/Build/gameinfo.dbg". You may
want to turn off the "Clean build files before closing" preference, so
that this file sticks around after you quit I7. If you're using I6,
include the "-k" switch when you compile.)

This script is intended for Z-code compiles. The debug file format uses
16-bit fields in many places, so if you try this on a Glulx compile, the
results are not guaranteed.
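
For the curious, here's a rough Python 3 sketch of what "16-bit fields" means in practice. It's not the script's own code; read_header and fits_in_16_bits are hypothetical helpers. The debug file opens with three big-endian 16-bit words (the 0xDEBF magic number, the debug-format version, and the Inform version). Array records store their addresses in fields of the same width, which is why a Glulx address above 0xFFFF can't be represented:

```python
import struct

def read_header(data):
    # Hypothetical helper: the debug file opens with three big-endian
    # 16-bit words (magic number, debug-format version, Inform version).
    magic, debugversion, informversion = struct.unpack('>HHH', data[:6])
    if magic != 0xDEBF:
        raise ValueError('not an Inform debug file')
    return debugversion, informversion

def fits_in_16_bits(addr):
    # Hypothetical helper: a 16-bit field holds 0..65535, and
    # struct.pack refuses anything outside that range.
    try:
        struct.pack('>H', addr)
        return True
    except struct.error:
        return False

# Made-up header bytes; these version numbers are illustrative only.
print(read_header(bytes([0xDE, 0xBF, 0x00, 0x00, 0x00, 0x01])))  # (0, 1)
print(fits_in_16_bits(0xFFFF), fits_in_16_bits(0x12345))  # True False
```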

Run the script like so:

> python idebugmap.py Foo.inform/Build/gameinfo.dbg
   (bytes) (code section)
       192 abbreviations table
       204 actions table
      2816 adjectives table
     10797 array space
      1042 class numbers
    141940 code area
      1713 common properties
         0 dictionary
       480 global variables
      1560 grammar table
         8 header extension
       666 individual properties
       588 object tree
         2 parsing routines
       126 property defaults
       ??? strings area

This is the output for a one-line game ("The Kitchen is a room") compiled
with I7 release 6G60. It simply lists the size (in bytes) of each section
of the compiled game file.

(The dictionary appears zero-size because the debug file lists "dictionary"
and "adjectives table" as starting at the same address. I don't know why.
Just pretend that "adjectives table" means "dictionary".)
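
Here's a minimal sketch of how a section's size falls out of the debug file's memory map, and why two sections listed at the same address produce a zero. section_sizes is a hypothetical helper, not part of the script, and the addresses below are made up. Each size is the gap to the next start address, so whichever of the two same-address sections sorts first comes out as 0, and the highest section gets no size at all:

```python
def section_sizes(starts):
    # starts maps section name -> start address, as in the debug file's
    # memory-map record. Each size is the gap to the next start address;
    # the highest section has no successor, so its size is None.
    order = sorted(starts, key=lambda name: starts[name])
    sizes = {}
    for cur, nxt in zip(order, order[1:]):
        sizes[cur] = starts[nxt] - starts[cur]
    if order:
        sizes[order[-1]] = None
    return sizes

# Made-up addresses in which two sections share a start address.
starts = {
    'dictionary': 0x1000,
    'adjectives table': 0x1000,
    'code area': 0x1B00,
    'strings area': 0x9000,
}
print(section_sizes(starts))
```

The script does the same subtraction in DebugFile.read_map_rec.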

(The strings area is "???" because that information isn't directly available
from the "gameinfo.dbg" file. You can probably estimate how much text you're
using, anyhow. This tool is about digging into the nasty I6 internals, not
your word count.)

(While we're on the subject: remember that "strings area" and "code area"
do *not* count against the Z-code limit of 64k RAM.)
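
If you want a rough idea of how close you are to that limit, you can add up everything else. This is a back-of-the-envelope sketch, assuming every listed section other than "code area" and "strings area" counts toward the 64k. The 64-byte Z-code header isn't in the listing, so the total runs slightly low. The numbers are the ones from the one-room game above:

```python
# Section sizes from the one-room listing above, minus "code area" and
# "strings area". Assumption: every remaining section counts toward the
# 64k limit; the 64-byte header isn't listed, so this runs slightly low.
ram_sections = {
    'abbreviations table': 192,
    'actions table': 204,
    'adjectives table': 2816,
    'array space': 10797,
    'class numbers': 1042,
    'common properties': 1713,
    'dictionary': 0,
    'global variables': 480,
    'grammar table': 1560,
    'header extension': 8,
    'individual properties': 666,
    'object tree': 588,
    'parsing routines': 2,
    'property defaults': 126,
}
used = sum(ram_sections.values())
print('%d bytes used, roughly %d bytes to spare' % (used, 65536 - used))
# 20194 bytes used, roughly 45342 bytes to spare
```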

This stack of numbers isn't very enlightening. However, you can also
*compare* two debug files. Say we want to know the cost of a relation.
I'll create a second I7 project, and add the line "Fooness relates things
to things."

> python idebugmap.py Foo.inform/Build/gameinfo.dbg Bar.inform/Build/gameinfo.dbg
  (+bytes) (code section)
        -1 adjectives table
       +28 array space
        +4 class numbers
      +580 code area
       +65 individual properties
       ??? strings area

As you can see, this relation costs 28 bytes of array space, 65 bytes of
property space, and 580 bytes of new compiled functions, plus assorted
cruft.

(The adjectives table, I mean dictionary, appears to have lost a byte --
that's essentially rounding error, ignore it. And "strings area" will
always be listed as "???" because we don't know whether it grew or
shrank. Again, you can probably work that out yourself.)
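
The comparison rules amount to something like the following sketch. format_diffs is a hypothetical stand-alone version of the script's print_diffs function; it works on plain name-to-size dicts instead of the script's map entries. The "before" numbers come from the one-room game, and the "after" numbers are just those plus the changes reported above:

```python
def format_diffs(old, new):
    # old/new map section name -> size in bytes (None when unknown).
    # Returns (change, label) rows in name order, skipping unchanged
    # sections, using the same "+N", "-N", "???", [NEW], [DEL] markers.
    rows = []
    for name in sorted(set(old) | set(new)):
        if name not in old:
            size = new[name]
            rows.append(('???' if size is None else '+%d' % size, name + ' [NEW]'))
        elif name not in new:
            size = old[name]
            rows.append(('???' if size is None else '-%d' % size, name + ' [DEL]'))
        elif old[name] is None or new[name] is None:
            rows.append(('???', name))
        elif new[name] != old[name]:
            rows.append(('%+d' % (new[name] - old[name]), name))
    return rows

old = {'array space': 10797, 'code area': 141940, 'dictionary': 0, 'strings area': None}
new = {'array space': 10825, 'code area': 142520, 'dictionary': 0, 'strings area': None}
for change, label in format_diffs(old, new):
    print('  %8s %s' % (change, label))
```

This prints the array space, code area, and strings area lines of the listing above; the unchanged dictionary entry is skipped.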

You don't have to work with two separate projects; you can just copy the
original gameinfo.dbg file somewhere and keep comparing with it:

> cp Foo.inform/Build/gameinfo.dbg gameinfo.dbg
> python idebugmap.py gameinfo.dbg Foo.inform/Build/gameinfo.dbg

You may wonder, what's the cost of an indexed text? We can now find out,
by adding "Zorg is an indexed text that varies."

  (+bytes) (code section)
        +1 adjectives table
    +11432 array space
       +22 class numbers
    +38848 code area
       +65 individual properties
       ??? strings area

Ka-ching! The compiler reserves more than 11k of array space for indexed
text workspace, and the internal functions to work with it cost almost
39k. (Of course that's a one-time hit. If you use dynamic lists you'll see
the same hit; but dynamic lists and indexed text *together* are no worse,
because they share the same workspace and internal function library.)

If you want more information, you can add the "-a" option to list
the size changes of individual arrays.

> python idebugmap.py -a gameinfo.dbg Foo.inform/Build/gameinfo.dbg
  (+bytes) (code section)
        +1 adjectives table
    +11432 array space
       +22 class numbers
    +38848 code area
       +65 individual properties
       ??? strings area

  (+bytes) (array)
       +20 Allocated_Match_Vars [NEW]
     +8208 Blk_Heap [NEW]
      +148 CharCasingChart# [NEW]
        +2 Global_Vars
     +2054 IT_MemoryBuffer [NEW]
      +896 RE_PACKET_space [NEW]
       +22 RE_Subexpressions [NEW]
       +20 Rel_Record_#
       ??? ResourceIDsOfSounds
       +14 SAT_Tmp [NEW]
       +40 Subexp_Posns [NEW]
        +6 property_metadata
        +2 valued_property_offsets

This tells us, for example, that a new array called Allocated_Match_Vars
has appeared, which is 20 bytes long; and the array called Global_Vars
has grown by 2 bytes.

(For various stupid reasons, I'm grouping arrays together if their names
differ only by digits. So "CharCasingChart#" actually represents two
arrays, "CharCasingChart0" and "CharCasingChart1", whose sizes are 80
and 68 bytes respectively. They total 148, which is all you really
need to know.)
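
In code, the grouping is just a regular-expression substitution followed by a sum. Here's a sketch using plain name-to-size dicts (group_by_digits is a hypothetical helper; the script does the same thing with its InformArray objects). The sketch treats an unknown size in any member as making the whole group unknown:

```python
import re

def group_by_digits(sizes):
    # sizes maps array name -> size in bytes (None if unknown). Runs of
    # digits in a name become "#", and sizes of arrays that collapse to
    # the same name are summed. Any None makes the group's size None.
    digitpat = re.compile('[0-9]+')
    grouped = {}
    for name, size in sizes.items():
        key = digitpat.sub('#', name)
        if key in grouped:
            old = grouped[key]
            grouped[key] = None if (old is None or size is None) else old + size
        else:
            grouped[key] = size
    return grouped

print(group_by_digits({'CharCasingChart0': 80, 'CharCasingChart1': 68}))
# {'CharCasingChart#': 148}
```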

(ResourceIDsOfSounds is another one whose size we don't know. That's
because it's the last array in memory, and I compute each array's size
by subtracting its address from the address of the array that follows.
ResourceIDsOfSounds is tiny -- two or four bytes per sound resource --
so you can ignore it.)

"""

import sys
import optparse
import re
from struct import unpack

class InformFunc:
    def __init__(self, funcnum):
        self.funcnum = funcnum
        self.name = '<???>'
        self.addr = 0
        self.linenum = None
        self.endaddr = None
        self.endlinenum = None
        self.locals = None
        self.seqpts = None
    def __repr__(self):
        return '<InformFunc $%s %s>' % (hex(self.addr)[2:], repr(self.name))

class InformArray:
    def __init__(self, addr, name='<???>', bytelen=None):
        self.addr = addr
        self.name = name
        self.bytelen = bytelen
    def __repr__(self):
        return '<InformArray $%s %s %s>' % (hex(self.addr)[2:], self.bytelen, repr(self.name))
    
class InformMapEntry:
    def __init__(self, name, addr, bytelen=None):
        self.name = name
        self.addr = addr
        self.bytelen = bytelen
    def __repr__(self):
        return '<InformMapEntry %s $%s %s>' % (repr(self.name), hex(self.addr)[2:], self.bytelen)
    
class DebugFile:
    def __init__(self, fl):
        self.files = {}
        self.functions = {}
        self.function_names = {}
        self.classes = []
        self.objects = {}
        self.arrays = {}
        self.globals = {}
        self.properties = {}
        self.attributes = {}
        self.actions = {}
        self.fake_actions = {}
        self.map = {}
        self.header = None
        
        dat = fl.read(2)
        val = unpack('>H', dat)[0]
        if (val != 0xDEBF):
            raise ValueError('not an Inform debug file')
            
        dat = fl.read(2)
        self.debugversion = unpack('>H', dat)[0]
        dat = fl.read(2)
        self.informversion = unpack('>H', dat)[0]

        rectable = {
            1:  self.read_file_rec,
            2:  self.read_class_rec,
            3:  self.read_object_rec,
            4:  self.read_global_rec,
            5:  self.read_attr_rec,
            6:  self.read_prop_rec,
            7:  self.read_fake_action_rec,
            8:  self.read_action_rec,
            9:  self.read_header_rec,
            10: self.read_lineref_rec,
            11: self.read_routine_rec,
            12: self.read_array_rec,
            13: self.read_map_rec,
            14: self.read_routine_end_rec,
        }

        while True:
            dat = fl.read(1)
            if (not dat):
                raise ValueError('unexpected end of debug file')
            rectype = unpack('>B', dat)[0]
            if (rectype == 0):
                break
            recfunc = rectable.get(rectype)
            if (not recfunc):
                raise ValueError('unknown debug record type: %d' % (rectype,))
            recfunc(fl)

        for func in self.functions.values():
            self.function_names[func.name] = func

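        # Each array's size is the gap between its address and the next
        # array's address. The highest-addressed array has no successor,
        # so its size stays unknown (None).
        ls = sorted(self.arrays.keys())
        for ix in range(len(ls) - 1):
            arr = self.arrays[ls[ix]]
            arr.bytelen = ls[ix+1] - arr.addr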

    def read_file_rec(self, fl):
        dat = fl.read(1)
        filenum = unpack('>B', dat)[0]
        includename = self.read_string(fl)
        realname = self.read_string(fl)
        self.files[filenum] = ( includename, realname )
        
    def read_class_rec(self, fl):
        name = self.read_string(fl)
        start = self.read_linenum(fl)
        end = self.read_linenum(fl)
        self.classes.append( (name, start, end) )
        
    def read_object_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        start = self.read_linenum(fl)
        end = self.read_linenum(fl)
        self.objects[num] = (name, start, end)
    
    def read_global_rec(self, fl):
        dat = fl.read(1)
        num = unpack('>B', dat)[0]
        name = self.read_string(fl)
        self.globals[num] = name
    
    def read_array_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        self.arrays[num] = InformArray(num, name)
    
    def read_attr_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        self.attributes[num] = name
    
    def read_prop_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        self.properties[num] = name
    
    def read_action_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        self.actions[num] = name
    
    def read_fake_action_rec(self, fl):
        dat = fl.read(2)
        num = unpack('>H', dat)[0]
        name = self.read_string(fl)
        self.fake_actions[num] = name
    
    def read_routine_rec(self, fl):
        dat = fl.read(2)
        funcnum = unpack('>H', dat)[0]
        func = self.get_function(funcnum)
        
        func.linenum = self.read_linenum(fl)
        # Addresses are stored as 3-byte big-endian values; pad to 4 bytes.
        dat = fl.read(3)
        addr = unpack('>I', b'\0'+dat)[0]
        func.addr = int(addr)
        func.name = self.read_string(fl)
        locals = []
        while True:
            val = self.read_string(fl)
            if (not val):
                break
            locals.append(val)
        func.locals = locals

    def read_lineref_rec(self, fl):
        dat = fl.read(2)
        funcnum = unpack('>H', dat)[0]
        func = self.get_function(funcnum)

        if (not func.seqpts):
            func.seqpts = []
        
        dat = fl.read(2)
        count = unpack('>H', dat)[0]
        for ix in range(count):
            linenum = self.read_linenum(fl)
            dat = fl.read(2)
            addr = unpack('>H', dat)[0]
            func.seqpts.append( (linenum, addr) )
        
    def read_routine_end_rec(self, fl):
        dat = fl.read(2)
        funcnum = unpack('>H', dat)[0]
        func = self.get_function(funcnum)

        func.endlinenum = self.read_linenum(fl)
        # Addresses are stored as 3-byte big-endian values; pad to 4 bytes.
        dat = fl.read(3)
        addr = unpack('>I', b'\0'+dat)[0]
        func.endaddr = int(addr)

    def read_header_rec(self, fl):
        dat = fl.read(64)
        self.header = dat
    
    def read_map_rec(self, fl):
        while True:
            name = self.read_string(fl)
            if (not name):
                break
            # Addresses are stored as 3-byte big-endian values; pad to 4 bytes.
            dat = fl.read(3)
            addr = unpack('>I', b'\0'+dat)[0]
            addr = int(addr)
            self.map[name] = InformMapEntry(name, addr)

        # Each section's size is the distance from its start address to
        # the start of the next section. The last section has no
        # successor, so its size stays unknown (None).
        ls = sorted(self.map.values(), key=lambda ent: ent.addr)
        for ix in range(len(ls) - 1):
            ls[ix].bytelen = ls[ix+1].addr - ls[ix].addr
    
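    def read_linenum(self, fl):
        # A line reference is a file number (1 byte), a line number
        # (2 bytes), and a character position (1 byte).
        dat = fl.read(4)
        (filenum, linenum, charnum) = unpack('>BHB', dat)
        return (filenum, linenum, charnum)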

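    def read_string(self, fl):
        # Strings are stored as bytes terminated by a zero byte.
        val = b''
        while True:
            dat = fl.read(1)
            if (not dat):
                raise ValueError('unexpected end of debug file in string')
            if (dat == b'\0'):
                return val.decode('latin-1')
            val += dat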

    def get_function(self, funcnum):
        func = self.functions.get(funcnum)
        if (not func):
            func = InformFunc(funcnum)
            self.functions[funcnum] = func
        return func

def print_diffs(map1, map2):
    # Print the size change of every entry in two name -> entry maps.
    # Unchanged entries are skipped; entries present in only one map are
    # flagged [NEW] or [DEL]; unknown sizes are shown as "???".
    names = sorted(set(map1) | set(map2))
    for name in names:
        ent1 = map1.get(name)
        ent2 = map2.get(name)
        if (ent1 is None):
            if (ent2.bytelen is None):
                print('  %8s %s [NEW]' % ('???', name))
            else:
                print('  %8s %s [NEW]' % ('+'+str(ent2.bytelen), name))
            continue
        if (ent2 is None):
            if (ent1.bytelen is None):
                print('  %8s %s [DEL]' % ('???', name))
            else:
                print('  %8s %s [DEL]' % ('-'+str(ent1.bytelen), name))
            continue
        if (ent1.bytelen is None) or (ent2.bytelen is None):
            print('  %8s %s' % ('???', name))
            continue
        diff = ent2.bytelen - ent1.bytelen
        if (diff == 0):
            continue
        if (diff < 0):
            print('  %8s %s' % ('-'+str(-diff), name))
        else:
            print('  %8s %s' % ('+'+str(diff), name))


usage = 'usage: idebugmap.py [-a] gameinfo.dbg [gameinfo2.dbg]'

popt = optparse.OptionParser(usage=usage)

popt.add_option('-a', '--arrays',
    action='store_true', dest='arrays',
    help='list the size of each array')

(opts, args) = popt.parse_args()

if len(args) < 1 or len(args) > 2:
    print(usage)
    sys.exit(1)

if len(args) == 1:
    # Open in binary mode: the debug file is binary data.
    fl = open(args[0], 'rb')
    dat = DebugFile(fl)
    fl.close()

    print('  %8s %s' % ('(bytes)', '(code section)'))
    for name in sorted(dat.map.keys()):
        ent = dat.map[name]
        val = ent.bytelen
        if (val is None):
            val = '???'
        print('  %8s %s' % (val, name))

    if (opts.arrays):
        print('')
        print('  %8s %s' % ('(bytes)', '(array)'))
        for addr in sorted(dat.arrays.keys()):
            arr = dat.arrays[addr]
            val = arr.bytelen
            if (val is None):
                val = '???'
            print('  %8s %s' % (val, arr.name))

else:
    # Open in binary mode: the debug files are binary data.
    fl = open(args[0], 'rb')
    dat1 = DebugFile(fl)
    fl.close()
    fl = open(args[1], 'rb')
    dat2 = DebugFile(fl)
    fl.close()

    print('  %8s %s' % ('(+bytes)', '(code section)'))
    print_diffs(dat1.map, dat2.map)

    if (opts.arrays):
        print('')
        print('  %8s %s' % ('(+bytes)', '(array)'))

        # Group arrays whose names differ only by digits (so
        # "CharCasingChart0" and "CharCasingChart1" both become
        # "CharCasingChart#") and sum their sizes. If any member's size
        # is unknown, the group's size is unknown too.
        digitpat = re.compile('[0-9]+')
        def group_arrays(arrays):
            grouped = {}
            for arr in arrays.values():
                name = digitpat.sub('#', arr.name)
                if (name in grouped):
                    oldarr = grouped[name]
                    if (arr.bytelen is None) or (oldarr.bytelen is None):
                        total = None
                    else:
                        total = arr.bytelen + oldarr.bytelen
                    arr = InformArray(-1, name, total)
                grouped[name] = arr
            return grouped
        print_diffs(group_arrays(dat1.arrays), group_arrays(dat2.arrays))
