In COMP1521, we’re currently boldly going forth and writing MIPS-flavoured assembly. Unfortunately, there have been some serious style sins committed, so here are my hot tips on writing good assembly.

Mechanical style

Most of these rules set out to improve whitespace and consistency. Don’t deliberately write dense, cryptic code; assembly is hard enough to read as is.

RULE Set your tab width to 8, and don’t insert spaces. At 8, 16, and 32 columns, or as close after as possible, place the mnemonic, the operands, and a line comment, respectively.

This is a controversial one, because of the ever-popular tabs-vs-spaces debate. In this case, I like wide indentation to make patterns in the flow of data more apparent. If you’re averse to such wide indentation, that’s OK: jas uses 3-column indentation; another reasonable value might be 4. However: pick something sensible and stick to it.

RULE Labels are never indented. Instructions are always indented.

# BAD:
f:
bgt $a0, $0, f_a0_false
add $v0, $a0, $a1

# ALSO BAD:
        f:
bgt $a0, $0, f_a0_false
add $v0, $a0, $a1

# ALSO BAD:
        f:
        bgt $a0, $0, f_a0_false
        add $v0, $a0, $a1

# GOOD:
f:
        bgt     $a0, $0, f_a0_false
        add     $v0, $a0, $a1

RULE Don’t indent to show structure. Indent to the same level, and use comments or label names to indicate structure.

# DISGUSTINGLY BAD:
f:
bgt $0, $a0, f_a0_false
    f_a0_true:
    bgt $0, $a1, f_a1_false
        f_a1_true:
            add $v0, $a0, $a1
    f_a1_false:
    f_a0_false:
li $v0, 0

# GOOD:
f:
        bgt     $0, $a0, f_a0_false
f_a0_true:
        bgt     $0, $a1, f_a1_false
f_a1_true:
        add     $v0, $a0, $a1
f_a1_false:
f_a0_false:
        li      $v0, 0
# (better: add vertical whitespace before non-empty labels)

RULE Add whitespace between the mnemonic and arguments.

# BAD:
f:
        bgt $a0, $0, f_no
        li $t0, 4
        j f_yes

# GOOD:
f:
        bgt     $a0, $0, f_no
        li      $t0, 4
        j       f_yes

Naming rules

RULE Give labels clear, systematic names.

Some suggestions for a systematic naming scheme follow; if you like them, use them, and use them consistently.

RULE Preface all labels with the function or scope they belong to.

Because there’s no scope bounding the names you can refer to, you need to uniquely name everything, including labels. Given a function f, it would be reasonable to prefix all relevant labels in it with, for example, f_.

RULE Give function epilogues (and, where necessary, prologues), dedicated labels.

It’s also useful to denote “special” labels, like the labels for the prologue and epilogue (or prelude and postlude, depending on what you call the sections that set up and tear down stack frames). To avoid confusion, use two underscores to separate the function name from the special label type; for example, f__epi or f__post might mark the epilogue to f. It’s uncommon to need a specialised name for the prologue, so if you do need it, make it clear what magic and/or evil you’re doing.

RULE In a conditional, label all parts of that conditional, to make it clear how execution has reached here.

I like to use the scheme function_variable[_condition]. So, for example, the label f_n_lt_0 gives us “in function f, for variable n, n < 0 was true”. A special case is the _phi extension: control flow continues from this point from all arms of the conditional; the name phi is borrowed from SSA form. You may like to come up with your own scheme; but whatever you choose, stick to it.

For example,

void f (int n) {
    if (n < 0) {
        putchar ('-');
    } else if (n > 0) {
        putchar ('+');
    }
}

might give these labels:

f:
f_n_lt_0:
f_n_lt_0_f:
f_n_gt_0:
f_n_gt_0_f:
f_n_phi:
f__epi:
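
As a rough sketch of where each of these labels sits in the control flow, the same function can be written in C with goto, one C label per assembly label. The function name sign_char and its return value are assumptions made for illustration: instead of calling putchar, it returns the character f would print (or 0 for none). The labels keep the f_ prefix from the list above.

```c
/* Hypothetical stand-in for f: returns the character f would print,
 * or 0 if it prints nothing. Labels reached only by fall-through are
 * kept purely as documentation, as they would be in assembly. */
int sign_char(int n)
{
    int c = 0;

    /* f: function entry; test n < 0 */
    if (!(n < 0)) goto f_n_lt_0_f;
f_n_lt_0:                       /* n < 0 was true */
    c = '-';
    goto f_n_phi;

f_n_lt_0_f:                     /* n < 0 was false; test n > 0 */
    if (!(n > 0)) goto f_n_gt_0_f;
f_n_gt_0:                       /* n > 0 was true */
    c = '+';
    goto f_n_phi;

f_n_gt_0_f:                     /* n > 0 was false: nothing to do */
f_n_phi:                        /* every arm of the conditional rejoins here */
f__epi:
    return c;
}
```

Each goto corresponds to one branch or jump instruction, so the label on the target tells you which condition led there.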

RULE In a looping construct, label all parts of that loop.

Following the above naming scheme, I like to use the suffixes init, cond, step, and f (or false) to represent the loop initialisation, the loop condition, the increment of the loop, and the point where control flow resumes when the condition is false. The step label should come directly before the instruction(s) that increment i, so that a jump to it acts as an analogue of continue. (A while loop has no separate increment, so it doesn’t need a step label.)

For example,

void f (int n) {
    for (int i = 0; i < n; i++) {
        // ...
    }
}

might give us these labels:

f:
f_i_init:
f_i_cond:
f_i_step:
f_i_false:
f__epi:
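
To show how the step label gives a continue analogue, here is a minimal sketch of a loop with that shape, written in C with goto. The function sum_odd_below and its loop body are made up for illustration; only the label names come from the list above.

```c
/* Hypothetical example: sums the odd values of i in [0, n).
 * Even values are skipped by jumping to the step label, which is
 * exactly what `continue` would do in the original for loop. */
int sum_odd_below(int n)
{
    int i;
    int sum = 0;

f_i_init:
    i = 0;
f_i_cond:
    if (!(i < n)) goto f_i_false;

    if (i % 2 == 0) goto f_i_step;  /* continue */
    sum += i;

f_i_step:
    i++;
    goto f_i_cond;

f_i_false:                          /* loop condition was false */
f__epi:
    return sum;
}
```

Jumping to f_i_cond instead of f_i_step would skip the increment and loop forever, which is why the step label earns its own name.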

Commenting

To comment a function called main, I’d suggest following a template like this:

########################################################################
# .TEXT <main>
        .text
main:

# Frame:        $fp, $ra, $s0, $s1, $s2, $s3, $s4
# Uses:         $a0, $a1, $v0, $s0, $s1, $s2, $s3, $s4
# Clobbers:     $a0, $a1

# Locals:
#       - `argc' in $s0
#       - `argv' in $s1
#       - `length' in $s2
#       - `ntimes' in $s3
#       - `i' in $s4

# Structure:
#       main
#       -> [prologue]
#       -> main_seed
#         -> main_seed_t
#         -> main_seed_end
#       -> main_seed_phi
#       -> main_i_init
#       -> main_i_cond
#         -> main_i_step
#       -> main_i_end
#       -> [epilogue]

# Code:
        # set up stack frame
        # ...
        # tear down stack frame

Listing the frame makes it easier to determine what is at what offset above $fp; listing what registers this function uses and clobbers makes it easier to determine what registers one should save. It’s also very useful to list local variables, whether they live in registers or on the stack. A graph of the control flow is also useful so you can easily identify what label results from what piece of the control flow.

RULE Write clear, useful, meaningful comments, that make it clear to the reader what your code is doing, and why.

# Given $s0 is `row', $t1 is `col', and $t3 is NCOLS:
# BAD:
    mul $t0, $s0, $t3 #confused.........
    add $t0, $t0, $t1
    sb $t2, grid($t0) #how to get grid[row][col] ='.'

# GOOD:
    mul     $t0, $s0, $t3     # (row * NCOLS
    add     $t0, $t0, $t1     #  ... + col
    sb      $t2, grid($t0)    #  ... + &grid[0][0]) <- '.'

# GOOD:
    mul     $t0, $s0, $t3     # t0 = row * NCOLS
    add     $t0, $t0, $t1     # t0 = (row * NCOLS) + col
    sb      $t2, grid($t0)    # *(grid + (row*NCOLS) + col) = '.'
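
The arithmetic those comments describe is ordinary row-major indexing. As a sketch, assuming a byte grid with made-up dimensions (the NROWS and NCOLS values below are placeholders, as are the function names), the same computation in C looks like this:

```c
#include <stddef.h>

#define NROWS 4     /* assumed dimensions, for illustration only */
#define NCOLS 5

char grid[NROWS][NCOLS];

/* Byte offset of grid[row][col] from the start of grid: the value
 * the mul/add pair leaves in $t0. Elements are one byte each, which
 * is why the store is `sb` and no further scaling is needed. */
size_t grid_offset(size_t row, size_t col)
{
    return row * NCOLS + col;
}

/* Equivalent of the three instructions: store '.' at grid[row][col]
 * by adding the computed offset to the base address. */
void grid_put_dot(size_t row, size_t col)
{
    char *base = (char *)grid;
    *(base + grid_offset(row, col)) = '.';
}
```

A good comment on the assembly names exactly these intermediate values, so the reader never has to reconstruct the formula from the registers.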

Structured data

RULE When using structured data, always get a base pointer and use fixed offsets.

For example,

struct student {
    int zid;
    char *name;
    double wam;
    int program;
} s;

would be laid out with zid at offset 0, name at offset 4, wam at offset 8, and program at offset 16.

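# with a base pointer to a `struct student` in $a0:
student_get_zid:
        lw      $v0, 0($a0)
        jr      $ra
student_get_name:
        lw      $v0, 4($a0)
        jr      $ra
student_get_wam:
        # `wam' is a double, returned in the $f0/$f1 pair
        # (assuming little-endian: low word at offset 8)
        lw      $t0, 8($a0)
        mtc1    $t0, $f0
        lw      $t0, 12($a0)
        mtc1    $t0, $f1
        jr      $ra
student_get_program:
        lw      $v0, 16($a0)
        jr      $ra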

This makes it much easier to use struct student and struct student *, as both are now accessed in exactly the same way: a base pointer plus a fixed offset.
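
If you’d rather not work out the offsets by hand, a C compiler can check them for you with offsetof. The sketch below makes one assumption worth flagging: the char * member is replaced by a uint32_t so that the pointer is 4 bytes wide, as on 32-bit MIPS, even when this is compiled on a 64-bit machine. The name struct student32 is made up for that reason. It needs a C11 compiler for _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `struct student` with 32-bit MIPS field widths.
 * Assumption: `name` stands in for a 4-byte `char *`. */
struct student32 {
    int32_t  zid;
    uint32_t name;
    double   wam;       /* 8-byte aligned, so it lands at offset 8 */
    int32_t  program;
};

/* Compilation fails if any offset differs from the one used
 * in the assembly accessors. */
_Static_assert(offsetof(struct student32, zid)     == 0,  "zid offset");
_Static_assert(offsetof(struct student32, name)    == 4,  "name offset");
_Static_assert(offsetof(struct student32, wam)     == 8,  "wam offset");
_Static_assert(offsetof(struct student32, program) == 16, "program offset");
```

The same trick is handy whenever padding is involved: reorder the fields and the asserts tell you which offsets in your assembly have gone stale.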

Allocating registers

One really useful trick: when writing a function, and especially when translating a function from another language, don’t start by working out which variables live in which registers (“register allocation”). Instead, use percent-prefixed placeholders, then do a search-and-replace for those placeholders with the registers you decide to use.

Given:

void f (int matrix[NROWS][NCOLS]) {
    for (int row = 0; row < NROWS; row++) {
        for (int col = 0; col < NCOLS; col++) {
            matrix[row][col] = 0;
        }
    }
}

It’s much easier to make a first-pass translation referring to those values, to get the logic right.

f:
        # ... preamble elided ...
        li      %NROWS, 4
        li      %NCOLS, 4

f_row_init:
        # int row = 0;
        li      %row, 0
f_row_cond:
        # row < NROWS ? 1 : 0
        slt     $at, %row, %NROWS
        beq     $at, $0, f_row_false

f_col_init:
        # int col = 0;
        li      %col, 0
f_col_cond:
        # col < NCOLS ? 1 : 0
        slt     $at, %col, %NCOLS
        beq     $at, $0, f_col_false

        mul     %tmp, %row, %NCOLS  # row * NCOLS
        addu    %tmp, %tmp, %col    # (row * NCOLS) + col
        li      %tmp2, 4
        mul     %tmp, %tmp, %tmp2   # 4 * ((row * NCOLS) + col)
        addu    %tmp, %matrix, %tmp # matrix + row*NCOLS + col
        sw      $0, (%tmp)          # *(matrix + row*NCOLS + col) = 0

f_col_step:
        addi    %col, %col, 1
        j       f_col_cond

f_col_false:
f_row_step:
        addi    %row, %row, 1
        j       f_row_cond

f_row_false:
f__post:
        # ... postamble elided ...
        jr      $ra

Now I might like to replace %matrix with $a0, %row with $s0, %col with $s1, %NROWS with $t0, %NCOLS with $t1, %tmp with $t2, and %tmp2 with $t3, using some sort of string replacement in my text editor.
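
One pitfall with plain search-and-replace: if %tmp is replaced before %tmp2, then %tmp2 turns into $t22. Matching whole placeholder names avoids this. The following is only a sketch of that idea, not a tool I’m claiming exists: the struct reg_map type and the alloc_regs function are hypothetical names, and the mapping table is whatever allocation you settle on.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a placeholder to a register. */
struct reg_map {
    const char *placeholder;    /* name without the leading '%' */
    const char *reg;            /* register to substitute, e.g. "$s0" */
};

/* Copies src to dst, replacing every %name found in map with its
 * register. A name is matched as a whole identifier, so %tmp never
 * matches the front of %tmp2. Unknown placeholders pass through
 * unchanged. Returns 0 on success, -1 if dst is too small. */
int alloc_regs(const char *src, const struct reg_map *map, size_t nmap,
               char *dst, size_t dstsz)
{
    size_t out = 0;

    if (dstsz == 0)
        return -1;

    while (*src != '\0') {
        const char *piece = src;    /* text to emit for this step */
        size_t piece_len = 1;       /* how many bytes to emit */
        size_t consumed = 1;        /* how many bytes of src it covers */

        if (*src == '%') {
            /* Measure the whole identifier following the '%'. */
            size_t idlen = 0;
            while (isalnum((unsigned char)src[1 + idlen]) ||
                   src[1 + idlen] == '_')
                idlen++;
            consumed = piece_len = 1 + idlen;

            for (size_t i = 0; i < nmap; i++) {
                if (strlen(map[i].placeholder) == idlen &&
                    memcmp(map[i].placeholder, src + 1, idlen) == 0) {
                    piece = map[i].reg;
                    piece_len = strlen(map[i].reg);
                    break;
                }
            }
        }

        if (out + piece_len >= dstsz)
            return -1;              /* keep room for the terminator */
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }

    dst[out] = '\0';
    return 0;
}
```

With a table mapping tmp to $t2 and tmp2 to $t3, the line `mul %tmp, %tmp, %tmp2` comes out as `mul $t2, $t2, $t3`. In a text editor, the equivalent precaution is to replace the longest names first, or to use a whole-word match.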

Some assemblers (not SPIM, unfortunately) support defining macros either using special syntax or using the C preprocessor. This also provides a useful technique for allocating registers in a region.