Assembly language missing features

There are a few things that an assembly language or its assembler should support, but that we do not yet have for Dauug|36.

Object code management

The cross assembler outputs directly into a list of 36-bit words representing the contents of code RAM, so all we have is a single text segment (code segment). We do not have:

data segments
read-only data segments
uninitialized static data segments
relocation information
linking information
any kind of program header

There is no librarian or linker.

Source code management

There is no method to assemble more than one file into a program. For the cross assembler, a makefile or script could catenate source files prior to assembly, but I have not done this to date.

Register conservation

There is no call tree analysis for scopes, register liveness analysis, graph coloring, manual register coalescing, or other means to reduce the number of registers required by a program.
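To give a feel for what liveness analysis and graph coloring would buy, here is a minimal Python sketch. It is not part of the Dauug|36 assembler. It assumes straight-line code with no branches, where every name is written before it is read, and the tuple format and function names are made up for the illustration.

```python
def interference_graph(instructions):
    """Build an interference graph by backward liveness over straight-line code.

    instructions: list of (dest, sources) tuples (hypothetical format).
    Two names interfere if one is written while the other is still live.
    """
    graph = {}
    live = set()
    for dest, sources in reversed(instructions):
        if dest is not None:
            graph.setdefault(dest, set())
            # dest conflicts with everything still needed after this point.
            for other in live - {dest}:
                graph[dest].add(other)
                graph.setdefault(other, set()).add(dest)
            live.discard(dest)
        for src in sources:
            graph.setdefault(src, set())
            live.add(src)
    return graph


def greedy_color(graph):
    """Give each name the lowest register number not used by a neighbor."""
    colors = {}
    for name in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        taken = {colors[n] for n in graph[name] if n in colors}
        reg = 0
        while reg in taken:
            reg += 1
        colors[name] = reg
    return colors


if __name__ == "__main__":
    # Five names, but only a and b are ever live at the same time.
    program = [("a", []), ("b", []), ("c", ["a", "b"]),
               ("d", ["c"]), ("e", ["d"])]
    colors = greedy_color(interference_graph(program))
    print(colors, "registers needed:", max(colors.values()) + 1)
```

Greedy coloring is only a heuristic and a real implementation would have to handle branches, calls, and scopes, but even this toy shows five names fitting in two registers.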

There is no mechanism for defining symbolic constants in the assembler without allocating a register for each constant. This omission prevents a library from defining a bunch of constants (as in errno.h on POSIX systems) that are available to the developer on an as-needed basis. Instead, every constant featured in the library will diminish by one the number of available registers, whether or not the constant is used in a program.

There is only one kind of constant supported, namely, a register-allocated constant that lives for the entirety of a program. There is no language provision to mark a constant as “generate at each time of use” or “generate when in scope.” These omissions cause more registers to be required than necessary for many programs.

Record access

There is no convenient mechanism for pointer offsets for managing records. This omission forces words within multiple-word data structures to be accessed by numeric offset instead of some kind of symbolic name. So where we may wish to write

component = color -> blue

we have to write something like

component = color addld 2

instead, or possibly even worse,

blue.offset = 2
; ...
component = color addld blue.offset
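Until the assembler supports something like this, one stopgap would be a text preprocessor that rewrites the arrow form into the numeric form before assembly. The Python sketch below is only an illustration under stated assumptions: the `->` syntax, the field table, and the function name are hypothetical, field names are assumed unique across all records, and comments containing `->` are not handled.

```python
import re

# Matches "pointer -> field" with optional spaces around the arrow.
ARROW = re.compile(r"(\S+)\s*->\s*(\S+)")


def expand_fields(line, offsets):
    """Rewrite 'ptr -> field' as 'ptr addld N' using a field-offset table.

    offsets: dict mapping field name to word offset (hypothetical table).
    An unknown field name raises KeyError rather than passing through.
    """
    def substitute(match):
        pointer, field = match.group(1), match.group(2)
        return f"{pointer} addld {offsets[field]}"
    return ARROW.sub(substitute, line)


if __name__ == "__main__":
    color_fields = {"red": 0, "green": 1, "blue": 2}
    print(expand_fields("component = color -> blue", color_fields))
```

This keeps the symbolic name in the source file while the assembler still sees only a numeric offset.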

Memory initialization

We have no semantics, syntax, or support to start a program with any kind of data already in memory, whether it’s a string constant, table, list, tree, or whatever. This makes it hard to write even trivial programs like Hello Dauug, because we’ll need to expressly obtain “Hello Dauug” via a sequence of instructions instead of by pointing to memory containing “Hello Dauug.”

; This is not a scalable way to write Hello Dauug.
;
print2::in = 110_145`o  ; He
call print2
print2::in = 154_154`o  ; ll
call print2
print2::in = 157_040`o  ; o(space)
call print2
print2::in = 104_141`o  ; Da
call print2
print2::in = 165_165`o  ; uu
call print2
print2::in = 147_012`o  ; g(newline)
call print2

; This is even worse: same number of instructions generated
; because 36-bit constants consume 3 instructions each,
; plus 3 registers are permanently tied up by the string.
;
print4::in = 110_145_154_154`o  ; Hell
call print4
print4::in = 157_040_104_141`o  ; o Da
call print4
print4::in = 165_165_147_012`o  ; uug(newline)
call print4
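For what it's worth, octal constants like the ones above don't have to be worked out by hand. The Python sketch below generates them on the host machine. It assumes 9-bit characters packed with the first character in the most significant position, which is what the listings above appear to use. The function names are made up for the illustration.

```python
BITS_PER_CHAR = 9  # assumption: one character per three octal digits


def pack_chars(text, chars_per_word=2):
    """Pack text into integers, first character in the high-order position."""
    words = []
    for i in range(0, len(text), chars_per_word):
        chunk = text[i:i + chars_per_word]
        word = 0
        for ch in chunk:
            code = ord(ch)
            if code >= 1 << BITS_PER_CHAR:
                raise ValueError(f"character {ch!r} does not fit in 9 bits")
            word = (word << BITS_PER_CHAR) | code
        # Left-justify a short final chunk so padding lands in the low bits.
        word <<= BITS_PER_CHAR * (chars_per_word - len(chunk))
        words.append(word)
    return words


def octal_literal(word, chars_per_word=2):
    """Format a packed word as 3-digit octal groups joined by underscores."""
    groups = []
    for _ in range(chars_per_word):
        groups.append(f"{word & 0o777:03o}")
        word >>= BITS_PER_CHAR
    return "_".join(reversed(groups)) + "`o"


if __name__ == "__main__":
    for w in pack_chars("Hello Dauug\n"):
        print(f"print2::in = {octal_literal(w)}")
        print("call print2")
```

Passing `chars_per_word=4` to both functions yields the 36-bit constants of the second listing instead. This only automates the typing. It does nothing about the instruction and register cost described above.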

Character constants

We have no way to convert characters to their ASCII or Unicode numeric values.

Memory conservation

The assembler itself is rather wasteful with memory. For example, it allocates 65,538 words to produce the 1,385-word Osmin kernel, or about 47 words per word of output. This could eventually impair assembling large programs on Dauug|36 systems: 1 Mi words of data memory would be exhausted trying to assemble a program of roughly 22,000 instructions.

Control path assertions

I’m interested in adding some assembler directives that don’t involve registers or produce executable code, but aid with control path checks. What I envision would work like an optimizing compiler’s checks that all control paths return a value, or all variables were written prior to being read. In fact, the implementation can likely be on top of already-planned code for register optimization (liveness testing).

The purpose of these checks is to permit, but not require, developers to erect guard rails around the “spaghettiness” of assembly language, without changing or adding even one instruction to a program. As a really simple example, when the Osmin scheduler transfers control to a user program, the multitasking timer had better be enabled. So I can add directives to enable.timer:: and disable.timer:: that look like:

enable.timer::
    u. timer.setting
    timer.setting = 10_1010111111110111`b   ; 65535 instructions
serial.io:
    timer timer.setting
    jump >= serial.io
    WE_KNOW_TRUE timer.is.enabled           ; added directive
    return

disable.timer::
    u. timer.setting
    timer.setting = 10_0000000000000000`b   ; infinite instructions
serial.io:
    timer timer.setting
    jump >= serial.io
    WE_KNOW_FALSE timer.is.enabled          ; added directive
    return

Then I would modify resume.user.program:: to verify:

resume.user.program::
    REQUIRE_TRUE timer.is.enabled           ; added directive
    revert

In all this, timer.is.enabled isn’t a variable or register. Instead, it’s a semantic property of the program with an assigned true or false value. When the assembler (or more specifically, the register optimizer) processes REQUIRE_TRUE, it must check that all control paths to resume.user.program passed through a matching WE_KNOW_TRUE and that any WE_KNOW_FALSE directives were negated by a subsequent WE_KNOW_TRUE. If these conditions aren’t met, then maybe the multitasking timer isn’t enabled, and assembly should fail.
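The check described here amounts to a small forward dataflow analysis over the control flow graph. The Python sketch below is one way it might look, not a description of any existing assembler code. It tracks a single property, and the graph representation and function names are assumptions made for the illustration.

```python
from collections import deque

TRUE, FALSE, UNKNOWN = "true", "false", "unknown"


def merge(a, b):
    """Join facts from two control paths; None means 'not reached yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else UNKNOWN


def check_requirements(successors, directives, entry):
    """Return reachable REQUIRE_TRUE nodes not guaranteed on every path.

    successors: node -> list of successor nodes (every node must be a key)
    directives: node -> "WE_KNOW_TRUE" | "WE_KNOW_FALSE" | "REQUIRE_TRUE"
    """
    state_in = {node: None for node in successors}
    state_in[entry] = UNKNOWN  # nothing is known when the program starts
    work = deque([entry])
    while work:
        node = work.popleft()
        fact = state_in[node]
        directive = directives.get(node)
        if directive == "WE_KNOW_TRUE":
            fact = TRUE
        elif directive == "WE_KNOW_FALSE":
            fact = FALSE
        for succ in successors[node]:
            merged = merge(state_in[succ], fact)
            if merged != state_in[succ]:
                state_in[succ] = merged
                work.append(succ)
    return sorted(node for node, d in directives.items()
                  if d == "REQUIRE_TRUE" and state_in[node] in (FALSE, UNKNOWN))


if __name__ == "__main__":
    # One path enables the timer, the other skips it: assembly should fail.
    successors = {"entry": ["enable", "skip"], "enable": ["resume"],
                  "skip": ["resume"], "resume": []}
    directives = {"enable": "WE_KNOW_TRUE", "resume": "REQUIRE_TRUE"}
    print(check_requirements(successors, directives, "entry"))
```

Each node's fact can only move from unreached to true or false and then to unknown, so the worklist always terminates. A fact of unknown at a REQUIRE_TRUE is exactly the "maybe the multitasking timer isn't enabled" case.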

Obviously the non-trivial semantic properties of a program are undecidable (Rice’s theorem), so the programmer will occasionally need to add some redundant WE_KNOW_* directives based on her expertise.

I don’t know how useful a tool this would be to the assembly language. I do know we eventually want register optimization using liveness analysis and graph coloring anyway, and I believe WE_KNOW_* can map on top of that capability. So the cost of adding and trying this feature needn’t be large.