Assembly language missing features
There are a few things that an assembly language or its assembler should support, but that we do not yet have for Dauug|36.
Object code management
The cross assembler outputs directly into a list of 36-bit words representing the contents of code RAM, so all we have is a single text segment (code segment). We do not have:
- data segments
- read-only data segments
- uninitialized static data segments
- relocation information
- linking information
- any kind of program header
There is no librarian or linker.
Source code management
There is no method to assemble more than one file into a program. For the cross assembler, a makefile or script could catenate source files prior to assembly, but I have not done this to date.
Register conservation
There is no call tree analysis for scopes, register liveness analysis, graph coloring, manual register coalescing, or other means to reduce the number of registers required by a program.
There is no mechanism for defining symbolic constants in the assembler without allocating a register for each constant. This omission prevents a library from defining a bunch of constants (as in errno.h on POSIX systems) that are available to the developer on an as-needed basis. Instead, every constant featured in the library will diminish by one the number of available registers, whether or not the constant is used in a program.
There is only one kind of constant supported, namely, a register-allocated constant that lives for the entirety of a program. There is no language provision to mark a constant as “generate at each time of use” or “generate when in scope.” These omissions cause more registers to be required than necessary for many programs.
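To make the register cost concrete, here is a minimal Python sketch (not part of the Dauug|36 toolchain) that counts the registers needed by straight-line code when each value occupies a register over an inclusive range of instruction indices. The instruction ranges, the variable names `a`, `b`, `c`, and the constants `K1`, `K2` are all hypothetical. The sketch assumes no branches, in which case the peak number of simultaneously live values equals the number of registers required.

```python
def registers_needed(intervals):
    """Peak number of simultaneously live values.

    Each interval is an inclusive (start, end) pair of instruction indices.
    Assumes straight-line code with no branches.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))     # value becomes live
        events.append((end + 1, -1))  # value is dead after its last use
    events.sort()                     # at equal positions, deaths sort before births
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical 10-instruction program (indices 0..9).
variables = {"a": (0, 3), "b": (2, 5), "c": (6, 9)}

# Program-lifetime constants: each holds a register from start to finish.
whole_program = dict(variables, K1=(0, 9), K2=(0, 9))

# Hypothetical "generate at each time of use": K1 is needed only at
# instruction 1 and K2 only at instruction 7.
at_each_use = dict(variables, K1=(1, 1), K2=(7, 7))

whole = registers_needed(whole_program.values())
scoped = registers_needed(at_each_use.values())
```

With these made-up ranges, the program-lifetime constants push the peak from two registers to four. The sketch ignores the extra instructions that regenerating a constant at each use would cost.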
Record access
There is no convenient mechanism for pointer offsets for managing records. This omission forces words within multiple-word data structures to be accessed by numeric offset instead of some kind of symbolic name. So where we may wish to write
component = color -> blue
we have to write something like
component = color addld 2
instead, or possibly even worse,
blue.offset = 2
; ...
component = color addld blue.offset
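One way an assembler might support symbolic field names is to keep a per-record table of word offsets and rewrite the arrow form into the numeric form. The Python sketch below illustrates only that idea; it is not how the Dauug|36 assembler works. The record layout (`red`, `green`, `blue`, one word each) and the helper names are assumptions, chosen so that `blue` lands at offset 2 as in the text.

```python
def layout(fields):
    """Assign consecutive word offsets to field names in declaration order."""
    return {name: offset for offset, name in enumerate(fields)}


def lower_field_access(dest, pointer, field, offsets):
    """Rewrite 'dest = pointer -> field' into the numeric-offset form."""
    if field not in offsets:
        raise KeyError(f"unknown field: {field}")
    return f"{dest} = {pointer} addld {offsets[field]}"


# Hypothetical three-word record; only blue's offset of 2 comes from the text.
color_offsets = layout(["red", "green", "blue"])
lowered = lower_field_access("component", "color", "blue", color_offsets)
```

Because the offsets live in an assembler-side table instead of in registers, a scheme like this would also avoid the register cost of the `blue.offset` workaround.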
Memory initialization
We have no semantics, syntax, or support to start a program with any kind of data already in memory, whether it’s a string constant, table, list, tree, or whatever. This makes it hard to write even trivial programs like Hello Dauug, because we’ll need to expressly obtain “Hello Dauug” via a sequence of instructions instead of by pointing to memory containing “Hello Dauug.”
; This is not a scalable way to write Hello Dauug.
;
print2::in = 110_145`o ; He
call print2
print2::in = 154_154`o ; ll
call print2
print2::in = 157_040`o ; o(space)
call print2
print2::in = 104_141`o ; Da
call print2
print2::in = 165_165`o ; uu
call print2
print2::in = 147_012`o ; g(newline)
call print2

; This is even worse: same number of instructions generated
; because 36-bit constants consume 3 instructions each,
; plus 3 registers are permanently tied up by the string.
;
print4::in = 110_145_154_154`o ; Hell
call print4
print4::in = 157_040_104_141`o ; o Da
call print4
print4::in = 165_165_147_012`o ; uug(newline)
call print4
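Until the assembler can initialize memory, the octal constants in listings like the one above can at least be generated by a script on the host instead of by hand. The Python sketch below is a hypothetical helper, not part of the Dauug|36 tools. It assumes each character occupies 9 bits (three octal digits), which is inferred from the listing, where two characters fill an 18-bit constant and four fill a 36-bit word.

```python
def pack_octal(text, chars_per_word):
    """Pack characters into octal constants, three octal digits per character.

    Assumes 9 bits per character, as inferred from the listing above.
    """
    if len(text) % chars_per_word:
        raise ValueError("text length must be a multiple of chars_per_word")
    words = []
    for i in range(0, len(text), chars_per_word):
        chunk = text[i:i + chars_per_word]
        for c in chunk:
            if ord(c) >= 512:
                raise ValueError(f"character does not fit in 9 bits: {c!r}")
        words.append("_".join(f"{ord(c):03o}" for c in chunk) + "`o")
    return words


pairs = pack_octal("Hello Dauug\n", 2)   # constants for print2
quads = pack_octal("Hello Dauug\n", 4)   # constants for print4
```

The results match the constants in the listing, for example `110_145` followed by the octal suffix for "He". This only automates the workaround; it does not reduce the instruction or register cost.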
Character constants
We have no way to convert characters to their ASCII or Unicode numeric values.
Memory conservation
The assembler itself is rather wasteful with memory. For example, it allocates 65,538 words to produce the 1,385-word Osmin kernel. This could eventually impair assembling large programs on Dauug|36 systems: 1 Mi words of data memory would be exhausted trying to assemble a 22,145-instruction program.
Control path assertions
I’m interested in adding some assembler directives that don’t involve registers or produce executable code, but aid with control path checks. What I envision would work like an optimizing compiler’s checks that all control paths return a value, or all variables were written prior to being read. In fact, the implementation can likely be on top of already-planned code for register optimization (liveness testing).
The purpose of these checks is to permit, but not require, developers to erect guard rails around the “spaghettiness” of assembly language, without changing or adding even one instruction to a program. As a really simple example, when the Osmin scheduler transfers control to a user program, the multitasking timer had better be enabled. So I can add directives to enable.timer:: and disable.timer:: that look like:
enable.timer::
u. timer.setting
timer.setting = 10_1010111111110111`b ; 65535 instructions
serial.io:
timer timer.setting
jump >= serial.io
WE_KNOW_TRUE timer.is.enabled ; added directive
return

disable.timer::
u. timer.setting
timer.setting = 10_0000000000000000`b ; infinite instructions
serial.io:
timer timer.setting
jump >= serial.io
WE_KNOW_FALSE timer.is.enabled ; added directive
return
Then I would modify resume.user.program:: to verify:
resume.user.program::
REQUIRE_TRUE timer.is.enabled ; added directive
revert
In all this, timer.is.enabled isn’t a variable or register. Instead, it’s a semantic property of the program with an assigned true or false value. When the assembler (or more specifically, the register optimizer) processes REQUIRE_TRUE, it must check that all control paths to resume.user.program passed through a matching WE_KNOW_TRUE and that any WE_KNOW_FALSE directives were negated by a subsequent WE_KNOW_TRUE. If these conditions aren’t met, then maybe the multitasking timer isn’t enabled, and assembly should fail.
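The check described above resembles a forward "must" dataflow analysis over a control-flow graph. The Python sketch below is one possible shape for it, not the planned implementation. It tracks a single property and treats each node as a basic block carrying at most one directive. The property is unknown at program entry. Where paths join, it stays true only if every incoming path agrees. The graph encoding, the node names other than the routine names from the text, and the omission of call and return handling are all simplifying assumptions.

```python
from collections import deque

TRUE, FALSE, UNKNOWN = "true", "false", "unknown"


def unproven_requirements(cfg, directives, entry):
    """Return reachable REQUIRE_TRUE nodes where the property is not proven.

    cfg:        {node: [successor, ...]}; every successor must also be a key
    directives: {node: "WE_KNOW_TRUE" | "WE_KNOW_FALSE" | "REQUIRE_TRUE"}
    """
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    state_in, state_out = {}, {}   # absent key means "not reached yet"
    work = deque([entry])
    while work:
        n = work.popleft()
        incoming = [state_out[p] for p in preds[n] if p in state_out]
        if n == entry:
            incoming.append(UNKNOWN)          # nothing is known at entry
        # Join: the property survives only if all reached paths agree.
        agreed = all(x == incoming[0] for x in incoming)
        state_in[n] = incoming[0] if agreed else UNKNOWN

        directive = directives.get(n)
        if directive == "WE_KNOW_TRUE":
            new_out = TRUE
        elif directive == "WE_KNOW_FALSE":
            new_out = FALSE
        else:
            new_out = state_in[n]             # REQUIRE_TRUE changes nothing
        if state_out.get(n) != new_out:
            state_out[n] = new_out
            work.extend(cfg[n])               # revisit successors on change

    return sorted(n for n, d in directives.items()
                  if d == "REQUIRE_TRUE" and n in state_in
                  and state_in[n] != TRUE)


directives = {
    "enable.timer": "WE_KNOW_TRUE",
    "disable.timer": "WE_KNOW_FALSE",
    "resume.user.program": "REQUIRE_TRUE",
}

# Every path to resume.user.program passes through enable.timer.
cfg_ok = {
    "boot": ["enable.timer"],
    "enable.timer": ["scheduler"],
    "scheduler": ["resume.user.program"],
    "resume.user.program": [],
}

# A second path reaches the scheduler with the timer disabled.
cfg_bad = dict(cfg_ok, **{
    "boot": ["enable.timer", "disable.timer"],
    "disable.timer": ["scheduler"],
})

ok_failures = unproven_requirements(cfg_ok, directives, "boot")
bad_failures = unproven_requirements(cfg_bad, directives, "boot")
```

In the first graph the requirement is proven. In the second, the path through `disable.timer` leaves the property unknown at the join, so `resume.user.program` is reported and assembly would fail.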
Obviously the non-trivial semantic properties of a program are undecidable (Rice’s theorem), so the programmer will occasionally need to add some redundant WE_KNOW_* directives based on her expertise.
I don’t know how useful a tool this would be to the assembly language. I do know we eventually want register optimization using liveness analysis and graph coloring anyway, and I believe WE_KNOW_* can map on top of that capability. So the cost of adding and trying this feature needn’t be large.