The Dauug House
Dauug|36 minicomputer documentation

Program initialization instructions (privileged)

Opcode  P/U   Category      Description
CALI    priv  program init  call stack initialize
JANY    priv  program init  jump anywhere
TIMER   priv  program init  timer
XANY    priv  program init  execute any instruction

CALI Call stack initialize

cali addr
No registers used
No flags changed

CALI (call stack initialize) forces the return address stack to an unknown, nonzero position, then pushes the current instruction pointer and current CPU flags (N, Z, T, R) on the return address stack, and then replaces the instruction pointer with the location indicated by label addr.

CALI serves two essential purposes:

First, CALI’s use assures that subsequent CALL, RETURN, and REVERT instructions will work as intended. The issue is that the stack position is managed not by an ordinary up/down counter, but by a pair of electrically simpler (and faster) Galois linear feedback shift registers (LFSRs). These LFSRs are able to count forward and backward through two sets of values, namely, the set of nonzero 8-bit values of which there are 255, and the singleton set containing an 8-bit zero value. The nonzero set allows us to store up to 255 return addresses, but the zero set would only permit one return address. CALI forces use of the larger, desired set by arbitrarily ORing a 1 bit into the call stack position. No control is exerted over the actual position, except that it is not position zero. Nor need we any control over the actual position, because the return address stack is circular.

Second, CALI prevents user programs from being able to use stack underflow to branch to unpermitted locations. Because the only mechanism to move backward through the stack is by using a RETURN or REVERT instruction, and either mechanism would encounter a return address that was pushed by CALI, the operating system can use CALI to force a stack-underflowing user program to return to an operating system-specified location for recovery. CALI’s use would look something like this within OS code that is starting a user program:

    cali start_user_program_soon

    ; We get here if a user program underflows its call stack
    ; as a result of a RETURN or REVERT instruction.
    call terminate_user_program     ; This should work.
    jump shut_it_down               ; Just in case it didn't.

    ; Finish setting up the user program and start it running.

Although CALI does everything CALL does plus a little, you may notice in the above example that CALI branches to a label instead of a scope as CALL would. The reason for this distinction, which is enforced by the cross assembler, is that CALI is not actually calling a subroutine.

It is not safe for user code to use CALI, because CALI may expose to the user preexisting return addresses that were left behind by other programs, or random stack contents that were created when electrical power was first applied. Instead, the operating system uses CALI to create a “fence” between return addresses pushed by the user’s own CALLs, which are safe to branch to, and any possible foreign stack contents.

There is no corresponding security concern if the call stack overflows instead of underflows. In the event of overflow the stack will be incorrect, but all incorrect addresses will point into the user’s own code.
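
To see why CALI must steer the stack position away from zero, here is a small Python sketch of an 8-bit Galois LFSR. It is only an illustration: the feedback constant b8 (hex) and the choice of bit 0 as the bit CALI ORs in are assumptions, since this page does not give either detail for the call stack hardware.

```python
# Sketch of an 8-bit Galois LFSR used as a circular position counter.
# ASSUMPTION: MASK 0xB8 (x^8 + x^6 + x^5 + x^4 + 1) is a stand-in polynomial;
# the document does not state the call stack's actual feedback polynomial.
MASK = 0xB8


def step(pos: int) -> int:
    """Advance one position (right-shifting Galois form)."""
    lsb = pos & 1
    pos >>= 1
    return pos ^ MASK if lsb else pos


def cycle_length(start: int) -> int:
    """Count steps until the sequence returns to its starting value."""
    pos, count = step(start), 1
    while pos != start:
        pos, count = step(pos), count + 1
    return count


def cali_position(pos: int) -> int:
    """Model CALI's effect on the position: OR in a 1 bit.

    ASSUMPTION: bit 0 is used here; the text only says "a 1 bit".
    """
    return pos | 1


print("cycle containing zero:", cycle_length(0))
print("cycle containing one: ", cycle_length(1))
```

Whatever the starting position, cali_position() lands in the long cycle, which is all CALI needs because the stack is circular.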

JANY Jump anywhere

jany addr_in_reg
Register Signedness
All ignored
1 opcode only
No flags changed

JANY replaces the current running program’s instruction pointer with the contents of register addr_in_reg, effecting an unconditional branch to that location.

JANY is privileged, because when the operating system’s program loader moves a user program into code memory to run, the program loader would ordinarily verify at that time that all JUMP and CALL destinations are to allowed locations, that is, within the user program itself. The program loader would also ensure the user program contains no privileged instructions. In contrast, JANY allows the program that executes it to branch to any location in code memory, whether or not the code at that memory location is permissible in the context of the program that is now branching there.

The lack of an “unprivileged JANY” is an architectural drawback, because constant-time pointers to functions and table-driven branches won’t be available. But this omission delivers a large benefit in security and simplicity to the architecture.

The cross assembler offers a convenience feature for preparing JANY register contents. In the past, you had to know the exact memory address where your code was going to be. This could be done by examining assembly listings, but manual updates were needed when instructions moved. JANY use at that time looked like:

    jany 319                ; jump to absolute location 319

Today, the assembler can convert a label to a location, although this is limited in that the location would be wrong if the program loader writes the program at a nonzero offset in code memory. But here’s the mechanism:

    jany :label         ; one alternative

    some_reg = :label
    jany some_reg       ; another alternative

    some_reg = :label
    some_reg = some_reg + program_offset
    jany some_reg       ; requires an offset from the program loader

    ; more code

    ; JANY will bring us here.

Although you should not use JANY in user programs, it is compatible with preemptive multitasking. (This is to say, the warning below for XANY does not apply to JANY.)

TIMER Set multitasking timer

timer lfsr_preset
Register Signedness
All ignored
1 opcode only
Flag Set if and only if
N bit 35 of the result is set
Z all result bits are zero
T flag does not change
R flag does not change

TIMER adds register lfsr_preset to itself and writes the sum back to the same register. The flags are set as if lfsr_preset is a signed register. Additionally, bit 18 of the new sum is shifted into the 16-bit multitasking timer preset register.

Before explaining what this means, here are two pieces of sample code that may be helpful in the explanation:

Preventing the multitasking timer from running

    ; Set the 16-bit multitasking timer preset register to all zeros.
    unsigned timer.setting
    timer.setting = 10_0000_0000_0000_0000`b
    timer timer.setting
    jump >=

Setting the multitasking timer for 20 instructions

    ; Set the 16-bit multitasking timer preset register.
    unsigned timer.setting
    timer.setting = 10_0111110001010010`b
    timer timer.setting
    jump >=

The timer that causes preemptive multitasking to work in Dauug|36 is a simple 16-bit counter that always starts from the same value when the CPU goes into NPRIV mode. The counter advances one step with each instruction that executes. When all 16 counter bits are 1s (there are 15 AND gates that determine this), the timer is said to have expired, and the CPU is taken from the running program by a procedure called instruction decoder hijacking.

In the two preceding code samples, the 16 low bits of timer.setting are where we want the counter preset when a user program is resumed. These bits are moved into a 16-bit flip-flop named ff tims (timer setpoint) in the netlist. The 16-bit flip-flop that does the counting is named ff timr (timer run).

Because the ALU is already congested with respect to wiring, I didn’t connect any of its major nodes to ff tims. Instead, I ran a single wire from theta’s carry into bit 18 over to ff tims, which I wired as a serial in, parallel out register. The loops you see in the sample code progressively clock 18 bits into the least significant bit of ff tims, and the last 16 bits clocked will remain in ff tims as the multitasking timer setpoint.

It is helpful that timer.setting is 18 bits wide, because a single IMP (immediate positive) instruction can encode the entire constant. The leading 10_ bits (bits 17 and 16) pass harmlessly through ff tims, and eventually reach bits 35 and 34, where the 1 sets the N(egative) flag and terminates the loop at the desired point.
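
The loading loop can be modeled in a few lines of Python. This is a behavioral sketch of what the text describes (doubling a 36-bit register, shifting bit 18 of the sum into the least significant bit of ff tims, and stopping when N is set), not a model of the netlist. The function name and the iteration limit are made up for illustration.

```python
MASK36 = (1 << 36) - 1  # Dauug|36 registers are 36 bits wide


def load_timer_setpoint(setting: int, max_iterations: int = 100):
    """Simulate the 'timer / jump >=' loop; return (ff_tims, iterations)."""
    tims = 0  # prior contents are pushed out once 16 bits are clocked in
    for iteration in range(1, max_iterations + 1):
        setting = (setting + setting) & MASK36      # TIMER adds the register to itself
        carry_bit = (setting >> 18) & 1             # bit 18 of the new sum
        tims = ((tims << 1) | carry_bit) & 0xFFFF   # serial in at the LSB
        if setting >> 35:                           # N flag set, so 'jump >=' falls through
            return tims, iteration
    raise RuntimeError("loop did not terminate; is the stop bit missing?")


tims, iterations = load_timer_setpoint(0b10_0111110001010010)
print(f"ff tims = {tims:016b} after {iterations} iterations")
```

With the stop bit in bit 17, the loop runs 18 times and the last 16 bits clocked in are the 16 low bits of the constant.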

Multitasking timer ff timr is not an ordinary up or down counter, but is instead a 16-bit Galois linear feedback shift register with feedback polynomial x^16 + x^15 + x^13 + x^4 + 1, which can also be written as d008`h. In the timer’s case (unlike the call stack’s case), knowledge of the feedback polynomial chosen is vital to being able to write code for the circuit.

As with the call stack depth counters, ff timr can count through two sets of values, namely, the set of nonzero 16-bit values of which there are 65,535, and the singleton set containing a 16-bit zero value. Both sets are useful to us; however, only the set with 65,535 different combinations contains the all-ones value 65,535 that causes the timer to expire and the user’s program to be preempted. By using TIMER to control where in the LFSR sequence the ff tims setpoint is, the number of instructions that a user program can run prior to preemption is controlled.

The nonzero count sequence for ff timr can be viewed by running the Python script in the code. (The file may be named in earlier tarballs, but it has the same contents.) This tool should be improved to be easier to use, but for now, line 23011 of its output has the all-ones constant 1111111111111111. By presetting the timer to some number of lines earlier (you may wrap around from line 0 to line 65535), you determine the number of instructions a user program will run prior to interruption.

Using an offset scheme where 1111111111111111 is “offset 0,” the preceding line 0101111111101111 is “offset 1,” etc., here is the relationship between offset and number of user instructions executed as of 20 June 2023 (it may change slightly):

Offset  Number of instructions
0       2
1       3
2       3 (note discontinuity)
3       4
4       5
n       n + 1
65534   65535 (maximum timer setting)
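
Presets can also be computed directly from an offset by stepping the LFSR backward from the all-ones value. The Python below is a minimal sketch and is not the script mentioned above. It assumes the right-shifting Galois form with constant d008 (hex), which I chose because it reproduces the presets quoted on this page. The function names are hypothetical, and preset_for_instructions() simply applies the table's rule as it stood on 20 June 2023.

```python
MASK = 0xD008      # feedback constant d008`h from the text
EXPIRED = 0xFFFF   # all-ones value at which the timer expires


def step(state: int) -> int:
    """One forward count of ff timr (assumed right-shifting Galois form)."""
    lsb = state & 1
    state >>= 1
    return state ^ MASK if lsb else state


def step_back(state: int) -> int:
    """Inverse of step(): the value that precedes `state` in the sequence."""
    if state & 0x8000:  # top bit is set only when MASK was XORed in
        return (((state ^ MASK) << 1) | 1) & 0xFFFF
    return (state << 1) & 0xFFFF


def preset_for_offset(offset: int) -> int:
    """Preset that sits `offset` counts before the all-ones value."""
    state = EXPIRED
    for _ in range(offset):
        state = step_back(state)
    return state


def preset_for_instructions(count: int) -> int:
    """Table rule: offset 0 runs 2 instructions; offset n >= 2 runs n + 1."""
    if not 2 <= count <= 65535:
        raise ValueError("count must be between 2 and 65535")
    return preset_for_offset(0 if count == 2 else count - 1)


print(f"{preset_for_instructions(20):016b}")
```

Prefixing the printed 16 bits with the 10_ stop bits gives the constant to assign to timer.setting.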

Most operating systems for Dauug|36 would do well using the maximum timer setting, which starts its count at 1010111111110111 and would be written in code as:

    ; Set the longest supported multitasking timer delay.
    unsigned timer.setting
    timer.setting = 10_1010111111110111`b
    timer timer.setting
    jump >=

At the 20 June 2023 maximum safe clock rate of 66.916 MHz (16.729 MIPS), this maximum delay comes to a little less than 3.92 milliseconds. Although this may seem too short in terms of flexibility for a multitasking timer, the operating system is free to turn the CPU right back to the same program so that it can run longer. The round trip time from user to OS to the same user is only 20 clock cycles (the duration of 5 instructions), so the overhead incurred by not supporting a “bigger” timer can be as little as 5 ÷ 65,535, which comes to less than 77 parts per million.
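
The arithmetic behind these figures is easy to recheck if the clock rate changes. This sketch assumes four clock cycles per instruction, which is implied by the 66.916 MHz and 16.729 MIPS figures and by 20 cycles being the duration of 5 instructions. The variable names are only illustrative.

```python
clock_hz = 66.916e6            # maximum safe clock rate as of 20 June 2023
clocks_per_instruction = 4     # implied by 66.916 MHz versus 16.729 MIPS
max_instructions = 65535       # longest supported timer setting
round_trip_instructions = 5    # user -> OS -> same user (20 clock cycles)

mips = clock_hz / clocks_per_instruction / 1e6
max_delay_ms = max_instructions * clocks_per_instruction / clock_hz * 1e3
overhead_ppm = round_trip_instructions / max_instructions * 1e6

print(f"{mips:.3f} MIPS, {max_delay_ms:.3f} ms, {overhead_ppm:.1f} ppm")
```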

When the ff tims setpoint is zero, as in the first sample above (preventing the multitasking timer from running), we say informally that the multitasking timer is disabled. It’s not actually disabled, though. It runs as it ordinarily does, but the sequence generated is an infinite series of zeros. The number 65,535 doesn’t appear in this sequence, so the timer will never expire using this setpoint.

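What not to do: forget to include the stop bit at bit 17 in the timer.setting constant. Without the stop bit, the loop to set the timer does not end after 18 iterations. With an all-zero constant it is an infinite loop and will probably hang the operating system. With any other constant it ends only when some other 1 bit reaches bit 35, by which time the wrong setpoint is in ff tims.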

XANY Execute any instruction

xany inst_in_reg
Register Signedness
All ignored
1 opcode only
No flags changed

XANY fetches an instruction from register inst_in_reg, decodes the instruction, and fetches its left and right operands from the register file. Simultaneously, XANY increases the instruction pointer by 0, in contrast with almost every other instruction, which increases the instruction pointer by 1.

The effect of XANY is that the instruction in register inst_in_reg will be executed in between XANY and the instruction which follows XANY in code memory. The instruction in the register can be any instruction at all, so XANY isn’t safe for use in user programs.

One possible use of XANY would be to loop over a user program’s 512 registers and zero them out. Here is sample code to do this. This code uses the cross assembler’s opcode keyword to look up the numeric value of an opcode.

    unsigned inst dest inst_with_dest

    ; Shift the IMP (immediate positive) opcode left 27 bits.
    inst = opcode imp
    inst = inst lsl 333333_333333`o

    ; Loop over the 512 destination registers.
    ; The loop runs fastest if we count downwards.
    dest = 777_000_000`o
    inst_with_dest = inst | dest
    xany inst_with_dest
    dest = dest - 1_000_000`o
    jump != loop

    ; I stopped the loop at 0 to avoid setting the R(ange) flag.
    ; But register 0 still needs to be cleared. Here we go:
    xany inst
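
To check the loop bounds, the instruction words handed to XANY can be enumerated in Python. This is a sketch under stated assumptions: IMP_OPCODE is a made-up placeholder (the real value is whatever the assembler's opcode keyword returns), and the field positions are taken from the shifts and constants in the sample above (opcode at bit 27, destination register at bit 18, immediate in the low 18 bits).

```python
IMP_OPCODE = 0o123          # HYPOTHETICAL placeholder, not the real IMP opcode
DEST_STEP = 0o1_000_000     # one destination register, field starts at bit 18
DEST_TOP = 0o777_000_000    # destination register 511


def zeroing_instructions(inst: int) -> list:
    """Return the instruction words the sample would pass to XANY, in order."""
    words = []
    dest = DEST_TOP
    while dest != 0:            # mirrors 'jump != loop'
        words.append(inst | dest)
        dest -= DEST_STEP
    words.append(inst)          # register 0 is handled after the loop
    return words


inst = IMP_OPCODE << 27         # IMP opcode with an immediate of zero
words = zeroing_instructions(inst)
print(len(words), "instruction words generated")
```

The loop body runs for destinations 511 down to 1, and the final XANY covers register 0, so each register is written exactly once.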

Aside. Rather than stop the loop early, this example should have used SW (subtract with wrap) and JUMP <=. The reason I couldn’t was that when I wrote the wrapping flavors of the additive instructions, I mistakenly defined the results as never negative (because they wrap). I intend to correct this oversight, and then correct the above example. This paragraph written 20 June 2023.

XANY is incompatible with instruction decoder hijacking!

Do not use XANY in code where there is any possibility of the multitasking timer expiring. Ideally, do not have the timer running at all. If XANY overlaps a context switch, either the register-specified instruction, the context switch, or both will behave incorrectly. I haven’t thought through exactly which process(es) will go wrong, but something will.

Marc W. Abel
Computer Science and Engineering
College of Engineering and Computer Science
Without secure hardware, there is no secure software.