Program initialization instructions (privileged)
Opcode | P/U (privileged/unprivileged) | Category | Description |
CALI | priv | program init | call stack initialize |
JANY | priv | program init | jump anywhere |
TIMER | priv | program init | timer |
XANY | priv | program init | execute any instruction |
CALI
Call stack initialize
Syntax |
cali addr |
No registers used |
No flags changed |
CALI (call stack initialize) forces the return address stack to an unknown, nonzero position, then pushes the current instruction pointer and current CPU flags (N, Z, T, R) on the return address stack, and then replaces the instruction pointer with the location indicated by label addr.
CALI serves two essential purposes:
First, using CALI ensures that subsequent CALL, RETURN, and REVERT instructions will work as intended. The issue is that the stack position is managed not by an ordinary up/down counter, but by a pair of electrically simpler (and faster) Galois linear feedback shift registers (LFSRs). These LFSRs can count forward and backward through two disjoint sets of values: the 255 nonzero 8-bit values, and the singleton set containing the 8-bit value zero. Starting anywhere in the nonzero set allows up to 255 return addresses to be stored, but the zero set would permit only one, because a position of zero never changes. CALI forces use of the larger, desired set by arbitrarily ORing a 1 bit into the call stack position. No control is exerted over the actual position, except that it is not position zero. Nor do we need any control over the actual position, because the return address stack is circular.
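The following minimal Python model makes the two value sets concrete. It implements an 8-bit Galois LFSR position counter that can step forward and backward. This section does not give the call stack's feedback polynomial, so the mask 0xB8 (taps 8, 6, 5, 4, a standard maximal-length choice) is an assumption. All function names are also assumptions. Any maximal-length polynomial would show the same split into 255 nonzero positions plus zero.

```python
# Sketch of an 8-bit Galois LFSR used as a stack position counter.
# ASSUMPTION: feedback mask 0xB8 (taps 8, 6, 5, 4); the real Dauug|36
# call stack polynomial is not stated in this section.
MASK = 0xB8


def step_forward(pos: int) -> int:
    """Advance the 8-bit position one step (right-shifting Galois form)."""
    lsb = pos & 1
    pos >>= 1
    if lsb:
        pos ^= MASK
    return pos


def step_backward(pos: int) -> int:
    """Undo step_forward; bit 7 reveals whether the mask was applied."""
    lsb = (pos >> 7) & 1
    if lsb:
        pos ^= MASK
    return ((pos << 1) | lsb) & 0xFF


def cycle_length(start: int) -> int:
    """Count forward steps until the position returns to `start`."""
    pos, steps = step_forward(start), 1
    while pos != start:
        pos, steps = step_forward(pos), steps + 1
    return steps


def cali_position(pos: int) -> int:
    """Model of CALI's fix: OR in a 1 bit so the position is never zero."""
    return pos | 1


if __name__ == "__main__":
    print(cycle_length(0))                 # zero set: one position only
    print(cycle_length(cali_position(0)))  # nonzero set: full cycle
```

Running it prints 1 and then 255. A position of zero maps only to itself, while any position produced by the OR in cali_position lies on the single 255-step cycle.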
Second, CALI prevents user programs from using stack underflow to branch to unpermitted locations. The only way to move backward through the stack is a RETURN or REVERT instruction, and a program that keeps returning will eventually encounter the return address that CALI pushed. The operating system can therefore use CALI to force a stack-underflowing user program to return to an operating system-specified location for recovery. CALI’s use would look something like this within OS code that is starting a user program:
    cali start_user_program_soon

; We get here if a user program underflows its call stack
; as a result of a RETURN or REVERT instruction.
shut_it_down:
    call terminate_user_program     ; This should work.
    jump shut_it_down               ; Just in case it didn't.

start_user_program_soon:
    ; Finish setting up the user program and start it running.
Although CALI does everything CALL does plus a little more, you may notice in the above example that CALI branches to a label instead of to a scope, as CALL would. The cross assembler enforces this distinction because CALI is not actually calling a subroutine.
It is not safe for user code to use CALI, because CALI may expose to the user preexisting return addresses left behind by other programs, or random stack contents created when electrical power was first applied. Instead, the operating system uses CALI to create a “fence” between the return addresses pushed by the user’s own CALLs, which are safe to branch to, and any possible foreign stack contents.
There is no corresponding security concern if the call stack overflows instead of underflows. In the event of overflow the stack will be incorrect, but all incorrect addresses will point into the user’s own code.
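A toy model can check both the underflow fence and the overflow argument. This Python sketch assumes a 255-entry circular stack indexed by consecutive integers, which simplifies the LFSR-indexed hardware. It records only addresses, not the flags CALI also pushes. Every address in it (FOREIGN, FENCE, and the user addresses) is hypothetical.

```python
DEPTH = 255  # one slot per nonzero 8-bit stack position


class ReturnStack:
    """Toy circular return address stack.

    Simplification: slots are indexed 0..254 in order, whereas the
    hardware walks an LFSR sequence; only circularity matters here.
    """

    def __init__(self, stale_value):
        # Leftovers from other programs or from power-up.
        self.slots = [stale_value] * DEPTH
        self.top = 0

    def push(self, addr):
        self.top = (self.top + 1) % DEPTH
        self.slots[self.top] = addr

    def pop(self):
        addr = self.slots[self.top]
        self.top = (self.top - 1) % DEPTH
        return addr


FOREIGN = 0o7777  # hypothetical address left behind by another program
FENCE = 0o100     # hypothetical address of the OS recovery code

# Underflow: three CALLs, then four RETURNs.
stack = ReturnStack(FOREIGN)
stack.push(FENCE)                  # what CALI pushes
for addr in (0o200, 0o201, 0o202):
    stack.push(addr)               # the user program's own CALLs
returns = [stack.pop() for _ in range(4)]

# Overflow: 300 CALLs wrap around and overwrite the fence.
stack = ReturnStack(FOREIGN)
stack.push(FENCE)
user_addrs = [0o200 + i for i in range(300)]
for addr in user_addrs:
    stack.push(addr)
popped = [stack.pop() for _ in range(300)]

print(returns[-1] == FENCE)
print(all(addr in user_addrs for addr in popped))
```

In the first scenario, the fourth RETURN after three CALLs comes back to FENCE rather than to a foreign address. In the second, 300 CALLs overwrite the fence, but every address popped afterward belongs to the user program.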
JANY
Jump anywhere
Syntax |
jany addr_in_reg |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
JANY replaces the current running program’s instruction pointer with the contents of register addr_in_reg, effecting an unconditional branch to that location.
JANY is privileged because, when the operating system’s program loader moves a user program into code memory to run, the loader would ordinarily verify that all JUMP and CALL destinations are allowed locations, that is, within the user program itself. The loader would also ensure that the user program contains no privileged instructions. JANY, in contrast, takes its destination from a register at run time, so the loader cannot check it in advance. The program that executes JANY can branch to any location in code memory, whether or not the code at that location is permissible in the context of the program that is now branching there.
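The load-time checks described above can be sketched as follows. This is a hypothetical illustration, not the Dauug|36 loader. The Inst record, the string opcodes, and the treatment of targets as offsets from the start of the program are all assumptions. PRIVILEGED lists only the four opcodes in this section's table. The sketch shows that a JUMP or CALL carries a fixed target the loader can check, whereas JANY's target is unknown until run time, so the loader can only reject JANY outright.

```python
from typing import NamedTuple, Optional

# Only the privileged opcodes from this section's table; a real loader
# would use the architecture's complete list.
PRIVILEGED = {"cali", "jany", "timer", "xany"}
BRANCHES = {"jump", "call"}


class Inst(NamedTuple):
    """Hypothetical decoded instruction. `target` is a branch destination
    measured from the start of the user program, or None."""
    opcode: str
    target: Optional[int] = None


def loader_accepts(program):
    """True if no privileged opcode appears and every JUMP/CALL target
    lies inside the program itself."""
    for inst in program:
        if inst.opcode in PRIVILEGED:
            return False
        if inst.opcode in BRANCHES:
            if inst.target is None or not 0 <= inst.target < len(program):
                return False
    return True


ok = [Inst("call", 2), Inst("jump", 0), Inst("return")]
escapes = [Inst("jump", 5000)]   # destination outside the program
uses_jany = [Inst("jany")]       # destination known only at run time
print(loader_accepts(ok), loader_accepts(escapes), loader_accepts(uses_jany))
```

This prints True False False.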
The lack of an “unprivileged JANY” is an architectural drawback, because constant-time function pointers and table-driven branches are not available to user programs. But this omission delivers a large benefit in security and simplicity to the architecture.
The cross assembler offers a convenience feature for preparing JANY register contents. In the past, you had to know the exact memory address where your code was going to be. You could find it by examining assembly listings, but manual updates were needed whenever instructions moved. JANY use at that time looked like this:
jany 319 ; jump to absolute location 319
Today, the assembler can convert a label to a location. This is limited, however: the location will be wrong if the program loader writes the program at a nonzero offset in code memory. Here is the mechanism:
jany :label                             ; one alternative

some_reg = :label
jany some_reg                           ; another alternative

some_reg = :label
some_reg = some_reg + program_offset
jany some_reg                           ; requires an offset from the program loader

; more code

label:                                  ; JANY will bring us here.
Although you should not use JANY in user programs, it is compatible with preemptive multitasking. (That is, the warning below for XANY does not apply to JANY.)
TIMER
Set multitasking timer
Syntax |
timer lfsr_preset |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N | bit 35 of the result is set |
Z | all result bits are zero |
T | flag does not change |
R | flag does not change |
TIMER adds register lfsr_preset to itself (that is, shifts it left by one bit) and writes the sum back to the same register. The flags are set as if lfsr_preset is a signed register. Additionally, bit 18 of the new sum, which equals bit 17 of the original value, is shifted into the 16-bit multitasking timer preset register.
Before explaining what this means, here are two pieces of sample code that may be helpful in the explanation:
Preventing the multitasking timer from running
; Set the 16-bit multitasking timer preset register to all zeros.
unsigned timer.setting
timer.setting = 10_0000_0000_0000_0000`b
serial.io:
    timer timer.setting
    jump >= serial.io
Setting the multitasking timer for 20 instructions
; Set the 16-bit multitasking timer preset register.
unsigned timer.setting
timer.setting = 10_0111110001010010`b
serial.io:
    timer timer.setting
    jump >= serial.io
The timer that causes preemptive multitasking to work in Dauug|36 is a simple 16-bit counter that always starts from the same value when the CPU goes into NPRIV mode. The counter advances one step with each instruction that executes. When all 16 counter bits are 1s (there are 15 AND gates that determine this), the timer is said to have expired, and the CPU is taken from the running program by a procedure called instruction decoder hijacking.
In the two preceding code samples, the 16 low bits of timer.setting are the value to which the counter should be preset when a user program is resumed. These bits are moved into a 16-bit flip-flop named ff tims (timer setpoint) in the netlist. The 16-bit flip-flop that does the counting is named ff timr (timer run).
Because the ALU’s wiring is already congested, I didn’t connect any of its major nodes to ff tims. Instead, I ran a single wire from theta’s carry into bit 18 over to ff tims, which I wired as a serial-in, parallel-out shift register. The loops in the sample code progressively clock 18 bits into the least significant bit of ff tims, and the last 16 bits clocked remain in ff tims as the multitasking timer setpoint.
Keeping timer.setting to 18 bits is helpful, because a single IMP (immediate positive) instruction can encode the entire constant. The leading 10 in bits 17 and 16 passes harmlessly through ff tims, because it is shifted out before the final 16 bits arrive. Those two bits eventually reach bits 35 and 34, where the 1 sets the N(egative) flag and terminates the loop at the desired point.
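The serial loading loop can be modeled in a few lines of Python. This sketch assumes only what the text states:

- TIMER doubles a 36-bit register.
- The carry into bit 18 (the old bit 17) is shifted into the 16-bit ff tims.
- jump >= loops until N (bit 35) is set.

The names timer_insn and load_setpoint are hypothetical. The max_iters guard exists only so the model can report a loop that would never end.

```python
WORD_MASK = (1 << 36) - 1  # Dauug|36 registers are 36 bits wide


def timer_insn(reg, tims):
    """Model one TIMER instruction.

    Doubles the 36-bit register, shifts the carry into bit 18 (the old
    bit 17) into the LSB of the 16-bit ff tims, and returns the new
    register, the new ff tims, and the N flag (bit 35 of the result).
    """
    carry_into_bit_18 = (reg >> 17) & 1
    reg = (reg + reg) & WORD_MASK
    tims = ((tims << 1) | carry_into_bit_18) & 0xFFFF
    n_flag = bool((reg >> 35) & 1)
    return reg, tims, n_flag


def load_setpoint(setting, max_iters=100):
    """Model the 'serial.io: timer x / jump >= serial.io' loop.

    Returns (ff tims contents, iterations executed), or None if N is
    still clear after max_iters iterations (a loop that never ends).
    """
    reg, tims = setting & WORD_MASK, 0  # old contents shift out after 16 clocks
    for iteration in range(1, max_iters + 1):
        reg, tims, n_flag = timer_insn(reg, tims)
        if n_flag:  # jump >= falls through once the result is negative
            return tims, iteration
    return None


print(load_setpoint(0b10_0111110001010010))  # 20-instruction sample
print(load_setpoint(0b10_0000000000000000))  # "disabled" sample
print(load_setpoint(0))                      # no stop bit, all zeros
```

With the 20-instruction constant, the model returns (31826, 18): the low 16 bits 0111110001010010 after exactly 18 iterations. An all-zero value with no stop bit never sets N, so the model returns None.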
Multitasking timer ff timr is not an ordinary up or down counter, but is instead a 16-bit Galois linear feedback shift register with feedback polynomial $x^{16} + x^{15} + x^{13} + x^{4} + 1$, which can also be written as d008`h. In the timer’s case (unlike the call stack’s case), knowledge of the chosen feedback polynomial is vital for writing code, because the setpoint must be a particular position in the LFSR’s count sequence.
As with the call stack depth counters, ff timr can count through two sets of values: the 65,535 nonzero 16-bit values, and the singleton set containing the 16-bit value zero. Both sets are useful to us; however, only the set of 65,535 nonzero values contains the all-ones value 65,535 that causes the timer to expire and the user’s program to be preempted. By using TIMER to choose where in the LFSR sequence the ff tims setpoint lies, the operating system controls the number of instructions that a user program can run before preemption.
The nonzero count sequence for ff timr can be viewed by running the Python script multitasking_timer_lfsr.py in the source code. (The file may be named lfsrplay.py in earlier tarballs, but it has the same contents.) This tool should be made easier to use, but for now, line 23011 of its output contains the all-ones constant 1111111111111111. By presetting the timer to the value some number of lines earlier (wrapping around from line 0 to line 65535 if needed), you determine the number of instructions a user program will run before interruption.
Using an offset scheme where 1111111111111111 is “offset 0,” the preceding line 0101111111101111 is “offset 1,” and so on, here is the relationship between offset and the number of user instructions executed as of 20 June 2023 (it may change slightly):
Offset | Number of instructions |
0 | 2 |
1 | 3 |
2 | 3 (note discontinuity) |
3 | 4 |
4 | 5 |
n | n + 1 |
65534 | 65535 (maximum timer setting) |
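The offsets in this table can be turned into timer.setting constants mechanically. The following Python sketch assumes that ff timr steps as a right-shifting Galois LFSR with mask d008`h. That direction and form are inferred rather than taken from the netlist, but they reproduce the document's constants: 0101111111101111 at offset 1, the 20-instruction sample, and the maximum setting. The function names are hypothetical. The rule offset = instructions - 1 comes from the 20 June 2023 table and is applied here only from 4 instructions upward.

```python
TIMR_MASK = 0xD008  # x^16 + x^15 + x^13 + x^4 + 1, i.e., d008`h
ALL_ONES = 0xFFFF   # the value at which the timer expires


def timr_step(state):
    """One instruction's worth of counting (right-shifting Galois LFSR)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TIMR_MASK
    return state


def timr_unstep(state):
    """Inverse of timr_step: bit 15 shows whether the mask was applied."""
    lsb = (state >> 15) & 1
    if lsb:
        state ^= TIMR_MASK
    return ((state << 1) | lsb) & 0xFFFF


def preset_for_offset(offset):
    """Counter value that reaches all ones after `offset` steps."""
    state = ALL_ONES
    for _ in range(offset % 65535):  # the nonzero cycle has 65,535 states
        state = timr_unstep(state)
    return state


def timer_setting(instructions):
    """18-bit timer.setting: stop bits 10 followed by the 16-bit preset.

    Uses offset = instructions - 1, per the 20 June 2023 table; the
    table's low end is irregular, so small counts are rejected.
    """
    if not 4 <= instructions <= 65535:
        raise ValueError("table rule covers 4 to 65535 instructions")
    return (0b10 << 16) | preset_for_offset(instructions - 1)


print(f"{timer_setting(20):018b}")     # 20-instruction sample
print(f"{timer_setting(65535):018b}")  # maximum timer setting
```

It prints 100111110001010010, which matches the 20-instruction sample. It then prints 101010111111110111, which matches the maximum setting.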
Most operating systems for Dauug|36 would do well to use the maximum timer setting, which starts its count at 1010111111110111 and would be written in code as:
; Set the longest supported multitasking timer delay.
unsigned timer.setting
timer.setting = 10_1010111111110111`b
serial.io:
    timer timer.setting
    jump >= serial.io
At the 20 June 2023 maximum safe clock rate of 66.916 MHz (16.729 MIPS), this maximum delay comes to a little less than 3.92 milliseconds. Although this may seem too short for a multitasking timer, the operating system is free to hand the CPU right back to the same program so that it can run longer. The round-trip time from user to OS and back to the same user is only 20 clock cycles (the duration of 5 instructions). The overhead incurred by not supporting a “bigger” timer can therefore be as little as 5 ÷ 65,535, which is less than 77 parts per million.
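The figures in the preceding paragraph follow from two inputs. One is the stated clock rate. The other is the 4 clock cycles per instruction implied by “20 clock cycles (the duration of 5 instructions)”:

```latex
\begin{aligned}
\frac{66.916\ \text{MHz}}{4\ \text{clock cycles per instruction}} &= 16.729\ \text{MIPS} \\
\frac{65\,535\ \text{instructions}}{16.729 \times 10^{6}\ \text{instructions/s}} &\approx 3.917\ \text{ms} \\
\frac{5}{65\,535} &\approx 7.63 \times 10^{-5} \approx 76.3\ \text{ppm}
\end{aligned}
```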
When the ff tims setpoint is zero, as in the sample “Preventing the multitasking timer from running,” we say informally that the multitasking timer is disabled. It is not actually disabled, though. It runs as it ordinarily does, but the sequence it generates is an infinite series of zeros. The value 65,535 never appears in this sequence, so the timer never expires with this setpoint.
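What not to do: forget to include the stop bit at bit 17 of the timer.setting constant. Without the stop bit, the loop that sets the timer runs until the highest remaining 1 bit reaches bit 35, which clocks the wrong number of bits into ff tims and leaves an incorrect setpoint. If the remaining bits are all zero, as in the “disabled” setting, N is never set. The loop then never terminates and will probably hang the operating system.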
XANY
Execute any instruction
Syntax |
xany inst_in_reg |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
XANY fetches an instruction from register inst_in_reg, decodes it, and fetches its left and right operands from the register file. Simultaneously, XANY increases the instruction pointer by 0 (that is, leaves it unchanged). Almost every other instruction, by contrast, increases the instruction pointer by 1.
The effect of XANY is that the instruction in register inst_in_reg is executed between XANY and the instruction that follows XANY in code memory. The instruction in the register can be any instruction at all, including a privileged one, so XANY isn’t safe for use in user programs.
One possible use of XANY is to loop over a user program’s 512 registers and zero them out. Here is sample code that does this. It uses the cross assembler’s opcode keyword to look up the numeric value of an opcode.
unsigned inst dest inst_with_dest

; Shift the IMP (immediate positive) opcode left 27 bits.
inst = opcode imp
inst = inst lsl 333333_333333`o

; Loop over the 512 destination registers.
; The loop runs fastest if we count downwards.
dest = 777_000_000`o
loop:
    inst_with_dest = inst | dest
    xany inst_with_dest
    dest = dest - 1_000_000`o
    jump != loop

; I stopped the loop at 0 to avoid setting the R(ange) flag.
; But register 0 still needs to be cleared. Here we go:
xany inst
Aside. Rather than stop the loop early, this example should have used SW (subtract with wrap) and JUMP <=. I couldn’t, because when I wrote the wrapping flavors of the additive instructions, I mistakenly defined their results as never negative (because they wrap). I intend to correct this oversight and then correct the above example. This paragraph was written on 20 June 2023.
XANY is incompatible with control decoder hijacking!
Do not use XANY in code where there is any possibility of the multitasking timer expiring. Ideally, do not have the timer running at all. If XANY overlaps a context switch, the register-specified instruction, the context switch, or both will behave incorrectly. I haven’t thought through exactly which process(es) will go wrong, but something will.