Cooperative multitasking example

In cooperative multitasking, once a program has the CPU, it continues to have the CPU until it voluntarily releases it. This is done via the Dauug|36 YIELD instruction.

Below is a regression test, written as a single Dauug|36 assembly language module, that implements cooperative multitasking and provides assurance it is working. It contains 116 instructions give or take, so I’ve broken it into sections where I add some description.

There are 14 different privileged instructions used by this code, so I recommend reading the following sections before or alongside this example:

Privileged instructions
Identity-modifying
Program initialization
Pseudo
Unrestricted memory (Only RCM1, RCM2, and WCM apply for this.)

This code example causes four programs to execute simultaneously, but due to the limitations of the cross assembler and electrical simulation as of May 2023, they are written together as a single module and loaded into a contiguous block of code memory from offset 0. A superuser, numerically user 0 in the system, sets up its own call stack and register constants, disables the multitasking timer, and spawns three programs running under user 25, 99, and 222.

The three user programs each have their own call stack, registers, program counter, and flags. They are simple enough to only need registers; otherwise, they would also have their own page table and data memory. These programs do not yield the CPU on the same schedule, so they are in different places as they compute. They do, however, execute the same computation and will have the same result. The basic computation is done in an elaborate loop that counts steps, yields the CPU at various points, and adds some other tests, but the underlying numeric computation we are performing is this substitution-permutation network operation:

unsigned word
word = 0
word = word mix 0   ; 20 copies of this line

As each user finishes its 20th MIX instruction, the CPU is paused (via HALT) so the test script can verify that word is at its then-correct value of 31,358,733,809. The superuser (the “operating system” within this example) and three users are written to run forever; however, the simulation script (not shown here) will shut everything down after the third HALT and verification finish.

It may seem like 20 instructions (plus a lot of loop overhead) per running program is a really short test for a multitasking CPU. I concede that it is, but this is only intended to be a regression test and coding example. This test case takes about 10 seconds to execute, and it’s just one of three dozen regression tests as of June 2023 that I run many times per day as I modify the CPU and firmware. So this test, like its 35 siblings, needs to be as short as it can without fully undermining its purpose.

The following segments of source code, when joined together in order, form this cooperative multitasking example.

Main superuser code

Execution begins with a block of superuser code that loops forever to feed CPU time to the three user programs. This code has five principal tasks:

load immediate values (numeric constants) into registers
initialize the superuser’s call stack
disable the multitasking timer
prepare three user programs to run
run three user programs in turn forever

Here is the code under consideration, followed by an explanation of how it achieves the above five tasks.

; Evidence of cooperative multitasking (CM)
; MWA 26 May 2023 ff

    ; This initializes the superuser call stack.
    priv
    cali super.stack
super.stack:

    ; Because this is cooperative multitasking (and other reasons),
    ; we turn off the multitasking timer here.
    unsigned timer.setting
    timer.setting = 10_0000_0000_0000_0000`b
disable.timer:
    timer timer.setting
    jump >= disable.timer

    ; These next three blocks arrange for three user programs
    ; to run simultaneously using a variety of parameters.
    ; There are also two versions of the code being run.
    user 25
    prepare.to.run.user::program = ::program.1
    poke program.1::goal = 20
    poke program.1::mask = 3`o
    poke program.1::prng = 25
    call prepare.to.run.user

    user 99
    prepare.to.run.user::program = ::program.2
    poke program.2::goal = 20
    poke program.2::mask = 7`o
    poke program.2::prng = 99
    call prepare.to.run.user

    user 222
    prepare.to.run.user::program = ::program.2
    poke program.2::goal = 20
    poke program.2::mask = 3`o
    poke program.2::prng = 222
    call prepare.to.run.user

    ; This loop cycles between three users forever as they
    ; YIELD the CPU in sequence back to the superuser (this code).
forever:
    user 25
    npcall resume.user.program
    user 99
    npcall resume.user.program
    user 222
    npcall resume.user.program
    jump forever

    ; To continue running a user program, we pop its instruction
    ; pointer and flags while already in NPRIV mode. Note we can't
    ; just inline this REVERT to save an instruction, because the
    ; NPCALL is what preserves the superuser's instruction pointer.
resume.user.program::
    revert

1. Load immediate values

This program needs several integer constants, such as 20, 99, and 222, but the architecture’s instruction format cannot accommodate these constants for most instructions directly. Instead, the assembler automatically appends to the executable code a callable routine called a “glossary” that moves each constant into a register that is dedicated for it. This is transparent to the programmer, so you won’t find any indication in the source code that this happens.

Because the glossary has no source code for you to see, here is its disassembler output instead, as well as a prepended CALL to reach it:

  0  c                94

 94  imp       1      25
 95  imp       2      20
 96  imp       3       3
 97  imp       4      99
 98  imp       5       7
 99  imp       6     222
100  imp       7       1
101  imh       8 131_072
102  imp       0       0
103  r         0   0   0

What the disassembler labeled as c and r are the CALL and RETURN opcodes. The remaining opcodes are ALU: immediate instructions. What’s happening is that a short subroutine is appended at code address 94 that places integer constants in registers 1 through 8, and then the frequently-used constant 0 is placed in register 0. The reason register 0 is filled last is that a temporary register is required to load most arbitrary 36-bit constants, and register 0 is held back for use as that temporary register until all the nonzero constants are finished. We don’t see this use as a temporary register in this example, because all of this example program’s constants can be encoded in 18 bits.

The glossary is separated out as a callable routine in order to support programs that need to load the register constants more than once. All that is required to load them again is to CALL code address 94. We’ll need to do this at the start of all three user program instances, because although they use the same pool of constants, they have no access to them via the superuser’s registers.

2. Initialize the superuser’s call stack

CALL, RETURN, and REVERT will not work correctly until the call stack is set up, so we see a CALI instruction to provide this. It’s prefaced by a PRIV in order to ensure we’re in superuser mode and are setting up the correct call stack. There are two subtleties to watch out for.

First, as you just read, the program already executed a CALL and RETURN to run the glossary code, but CALI had not happened yet. It turns out this is okay, because what CALI does is guarantee that the call stack has depth 255 instead of depth 1. We don’t know how deep the call stack could go when the glossary ran, but a depth of 1 would have been enough to handle the glossary.

Second, although we see a PRIV in the assembly code before CALI, the electrical simulation in fact forces the CPU into PRIV mode every time it starts. On physical, non-simulated minicomputers, either the firmware loader, boot loader, or operating system will need to ensure the CPU goes into PRIV mode at the outset.

3. Disable the multitasking timer

Because this example uses cooperative multitasking and will not pull the CPU away from a program until it YIELDs, the multitasking timer’s 16-bit setpoint is filled with zeros so that the timer will never expire. It is good practice to disable the timer anyway, because the procedure to prepare and start a user program goes into NPRIV mode several times, and depending on the timer’s setting and what initialization is done while NPRIV, an unexpected preemption might occur under unusual use cases.

4. Prepare three user programs to run

The user programs in this example all have registers goal, mask, and prng that need to be initialized with parameters prior to starting. We use the POKE instruction to copy the superuser’s (constant-valued) registers into the three parameter registers for each of the three users. For each program instance being prepared, a USER instruction specifies which nonprivileged program’s call stack, registers, and (ordinarily but not here) page table are being prepared by the relevant Identity-modifying instructions. This example arbitrarily names users 25, 99, and 222, but they can be any three distinct values between 1 and 255 inclusive, and they need not be in ascending (or any particular) order.

We also call a subroutine named prepare.to.run.user, which I will discuss in the next major section. It only requires one parameter, which is the code address of the first instruction each user to execute. This is stored in a register named program, and we set it directly (we don’t POKE it), because it’s the superuser that will force each user to start at its specified code address.

Although we don’t have to copy the first instruction’s address across program boundaries, the register program is defined within the scope of the prepare.to.run.user subroutine. In essence, program is a local register to that subroutine, and that register is “kept” valid between calls to prepare.to.run.user so that it’s available from the outside to pass the parameter in.

The last little piece of this setup is that the assembler supports the notations :label and ::scope for assignment statements, allowing code memory addresses to be loaded into registers. This is how the addresses program.1 and program.2 get saved as registers. This method only supports “low” addresses of less than 2¹⁸, which is sufficient for the architecture at this stage of its development.

5. Run three user programs in turn forever

Up to now, not one instruction of any of the three user programs has been executed. The CPU is in PRIV (superuser) mode, the input parameters for each user is in their appropriate registers belonging to that user, the call stacks have been initialized (as I explain in the next major section), and each user’s starting address in code memory has been pushed on its call stack.

So now we see this forever loop that the superuser runs, where we switch to each of the three users before repeating the loop. The transition from superuser to user involves pushing the superuser’s instruction pointer and flags onto the superuser’s stack, then switching the CPU to NPRIV to access the user’s stack, registers, and (not applicable here) page table. This much is done by the NPCALL instruction, which amounts to a CALL followed by a NPRIV.

The NPCALL branches to a scope called resume.user.program containing the single instruction REVERT. This REVERT, which operates on the user’s call stack, restores the user’s last saved (or initial) instruction pointer and flags, and at this point the user program is running. The user program will continue to run until either the multitasking timer expires (which we ensured will not occur in this example), or the user program executes a YIELD instruction.

With the REVERT, the superuser’s then-current instruction pointer and flags are irretrievably overwritten. In essence, the CPU went down a path and got to a place in the program that vanishes without a trace. When the superuser reawakens after the next YIELD, the superuser will be at the instruction immediately following its most-recent NPCALL with the superuser’s flags as they were.

User call stack preparation

The subroutine that follows is called once for each of the three user programs we are preparing to run. We already dealt with passing parameters to the user programs in the “main superuser code” above, but we need to set up the call stack for each. The two main steps are:

initialize the user program’s call stack (see CALI for explanation)
place the user program’s (code memory) start address on its call stack

After these two steps, the user program is ready to run, but has not started yet. Here is the code for these two steps:

; For the non-superuser (already selected by 'USER'), place the address
; 'program' on its stack. Don't really run the user yet.
prepare.to.run.user::
    unsigned program        ; input: starting address of program
    keep program

    ; Initialize the return address stack.
    ; The documentation for CALI describes how this works.
    setup
    cali stack.ready        ; This is a branch!

    ; Arriving here means either (a) the user underflowed its call stack,
    ; or (b) the user is branching here to purposely terminate. In a real
    ; operating system, the user should be removed from the schedule and
    ; its resources should be deallocated. But lacking a real operating
    ; system for this example, I've put an infinite loop of YIELD
    ; instructions so that other programs have an opportunity to run.
stop.user:
    nop
    yield
    jump stop.user

    ; When we get here, the return address stack has been initialized.
stack.ready:
    priv                    ; Get back out of SETUP mode.

    ; TEMPORARY MODIFICATION TO CODE MEMORY
    ; -------------------------------------
    ; Objective:   Need address of 'program' on user's return address stack.
    ; Limitation:  The only way to write 'program' onto this stack is to
    ;              execute a CALL instruction from location 'program' - 1.
    ; Solution:    A CALL instruction will be temporarily written at
    ;              'program' - 1, and we'll branch to it.

    ; First, we save the instruction we're about to overwrite with CALL.
    unsigned prior.place prior.inst
    prior.place = program - 1
    rcm1 prior.place
    rcm2 prior.inst

    ; Second, we need to create the 36-bit CALL instruction.
    ; It has the C (CALL) opcode and a branch address.
    unsigned instr
    unsigned address.to.call
    instr = opcode c
    address.to.call = :trampoline
    instr = instr | address.to.call

    ; Now write the CALL instruction at location 'program' - 1.
    wcm prior.place = instr

    ; Switch to user mode and jump to 'program' - 1.
    ; We'll need to write this address to a user register, and use
    ; JANY to jump there.
    poke prior.place = prior.place
    npriv
    jany prior.place

    ; YOU ARE NOW IN THE TWILIGHT ZONE.
    ; ---------------------------------
    ; More specifically, you've branched to 'program' - 1, which does not
    ; appear in this source code, where there is a CALL instruction to
    ; branch to the following line:

trampoline:
    priv

    ; Let's review what just happened. We reached here via a CALL instruction,
    ; and we're now in superuser (PRIV) mode. From the superuser's perspective,
    ; we didn't call here, we only branched here, because the CALL was done
    ; in user (NPRIV) mode and did not touch the superuser's return address
    ; stack. As for the user's stack, it now contains the starting address
    ; for the user program. Magic!

    ; RESTORATION OF CODE MEMORY
    ; --------------------------
    ; In the course of this mischief, we overwrote the instruction at
    ; address 'program' - 1, which likely didn't belong to us. Put it back
    ; as it used to be.
    wcm prior.place = prior.inst

    ; We have finished preparing this user program to run. This RETURN
    ; corresponds to "call prepare.to.run.user" above and marks the end
    ; of this scope.
    return

User program 1

What follows is the first of two test programs that are run in this cooperative multitasking example. Only one instance of this program is started; two instances of the second test program are executed at the same time.

Because the three instances YIELD at difficult-to-predict times based on their seed and mask settings, I can’t dependably predict the order they will finish in without a lengthy test that makes the order (usually) predictable. For this reason, the three instances compute the same result so that the surrounding test script isn’t affected by the order they finish in. Unfortunately, this means that if two scripts never output a result for some reason, the third script can “loop around” the 36-bit counter and supply correct results for them. Why don’t I fix this? Because the electrical simulation is so slow that the time required to produce a false “pass” in this manner would measure in decades to centuries.

Note that this program’s first instruction is a call to the glossary that follows the program. The glossary loads constant registers with their correct values. This need was discussed above under “load immediate values” for the main superuser code.

Here is the code for user program 1:

((
    This is a short test program for cooperative multitasking. Using 0 as
    the key and initial value of 'word', it successively uses MIX (a
    substitution-permutation network instruction) to modify 'word'.

    This program does not terminate, because the superuser "scaffolding"
    around it doesn't maintain a process list and therefore doesn't have a
    way to allow this program to terminate. But it does have a 'goal' of
    how many MIX instructions to apply before verifying that 'word' has
    the correct value. When that goal is reached, a HALT is executed so
    the simulator script can check. Here are a few correct values of 'word'
    at various 'goal' settings:

        goal         word
        ----  -----------
          20  31358733809
         100  42533060964
         250  49274620803
        1000  19859066012

    This program has two complications added:

    1.  It periodically YIELDs the CPU, based on a faux pseudorandom
        number generator implemented using XIM (another substitution-
        permutation network instruction).

        By "faux," what I mean is that this PRNG uses XIM all by
        itself to create a stream of numbers, which gives us
        NO ASSURANCE AT ALL as to the generator's period. So don't
        cut-paste this code into anything else. For a description of
        a much better PRNG, see the dissertation at pages 127-131.

        WARNING: This XIM-only PRNG is *so horrible* that there might
        be seed values that never cause the CPU to yield, causing this
        regression test to catastrophically fail.

        In any event, provided an initial seed named 'prng' and a
        bitmask 'mask'. Whenever all bits identified by 'mask' are
        zeros in 'prng', the program will YIELD. So for various values
        of 'mask', the CPU will yield:

              mask  probability of CPU yielding
            ------  ---------------------------
               1`b  about 50% of the time
              11`b  about 25% of the time
             111`b  about 12.5% of the time
            1111`b  about 6.25% of the time

    2.  I stuck some "unnecessary" JUMP instructions in the vicinity of
        the YIELD to explore for incompatibilities between JUMP and YIELD.
        Test coverage either wasn't thought out or wasn't adequately
        documented, although no problems were anticipated or found. An
        improper jump would likely HALT with 'word' at an unexpected value,
        causing the simulator script to indicate a mistake.

        I would explain here why I think YIELD is compatible with the
        entire instruction set (except XANY), but it would be hand-waving,
        which is probably worse than not offering an explanation yet.
))
program.1::
    call glossary       ; Initialize constant registers for this user.

    unsigned goal       ; input: how many MIX operations to compute
    unsigned mask       ; input: yield CPU when PRNG and mask == 0
    unsigned prng       ; input: faux PRNG (no LFSR), must seed this register
    keep goal mask prng

    unsigned maybe      ; Only the Z(ero) flag from this register is used.
    unsigned n          ; count of how many MIX operations so far
    unsigned word       ; current state latest MIX

    n = 0
    word = 0
loop:

    ; At random times, branch around the following YIELD.
    prng = prng xim 0
    maybe = prng and mask
    jump != do.not.yield

    ; If we didn't branch, request that the CPU yield.
    ; Two more instructions will be executed prior to the actual yield.
    yield

    ; Try to break the YIELD with JUMP instructions and vice versa.
    jump with.me.1
    halt
with.me.1:
    jump with.me.2
    halt
with.me.2:

    ; Whether we yielded or not, see if we reached our goal for
    ; how many MIX instructions to do.
do.not.yield:
    cmp goal - n                ; Compare n with goal.
    jump != not.at.goal

    ; We're at our goal, so report the current 'word' back to the .ns script.
    ; After checking if 'word' is correct, the script will resume the CPU.
    word = word
    halt

    ; Whether or not that was our goal, do the next MIX instruction in our
    ; sequence, and increment the count of how many MIXes have been done.
not.at.goal:
    word = word mix 0
    n = n + 1
    jump loop

User program 2

Here is the code for the remaining test program for this cooperative multitasking example:

((
    This is a modified version of 'program.1'. It does exactly the same
    computation, but rather than add JUMP instructions in the vicinity
    of YIELD, it sets or clears the Z(ero) flag depending on the present
    PRNG state, and tests to ensure the Z flag, whether it's 0 or 1,
    is restored to its correct value when the program resumes after
    being suspended.

    Because the N(egative), Z(ero), T(emporal), and R(ange) flags are
    electrical clones in terms of their handling, I infer if Z(ero)
    is saved and restored as anticipated, it's probable that the remaining
    three flags are also saved and restored as designed.

    See the comments above 'program.1' for the base design of this program.
))
program.2::
    call glossary       ; Initialize constant registers for this user.

    unsigned goal       ; input: how many MIX operations to compute
    unsigned mask       ; input: yield CPU when PRNG and mask == 0
    unsigned prng       ; input: faux PRNG (no LFSR), must seed this register
    keep goal mask prng

    unsigned maybe      ; Only the Z(ero) flag from this register is used.
    unsigned n          ; count of how many MIX operations so far
    unsigned word       ; current state latest MIX
    unsigned z.have     ; Bit 35 is cleared iff the Z(ero) flag is set.
    unsigned z.want     ; value for z.have in the event of correct operation

    n = 0
    word = 0
loop:

    ; At random times, branch around the following YIELD.
    prng = prng xim 0
    maybe = prng and mask
    jump != do.not.yield

    ; If we didn't branch, request that the CPU yield.
    ; Two more instructions will be executed prior to the actual yield.
    yield

    ; This was already fetched. Any instruction except XANY works fine here.
    nop

    ; Before yielding to the superuser, arbitrarily set the Z(ero) flag.
    z.want = prng and 400000_000000`o

    ; ----- THIS IS WHERE THE YIELD ACTUALLY SUSPENDS EXECUTION

    ; Find out if Z(ero) changed (very bad!) while the program was asleep.
    jump == is.zero.now
    z.have = 400000_000000`o
    jump reckon
is.zero.now:
    z.have = 0
reckon:
    cmp z.have - z.want
    jump == is.good
flag.test.failed:
    word = 0
    halt                        ; Send an incorrect result to the test script.
    jump flag.test.failed
is.good:

    ; Whether we yielded or not, see if we reached our goal for
    ; how many MIX instructions to do.
do.not.yield:
    cmp goal - n                ; Compare n with goal.
    jump != not.at.goal

    ; We're at our goal, so report the current 'word' back to the .ns script.
    ; After checking if 'word' is correct, the script will resume the CPU.
    word = word
    halt

    ; Whether or not that was our goal, do the next MIX instruction in our
    ; sequence, and increment the count of how many MIXes have been done.
not.at.goal:
    word = word mix 0
    n = n + 1
    jump loop

Glossary label

    ; This is the end of the source code. The assembler appends the
    ; glossary scope for constants here.
glossary::