Preemptive multitasking example

In preemptive multitasking, running programs don’t need to do anything to give up the CPU so that other programs can run. Instead, the hardware and operating system will take the CPU back without permission or assistance from the program that was running. This is arranged for by a hardware timer and the Dauug|36 TIMER instruction.

Below is a regression test, written as a single Dauug|36 assembly language module, that implements preemptive multitasking and provides assurance it is working. It contains 75 instructions give or take, so I’ve broken it into sections where I add some description.

There are 14 different privileged instructions used by this code, so I recommend reading the following sections before or alongside this example:

Privileged instructions
Identity-modifying
Program initialization
Pseudo
Unrestricted memory (Only RCM1, RCM2, and WCM apply for this.)

This code example causes four programs to execute simultaneously, but due to the limitations of the cross assembler and electrical simulation as of May 2023, they are written together as a single module and loaded into a contiguous block of code memory from offset 0. A superuser, numerically user 0 in the system, sets up its own call stack and register constants, disables the multitasking timer, sets up three programs to run as user 25, 99, and 222, configures the multitasking timer, and starts the three programs.

The three user programs each have their own call stack, registers, program counter, and flags. They are simple enough to only need registers; otherwise, they would also have their own page table and data memory. The three programs execute the same computation, although to different finishing points. Each program produce a different numeric result. The basic computation is done in a 5-instruction loop that does the following substitution-permutation network operation:

unsigned word
word = 0
word = word mix 0   ; 20 copies of this line

As each user finishes its assigned number of MIX instructions, the CPU is paused (via HALT) so the test script can verify that word is at its then-correct value for that point in the sequence. The superuser (the “operating system” within this example) and three users are written to run forever; however, the simulation script (not shown here) will shut everything down after the third HALT and verification finish.

It may seem like 20 instructions (plus loop overhead) per running program is a really short test for a multitasking CPU. I concede that it is, but this is only intended to be a regression test and coding example. This test case takes about 5 seconds to execute, and it’s just one of three dozen regression tests as of June 2023 that I run many times per day as I modify the CPU and firmware. So this test, like its 35 siblings, needs to be as short as it can without fully undermining its purpose.

The following segments of source code, when joined together in order, form this preemptive multitasking example.

Main superuser code

Execution begins with a block of superuser code that loops forever to feed CPU time to the three user programs. This code has five principal tasks:

load immediate values (numeric constants) into registers
initialize the superuser’s call stack
disable the multitasking timer
prepare three user programs to run
set the multitasking timer
run three user programs in turn forever

Here is the code under consideration, followed by an explanation of how it achieves the above six tasks.

; Evidence of preemptive multitasking (PM)
; MWA 26 May 2023 ff

    ; This initializes the superuser call stack.
    priv
    cali super.stack
super.stack:

((
    We temporarily turn off the preemption timer for multitasking here,
    because the superuser needs to enter NPRIV mode a few times while
    preparing everything to run. Although these NPRIV intervals are brief
    and the multitasking timer would ordinarily be set for thousands of
    instructions, disabling the timer will reduce opportunity for
    surprises in the event this program is modified.
))
    unsigned timer.setting
    timer.setting = 10_0000_0000_0000_0000`b
disable.timer:
    timer timer.setting
    jump >= disable.timer

((
    These next three blocks arrange for three user programs
    to run simultaneously using a variety of parameters.
    There is just one version of the program being run.
))
    user 25
    prepare.to.run.user::program = ::program
    poke program::goal = 30
    call prepare.to.run.user

    user 99
    prepare.to.run.user::program = ::program
    poke program::goal = 20
    call prepare.to.run.user

    user 222
    prepare.to.run.user::program = ::program
    poke program::goal = 40
    call prepare.to.run.user

((
    Now we can enable the multitasking timer. In this example, we set the
    16-bit LFSR to 0111110001010010`b, which is 19 values prior to
    1111111111111111`b where the timer expires. This setting will permit
    each program to execute 20 instructions prior to each preemption.
))
    timer.setting = 10_0111110001010010`b
enable.timer:
    timer timer.setting
    jump >= enable.timer

    ; This loop cycles between three users forever as they
    ; YIELD the CPU in sequence back to the superuser (this code).
forever:
    user 25
    npcall resume.user.program
    user 99
    npcall resume.user.program
    user 222
    npcall resume.user.program
    jump forever

    ; To continue running a user program, we just pop its instruction
    ; pointer and flags while already in NPRIV mode.
resume.user.program::
    revert

1. Load immediate values

This program needs several integer constants, such as 20, 99, and 222, but the architecture’s instruction format cannot accommodate these constants for most instructions directly. Instead, the assembler automatically appends to the executable code a callable routine called a “glossary” that moves each constant into a register that is dedicated for it. This is transparent to the programmer, so you won’t find any indication in the source code that this happens.

Because the glossary has no source code for you to see, here is its disassembler output instead, as well as a prepended CALL to reach it:

 0  c                60

60  imp       1      25
61  imp       2      30
62  imp       3      99
63  imp       4      20
64  imp       5     222
65  imp       6      40
66  imp       7       1
67  imp       0       0
68  r         0   0   0

What the disassembler labeled as c and r are the CALL and RETURN opcodes. The remaining opcodes are ALU: immediate instructions. What’s happening is that a short subroutine is appended at code address 60 that places integer constants in registers 1 through 7, and then the frequently-used constant 0 is placed in register 0. The reason register 0 is filled last is that a temporary register is required to load most arbitrary 36-bit constants, and register 0 is held back for use as that temporary register until all the nonzero constants are finished. We don’t see this use as a temporary register in this example, because all of this example program’s constants can be encoded in 18 bits.

The glossary is separated out as a callable routine in order to support programs that need to load the register constants more than once. All that is required to load them again is to CALL code address 60. We’ll need to do this at the start of all three user program instances, because although they use the same pool of constants, they have no access to them via the superuser’s registers.

2. Initialize the superuser’s call stack

CALL, RETURN, and REVERT will not work correctly until the call stack is set up, so we see a CALI instruction to provide this. It’s prefaced by a PRIV in order to ensure we’re in superuser mode and are setting up the correct call stack. There are two subtleties to watch out for.

First, as you just read, the program already executed a CALL and RETURN to run the glossary code, but CALI had not happened yet. It turns out this is okay, because what CALI does is guarantee that the call stack has depth 255 instead of depth 1. We don’t know how deep the call stack could go when the glossary ran, but a depth of 1 would have been enough to handle the glossary.

Second, although we see a PRIV in the assembly code before CALI, the electrical simulation in fact forces the CPU into PRIV mode every time it starts. On physical, non-simulated minicomputers, either the firmware loader, boot loader, or operating system will need to ensure the CPU goes into PRIV mode at the outset.

3. Disable the multitasking timer

It is good practice to disable the timer prior to preparing the user programs to run, because the procedure to prepare and start a user program goes into NPRIV mode several times, and depending on the timer’s setting and what initialization is done while NPRIV, an unexpected preemption might occur under unusual use cases. It wouldn’t happen in this example, but it could happen if the example is modified.

4. Prepare three user programs to run

The user programs in this example all have register goal to initialize prior to starting. We use the POKE instruction to copy the superuser’s (constant-valued) registers into the goal register for each of the three users. For each program instance being prepared, a USER instruction specifies which nonprivileged program’s call stack, registers, and (ordinarily but not here) page table are being prepared by the relevant Identity-modifying instructions. This example arbitrarily names users 25, 99, and 222, but they can be any three distinct values between 1 and 255 inclusive, and they need not be in ascending (or any particular) order.

We also call a subroutine named prepare.to.run.user, which I will discuss in the next major section. It only requires one parameter, which is the code address of the first instruction each user to execute. This is stored in a register named program, and we set it directly (we don’t POKE it), because it’s the superuser that will force each user to start at its specified code address.

Although we don’t have to copy the first instruction’s address across program boundaries, the register program is defined within the scope of the prepare.to.run.user subroutine. In essence, program is a local register to that subroutine, and that register is “kept” valid between calls to prepare.to.run.user so that it’s available from the outside to pass the parameter in.

The last little piece of this setup is that the assembler supports the notations :label and ::scope for assignment statements, allowing code memory addresses to be loaded into registers. This is how the address program gets saved in a register. This method only supports “low” addresses of less than 2¹⁸, which is sufficient for the architecture at this stage of its development.

5. Set the multitasking timer

We can now enable the multitasking timer to interrupt user programs every 20 instructions. A description of this is in Program initialization under TIMER. In a physical machine, we would expect and allow user programs to run for many more instructions (such as 65,535) per time slice. But in the electrical simulation, 20 instructions at a time is long enough for me to wait.

6. Run three user programs in turn forever

Up to now, not one instruction of any of the three user programs has been executed. The CPU is in PRIV (superuser) mode, the input parameters for each user is in their appropriate registers belonging to that user, the call stacks have been initialized (as I explain in the next major section), and each user’s starting address in code memory has been pushed on its call stack.

So now we see this forever loop that the superuser runs, where we switch to each of the three users before repeating the loop. The transition from superuser to user involves pushing the superuser’s instruction pointer and flags onto the superuser’s stack, then switching the CPU to NPRIV to access the user’s stack, registers, and (not applicable here) page table. This much is done by the NPCALL instruction, which amounts to a CALL followed by a NPRIV.

The NPCALL branches to a scope called resume.user.program containing the single instruction REVERT. This REVERT, which operates on the user’s call stack, restores the user’s last saved (or initial) instruction pointer and flags, and at this point the user program is running. The user program will continue to run until either the multitasking timer expires, or the user program executes a series YIELD instructions after its computation has finished.

With the REVERT, the superuser’s then-current instruction pointer and flags are irretrievably overwritten. In essence, the CPU went down a path and got to a place in the program that vanishes without a trace. When the superuser reawakens after the next timer expiration or YIELD, the superuser will be at the instruction immediately following its most-recent NPCALL with the superuser’s flags as they were.

User call stack preparation

The subroutine that follows is called once for each of the three user programs we are preparing to run. We already dealt with passing parameters to the user programs in the “main superuser code” above, but we need to set up the call stack for each. The two main steps are:

initialize the user program’s call stack (see CALI for explanation)
place the user program’s (code memory) start address on its call stack

After these two steps, the user program is ready to run, but has not started yet. Here is the code for these two steps:

; For the non-superuser (already selected by 'USER'), place the address
; 'program' on its stack. Don't really run the user yet.
prepare.to.run.user::
    unsigned program        ; input: starting address of program
    keep program

    ; Initialize the return address stack.
    setup
    cali stack.ready        ; This is a branch!

    ; Arriving here means either (a) the user underflowed its call stack,
    ; or (b) the user is branching here to purposely terminate. In a real
    ; operating system, the user should be removed from the schedule and
    ; its resources should be deallocated. But lacking a real operating
    ; system for this example, I've put an infinite loop of YIELD
    ; instructions so that other programs have an opportunity to run.
stop.user:
    nop
    yield
    jump stop.user

    ; When we get here, the return address stack has been initialized.
stack.ready:
    priv                    ; Get back out of SETUP mode.

    ; TEMPORARY MODIFICATION TO CODE MEMORY
    ; -------------------------------------
    ; Objective:   Need address of 'program' on user's return address stack.
    ; Limitation:  The only way to write 'program' onto this stack is to
    ;              execute a CALL instruction from location 'program' - 1.
    ; Solution:    A CALL instruction will be temporarily written at
    ;              'program' - 1, and we'll branch to it.

    ; First, we save the instruction we're about to overwrite with CALL.
    unsigned prior.place prior.inst
    prior.place = program - 1
    rcm1 prior.place
    rcm2 prior.inst

    ; Second, we need to create the 36-bit CALL instruction.
    ; It has the C (CALL) opcode and a branch address.
    unsigned instr
    unsigned address.to.call
    instr = opcode c
    address.to.call = :trampoline
    instr = instr | address.to.call

    ; Now write the CALL instruction at location 'program' - 1.
    wcm prior.place = instr

    ; Switch to user mode and jump to 'program' - 1.
    ; We'll need to write this address to a user register, and use
    ; JANY to jump there.
    poke prior.place = prior.place
    npriv
    jany prior.place

    ; YOU ARE NOW IN THE TWILIGHT ZONE.
    ; ---------------------------------
    ; More specifically, you've branched to 'program' - 1, which does not
    ; appear in this source code, where there is a CALL instruction to
    ; branch to the following line:

trampoline:
    priv

    ; Let's review what just happened. We reached here via a CALL instruction,
    ; and we're now in superuser (PRIV) mode. From the superuser's perspective,
    ; we didn't call here, we only branched here, because the CALL was done
    ; in user (NPRIV) mode and did not touch the superuser's return address
    ; stack. As for the user's stack, it now contains the starting address
    ; for the user program. Magic!

    ; RESTORATION OF CODE MEMORY
    ; --------------------------
    ; In the course of this mischief, we overwrote the instruction at
    ; address 'program' - 1, which likely didn't belong to us. Put it back
    ; as it used to be.
    wcm prior.place = prior.inst

    ; We have finished preparing this user program to run. This RETURN
    ; corresponds to "call prepare.to.run.user" above and marks the end
    ; of this scope.
    return

User program

What follows is the test program that is run in this preemptive multitasking example. Three instances of the same program are started. These instances produce different results, because they each have a different number of MIX instructions. Because preemptive multitasking allocates the CPU to each program for equal amounts of work, the simulation script knows the order the three instances will finish in and is able to individually validate the three different results produced.

Note that this program’s first instruction is a call to the glossary that follows the program. The glossary loads constant registers with their correct values. This need was discussed above under “load immediate values” for the main superuser code.

Here is the user program code:

((
    This is a short test program for preemptive multitasking. Using 0 as
    the key and initial value of 'word', it successively uses MIX (a
    substitution-permutation network instruction) to modify 'word'.

    This program does not terminate, because the superuser "scaffolding"
    around it doesn't maintain a process list and therefore doesn't have a
    way to allow this program to terminate. But it does have a 'goal' of
    how many MIX instructions to apply before verifying that 'word' has
    the correct value. When that goal is reached, a HALT is executed so
    the simulator script can check. Here are a few correct values of 'word'
    at various 'goal' settings:

        goal         word
        ----  -----------
          20  31358733809
         100  42533060964
         250  49274620803
        1000  19859066012

    This program is shorter than its cooperative multitasking counterparts,
    in part because preemption is unlikely to occur in lockstep with its
    loops. So the experiments we saw in the cooperative multitasking example
    don't really have natural places to insert them.
))
program::
    call glossary       ; initialize constant registers for this user

    unsigned goal       ; input: how many MIX operations to compute
    unsigned n          ; count of how many MIX operations so far
    unsigned word       ; current state latest MIX
    keep goal

    n = 0
    word = 0

loop:
    cmp goal - n
    jump == reached.goal
    word = word mix 0
    n = n + 1
    jump loop

reached.goal:
    ; Report the computation result back to the .ns script.
    word = word
    halt

((
    Unlike the cooperative multitasking example programs that continue
    calculating after the desired number of MIX instructions have executed,
    this program "shuts down." There isn't actually a way to terminate it,
    because the superuser isn't maintaining a list of currently-running
    programs that it can edit. So instead, we write an infinite loop of
    YIELD instructions so that this program uses as little CPU time
    relative to its peers as possible.

    If you're a YIELD guru and know its execution is delayed, you'll see
    below that YIELDs are "stacking up" in the sense that before one can
    suspend the program, another YIELD has been requested. I suspect the
    redundant YIELDs have no effect, and since there will be more to come
    after those, I see no harm here.
))
finished:
    yield
    jump finished

Glossary label

    ; This is the end of the source code. The assembler appends the
    ; glossary scope for constants here.
glossary::