Preemptive multitasking example
In preemptive multitasking, running programs don’t need to do anything to give up the CPU so that other programs can run. Instead, the hardware and operating system will take the CPU back without permission or assistance from the program that was running. This is arranged for by a hardware timer and the Dauug|36 TIMER
instruction.
Below is a regression test, written as a single Dauug|36 assembly language module, that implements preemptive multitasking and provides assurance it is working. It contains 75 instructions give or take, so I’ve broken it into sections where I add some description.
There are 14 different privileged instructions used by this code, so I recommend reading the following sections before or alongside this example:
- Privileged instructions
- Identity-modifying
- Program initialization
- Pseudo
- Unrestricted memory (Only
RCM1
,RCM2
, andWCM
apply for this.)
This code example causes four programs to execute simultaneously, but due to the limitations of the cross assembler and electrical simulation as of May 2023, they are written together as a single module and loaded into a contiguous block of code memory from offset 0. A superuser, numerically user 0 in the system, sets up its own call stack and register constants, disables the multitasking timer, sets up three programs to run as user 25, 99, and 222, configures the multitasking timer, and starts the three programs.
The three user programs each have their own call stack, registers, program counter, and flags. They are simple enough to only need registers; otherwise, they would also have their own page table and data memory. The three programs execute the same computation, although to different finishing points. Each program produce a different numeric result. The basic computation is done in a 5-instruction loop that does the following substitution-permutation network operation:
unsigned word word = 0 word = word mix 0 ; 20 copies of this line
As each user finishes its assigned number of MIX
instructions, the CPU is paused (via HALT
) so the test script can verify that word
is at its then-correct value for that point in the sequence. The superuser (the “operating system” within this example) and three users are written to run forever; however, the simulation script (not shown here) will shut everything down after the third HALT
and verification finish.
It may seem like 20 instructions (plus loop overhead) per running program is a really short test for a multitasking CPU. I concede that it is, but this is only intended to be a regression test and coding example. This test case takes about 5 seconds to execute, and it’s just one of three dozen regression tests as of June 2023 that I run many times per day as I modify the CPU and firmware. So this test, like its 35 siblings, needs to be as short as it can without fully undermining its purpose.
The following segments of source code, when joined together in order, form this preemptive multitasking example.
Main superuser code
Execution begins with a block of superuser code that loops forever to feed CPU time to the three user programs. This code has five principal tasks:
- load immediate values (numeric constants) into registers
- initialize the superuser’s call stack
- disable the multitasking timer
- prepare three user programs to run
- set the multitasking timer
- run three user programs in turn forever
Here is the code under consideration, followed by an explanation of how it achieves the above six tasks.
; Evidence of preemptive multitasking (PM) ; MWA 26 May 2023 ff ; This initializes the superuser call stack. priv cali super.stack super.stack: (( We temporarily turn off the preemption timer for multitasking here, because the superuser needs to enter NPRIV mode a few times while preparing everything to run. Although these NPRIV intervals are brief and the multitasking timer would ordinarily be set for thousands of instructions, disabling the timer will reduce opportunity for surprises in the event this program is modified. )) unsigned timer.setting timer.setting = 10_0000_0000_0000_0000`b disable.timer: timer timer.setting jump >= disable.timer (( These next three blocks arrange for three user programs to run simultaneously using a variety of parameters. There is just one version of the program being run. )) user 25 prepare.to.run.user::program = ::program poke program::goal = 30 call prepare.to.run.user user 99 prepare.to.run.user::program = ::program poke program::goal = 20 call prepare.to.run.user user 222 prepare.to.run.user::program = ::program poke program::goal = 40 call prepare.to.run.user (( Now we can enable the multitasking timer. In this example, we set the 16-bit LFSR to 0111110001010010`b, which is 19 values prior to 1111111111111111`b where the timer expires. This setting will permit each program to execute 20 instructions prior to each preemption. )) timer.setting = 10_0111110001010010`b enable.timer: timer timer.setting jump >= enable.timer ; This loop cycles between three users forever as they ; YIELD the CPU in sequence back to the superuser (this code). forever: user 25 npcall resume.user.program user 99 npcall resume.user.program user 222 npcall resume.user.program jump forever ; To continue running a user program, we just pop its instruction ; pointer and flags while already in NPRIV mode. resume.user.program:: revert
1. Load immediate values
This program needs several integer constants, such as 20
, 99
, and 222
, but the architecture’s instruction format cannot accommodate these constants for most instructions directly. Instead, the assembler automatically appends to the executable code a callable routine called a “glossary” that moves each constant into a register that is dedicated for it. This is transparent to the programmer, so you won’t find any indication in the source code that this happens.
Because the glossary has no source code for you to see, here is its disassembler output instead, as well as a prepended CALL
to reach it:
0 c 60 60 imp 1 25 61 imp 2 30 62 imp 3 99 63 imp 4 20 64 imp 5 222 65 imp 6 40 66 imp 7 1 67 imp 0 0 68 r 0 0 0
What the disassembler labeled as c
and r
are the CALL
and RETURN
opcodes. The remaining opcodes are ALU: immediate instructions. What’s happening is that a short subroutine is appended at code address 60 that places integer constants in registers 1 through 7, and then the frequently-used constant 0 is placed in register 0. The reason register 0 is filled last is that a temporary register is required to load most arbitrary 36-bit constants, and register 0 is held back for use as that temporary register until all the nonzero constants are finished. We don’t see this use as a temporary register in this example, because all of this example program’s constants can be encoded in 18 bits.
The glossary is separated out as a callable routine in order to support programs that need to load the register constants more than once. All that is required to load them again is to CALL
code address 60. We’ll need to do this at the start of all three user program instances, because although they use the same pool of constants, they have no access to them via the superuser’s registers.
2. Initialize the superuser’s call stack
CALL
, RETURN
, and REVERT
will not work correctly until the call stack is set up, so we see a CALI
instruction to provide this. It’s prefaced by a PRIV
in order to ensure we’re in superuser mode and are setting up the correct call stack. There are two subtleties to watch out for.
First, as you just read, the program already executed a CALL
and RETURN
to run the glossary code, but CALI
had not happened yet. It turns out this is okay, because what CALI
does is guarantee that the call stack has depth 255 instead of depth 1. We don’t know how deep the call stack could go when the glossary ran, but a depth of 1 would have been enough to handle the glossary.
Second, although we see a PRIV
in the assembly code before CALI
, the electrical simulation in fact forces the CPU into PRIV
mode every time it starts. On physical, non-simulated minicomputers, either the firmware loader, boot loader, or operating system will need to ensure the CPU goes into PRIV
mode at the outset.
3. Disable the multitasking timer
It is good practice to disable the timer prior to preparing the user programs to run, because the procedure to prepare and start a user program goes into NPRIV
mode several times, and depending on the timer’s setting and what initialization is done while NPRIV
, an unexpected preemption might occur under unusual use cases. It wouldn’t happen in this example, but it could happen if the example is modified.
4. Prepare three user programs to run
The user programs in this example all have register goal
to initialize prior to starting. We use the POKE
instruction to copy the superuser’s (constant-valued) registers into the goal
register for each of the three users. For each program instance being prepared, a USER
instruction specifies which nonprivileged program’s call stack, registers, and (ordinarily but not here) page table are being prepared by the relevant Identity-modifying instructions. This example arbitrarily names users 25, 99, and 222, but they can be any three distinct values between 1 and 255 inclusive, and they need not be in ascending (or any particular) order.
We also call a subroutine named prepare.to.run.user
, which I will discuss in the next major section. It only requires one parameter, which is the code address of the first instruction each user to execute. This is stored in a register named program
, and we set it directly (we don’t POKE
it), because it’s the superuser that will force each user to start at its specified code address.
Although we don’t have to copy the first instruction’s address across program boundaries, the register program
is defined within the scope of the prepare.to.run.user
subroutine. In essence, program
is a local register to that subroutine, and that register is “kept” valid between calls to prepare.to.run.user
so that it’s available from the outside to pass the parameter in.
The last little piece of this setup is that the assembler supports the notations :label
and ::scope
for assignment statements, allowing code memory addresses to be loaded into registers. This is how the address program
gets saved in a register. This method only supports “low” addresses of less than 218, which is sufficient for the architecture at this stage of its development.
5. Set the multitasking timer
We can now enable the multitasking timer to interrupt user programs every 20 instructions. A description of this is in Program initialization under TIMER
. In a physical machine, we would expect and allow user programs to run for many more instructions (such as 65,535) per time slice. But in the electrical simulation, 20 instructions at a time is long enough for me to wait.
6. Run three user programs in turn forever
Up to now, not one instruction of any of the three user programs has been executed. The CPU is in PRIV
(superuser) mode, the input parameters for each user is in their appropriate registers belonging to that user, the call stacks have been initialized (as I explain in the next major section), and each user’s starting address in code memory has been pushed on its call stack.
So now we see this forever
loop that the superuser runs, where we switch to each of the three users before repeating the loop. The transition from superuser to user involves pushing the superuser’s instruction pointer and flags onto the superuser’s stack, then switching the CPU to NPRIV
to access the user’s stack, registers, and (not applicable here) page table. This much is done by the NPCALL
instruction, which amounts to a CALL
followed by a NPRIV
.
The NPCALL
branches to a scope called resume.user.program
containing the single instruction REVERT
. This REVERT
, which operates on the user’s call stack, restores the user’s last saved (or initial) instruction pointer and flags, and at this point the user program is running. The user program will continue to run until either the multitasking timer expires, or the user program executes a series YIELD
instructions after its computation has finished.
With the REVERT
, the superuser’s then-current instruction pointer and flags are irretrievably overwritten. In essence, the CPU went down a path and got to a place in the program that vanishes without a trace. When the superuser reawakens after the next timer expiration or YIELD
, the superuser will be at the instruction immediately following its most-recent NPCALL
with the superuser’s flags as they were.
User call stack preparation
The subroutine that follows is called once for each of the three user programs we are preparing to run. We already dealt with passing parameters to the user programs in the “main superuser code” above, but we need to set up the call stack for each. The two main steps are:
- initialize the user program’s call stack (see
CALI
for explanation) - place the user program’s (code memory) start address on its call stack
After these two steps, the user program is ready to run, but has not started yet. Here is the code for these two steps:
; For the non-superuser (already selected by 'USER'), place the address ; 'program' on its stack. Don't really run the user yet. prepare.to.run.user:: unsigned program ; input: starting address of program keep program ; Initialize the return address stack. setup cali stack.ready ; This is a branch! ; Arriving here means either (a) the user underflowed its call stack, ; or (b) the user is branching here to purposely terminate. In a real ; operating system, the user should be removed from the schedule and ; its resources should be deallocated. But lacking a real operating ; system for this example, I've put an infinite loop of YIELD ; instructions so that other programs have an opportunity to run. stop.user: nop yield jump stop.user ; When we get here, the return address stack has been initialized. stack.ready: priv ; Get back out of SETUP mode. ; TEMPORARY MODIFICATION TO CODE MEMORY ; ------------------------------------- ; Objective: Need address of 'program' on user's return address stack. ; Limitation: The only way to write 'program' onto this stack is to ; execute a CALL instruction from location 'program' - 1. ; Solution: A CALL instruction will be temporarily written at ; 'program' - 1, and we'll branch to it. ; First, we save the instruction we're about to overwrite with CALL. unsigned prior.place prior.inst prior.place = program - 1 rcm1 prior.place rcm2 prior.inst ; Second, we need to create the 36-bit CALL instruction. ; It has the C (CALL) opcode and a branch address. unsigned instr unsigned address.to.call instr = opcode c address.to.call = :trampoline instr = instr | address.to.call ; Now write the CALL instruction at location 'program' - 1. wcm prior.place = instr ; Switch to user mode and jump to 'program' - 1. ; We'll need to write this address to a user register, and use ; JANY to jump there. poke prior.place = prior.place npriv jany prior.place ; YOU ARE NOW IN THE TWILIGHT ZONE. ; --------------------------------- ; More specifically, you've branched to 'program' - 1, which does not ; appear in this source code, where there is a CALL instruction to ; branch to the following line: trampoline: priv ; Let's review what just happened. We reached here via a CALL instruction, ; and we're now in superuser (PRIV) mode. From the superuser's perspective, ; we didn't call here, we only branched here, because the CALL was done ; in user (NPRIV) mode and did not touch the superuser's return address ; stack. As for the user's stack, it now contains the starting address ; for the user program. Magic! ; RESTORATION OF CODE MEMORY ; -------------------------- ; In the course of this mischief, we overwrote the instruction at ; address 'program' - 1, which likely didn't belong to us. Put it back ; as it used to be. wcm prior.place = prior.inst ; We have finished preparing this user program to run. This RETURN ; corresponds to "call prepare.to.run.user" above and marks the end ; of this scope. return
User program
What follows is the test program that is run in this preemptive multitasking example. Three instances of the same program are started. These instances produce different results, because they each have a different number of MIX
instructions. Because preemptive multitasking allocates the CPU to each program for equal amounts of work, the simulation script knows the order the three instances will finish in and is able to individually validate the three different results produced.
Note that this program’s first instruction is a call to the glossary that follows the program. The glossary loads constant registers with their correct values. This need was discussed above under “load immediate values” for the main superuser code.
Here is the user program code:
(( This is a short test program for preemptive multitasking. Using 0 as the key and initial value of 'word', it successively uses MIX (a substitution-permutation network instruction) to modify 'word'. This program does not terminate, because the superuser "scaffolding" around it doesn't maintain a process list and therefore doesn't have a way to allow this program to terminate. But it does have a 'goal' of how many MIX instructions to apply before verifying that 'word' has the correct value. When that goal is reached, a HALT is executed so the simulator script can check. Here are a few correct values of 'word' at various 'goal' settings: goal word ---- ----------- 20 31358733809 100 42533060964 250 49274620803 1000 19859066012 This program is shorter than its cooperative multitasking counterparts, in part because preemption is unlikely to occur in lockstep with its loops. So the experiments we saw in the cooperative multitasking example don't really have natural places to insert them. )) program:: call glossary ; initialize constant registers for this user unsigned goal ; input: how many MIX operations to compute unsigned n ; count of how many MIX operations so far unsigned word ; current state latest MIX keep goal n = 0 word = 0 loop: cmp goal - n jump == reached.goal word = word mix 0 n = n + 1 jump loop reached.goal: ; Report the computation result back to the .ns script. word = word halt (( Unlike the cooperative multitasking example programs that continue calculating after the desired number of MIX instructions have executed, this program "shuts down." There isn't actually a way to terminate it, because the superuser isn't maintaining a list of currently-running programs that it can edit. So instead, we write an infinite loop of YIELD instructions so that this program uses as little CPU time relative to its peers as possible. If you're a YIELD guru and know its execution is delayed, you'll see below that YIELDs are "stacking up" in the sense that before one can suspend the program, another YIELD has been requested. I suspect the redundant YIELDs have no effect, and since there will be more to come after those, I see no harm here. )) finished: yield jump finished
Glossary label
; This is the end of the source code. The assembler appends the ; glossary scope for constants here. glossary::