Cooperative multitasking example
In cooperative multitasking, once a program has the CPU, it continues to have the CPU until it voluntarily releases it. This is done via the Dauug|36 YIELD
instruction.
Below is a regression test, written as a single Dauug|36 assembly language module, that implements cooperative multitasking and provides assurance it is working. It contains 116 instructions give or take, so I’ve broken it into sections where I add some description.
There are 14 different privileged instructions used by this code, so I recommend reading the following sections before or alongside this example:
- Privileged instructions
- Identity-modifying
- Program initialization
- Pseudo
- Unrestricted memory (Only
RCM1
,RCM2
, andWCM
apply for this.)
This code example causes four programs to execute simultaneously, but due to the limitations of the cross assembler and electrical simulation as of May 2023, they are written together as a single module and loaded into a contiguous block of code memory from offset 0. A superuser, numerically user 0 in the system, sets up its own call stack and register constants, disables the multitasking timer, and spawns three programs running under user 25, 99, and 222.
The three user programs each have their own call stack, registers, program counter, and flags. They are simple enough to only need registers; otherwise, they would also have their own page table and data memory. These programs do not yield the CPU on the same schedule, so they are in different places as they compute. They do, however, execute the same computation and will have the same result. The basic computation is done in an elaborate loop that counts steps, yields the CPU at various points, and adds some other tests, but the underlying numeric computation we are performing is this substitution-permutation network operation:
unsigned word word = 0 word = word mix 0 ; 20 copies of this line
As each user finishes its 20th MIX
instruction, the CPU is paused (via HALT
) so the test script can verify that word
is at its then-correct value of 31,358,733,809. The superuser (the “operating system” within this example) and three users are written to run forever; however, the simulation script (not shown here) will shut everything down after the third HALT
and verification finish.
It may seem like 20 instructions (plus a lot of loop overhead) per running program is a really short test for a multitasking CPU. I concede that it is, but this is only intended to be a regression test and coding example. This test case takes about 10 seconds to execute, and it’s just one of three dozen regression tests as of June 2023 that I run many times per day as I modify the CPU and firmware. So this test, like its 35 siblings, needs to be as short as it can without fully undermining its purpose.
The following segments of source code, when joined together in order, form this cooperative multitasking example.
Main superuser code
Execution begins with a block of superuser code that loops forever to feed CPU time to the three user programs. This code has five principal tasks:
- load immediate values (numeric constants) into registers
- initialize the superuser’s call stack
- disable the multitasking timer
- prepare three user programs to run
- run three user programs in turn forever
Here is the code under consideration, followed by an explanation of how it achieves the above five tasks.
; Evidence of cooperative multitasking (CM) ; MWA 26 May 2023 ff ; This initializes the superuser call stack. priv cali super.stack super.stack: ; Because this is cooperative multitasking (and other reasons), ; we turn off the multitasking timer here. unsigned timer.setting timer.setting = 10_0000_0000_0000_0000`b disable.timer: timer timer.setting jump >= disable.timer ; These next three blocks arrange for three user programs ; to run simultaneously using a variety of parameters. ; There are also two versions of the code being run. user 25 prepare.to.run.user::program = ::program.1 poke program.1::goal = 20 poke program.1::mask = 3`o poke program.1::prng = 25 call prepare.to.run.user user 99 prepare.to.run.user::program = ::program.2 poke program.2::goal = 20 poke program.2::mask = 7`o poke program.2::prng = 99 call prepare.to.run.user user 222 prepare.to.run.user::program = ::program.2 poke program.2::goal = 20 poke program.2::mask = 3`o poke program.2::prng = 222 call prepare.to.run.user ; This loop cycles between three users forever as they ; YIELD the CPU in sequence back to the superuser (this code). forever: user 25 npcall resume.user.program user 99 npcall resume.user.program user 222 npcall resume.user.program jump forever ; To continue running a user program, we pop its instruction ; pointer and flags while already in NPRIV mode. Note we can't ; just inline this REVERT to save an instruction, because the ; NPCALL is what preserves the superuser's instruction pointer. resume.user.program:: revert
1. Load immediate values
This program needs several integer constants, such as 20
, 99
, and 222
, but the architecture’s instruction format cannot accommodate these constants for most instructions directly. Instead, the assembler automatically appends to the executable code a callable routine called a “glossary” that moves each constant into a register that is dedicated for it. This is transparent to the programmer, so you won’t find any indication in the source code that this happens.
Because the glossary has no source code for you to see, here is its disassembler output instead, as well as a prepended CALL
to reach it:
0 c 94 94 imp 1 25 95 imp 2 20 96 imp 3 3 97 imp 4 99 98 imp 5 7 99 imp 6 222 100 imp 7 1 101 imh 8 131_072 102 imp 0 0 103 r 0 0 0
What the disassembler labeled as c
and r
are the CALL
and RETURN
opcodes. The remaining opcodes are ALU: immediate instructions. What’s happening is that a short subroutine is appended at code address 94 that places integer constants in registers 1 through 8, and then the frequently-used constant 0 is placed in register 0. The reason register 0 is filled last is that a temporary register is required to load most arbitrary 36-bit constants, and register 0 is held back for use as that temporary register until all the nonzero constants are finished. We don’t see this use as a temporary register in this example, because all of this example program’s constants can be encoded in 18 bits.
The glossary is separated out as a callable routine in order to support programs that need to load the register constants more than once. All that is required to load them again is to CALL
code address 94. We’ll need to do this at the start of all three user program instances, because although they use the same pool of constants, they have no access to them via the superuser’s registers.
2. Initialize the superuser’s call stack
CALL
, RETURN
, and REVERT
will not work correctly until the call stack is set up, so we see a CALI
instruction to provide this. It’s prefaced by a PRIV
in order to ensure we’re in superuser mode and are setting up the correct call stack. There are two subtleties to watch out for.
First, as you just read, the program already executed a CALL
and RETURN
to run the glossary code, but CALI
had not happened yet. It turns out this is okay, because what CALI
does is guarantee that the call stack has depth 255 instead of depth 1. We don’t know how deep the call stack could go when the glossary ran, but a depth of 1 would have been enough to handle the glossary.
Second, although we see a PRIV
in the assembly code before CALI
, the electrical simulation in fact forces the CPU into PRIV
mode every time it starts. On physical, non-simulated minicomputers, either the firmware loader, boot loader, or operating system will need to ensure the CPU goes into PRIV
mode at the outset.
3. Disable the multitasking timer
Because this example uses cooperative multitasking and will not pull the CPU away from a program until it YIELD
s, the multitasking timer’s 16-bit setpoint is filled with zeros so that the timer will never expire. It is good practice to disable the timer anyway, because the procedure to prepare and start a user program goes into NPRIV
mode several times, and depending on the timer’s setting and what initialization is done while NPRIV
, an unexpected preemption might occur under unusual use cases.
4. Prepare three user programs to run
The user programs in this example all have registers goal
, mask
, and prng
that need to be initialized with parameters prior to starting. We use the POKE
instruction to copy the superuser’s (constant-valued) registers into the three parameter registers for each of the three users. For each program instance being prepared, a USER
instruction specifies which nonprivileged program’s call stack, registers, and (ordinarily but not here) page table are being prepared by the relevant Identity-modifying instructions. This example arbitrarily names users 25, 99, and 222, but they can be any three distinct values between 1 and 255 inclusive, and they need not be in ascending (or any particular) order.
We also call a subroutine named prepare.to.run.user
, which I will discuss in the next major section. It only requires one parameter, which is the code address of the first instruction each user to execute. This is stored in a register named program
, and we set it directly (we don’t POKE
it), because it’s the superuser that will force each user to start at its specified code address.
Although we don’t have to copy the first instruction’s address across program boundaries, the register program
is defined within the scope of the prepare.to.run.user
subroutine. In essence, program
is a local register to that subroutine, and that register is “kept” valid between calls to prepare.to.run.user
so that it’s available from the outside to pass the parameter in.
The last little piece of this setup is that the assembler supports the notations :label
and ::scope
for assignment statements, allowing code memory addresses to be loaded into registers. This is how the addresses program.1
and program.2
get saved as registers. This method only supports “low” addresses of less than 218, which is sufficient for the architecture at this stage of its development.
5. Run three user programs in turn forever
Up to now, not one instruction of any of the three user programs has been executed. The CPU is in PRIV
(superuser) mode, the input parameters for each user is in their appropriate registers belonging to that user, the call stacks have been initialized (as I explain in the next major section), and each user’s starting address in code memory has been pushed on its call stack.
So now we see this forever
loop that the superuser runs, where we switch to each of the three users before repeating the loop. The transition from superuser to user involves pushing the superuser’s instruction pointer and flags onto the superuser’s stack, then switching the CPU to NPRIV
to access the user’s stack, registers, and (not applicable here) page table. This much is done by the NPCALL
instruction, which amounts to a CALL
followed by a NPRIV
.
The NPCALL
branches to a scope called resume.user.program
containing the single instruction REVERT
. This REVERT
, which operates on the user’s call stack, restores the user’s last saved (or initial) instruction pointer and flags, and at this point the user program is running. The user program will continue to run until either the multitasking timer expires (which we ensured will not occur in this example), or the user program executes a YIELD
instruction.
With the REVERT
, the superuser’s then-current instruction pointer and flags are irretrievably overwritten. In essence, the CPU went down a path and got to a place in the program that vanishes without a trace. When the superuser reawakens after the next YIELD
, the superuser will be at the instruction immediately following its most-recent NPCALL
with the superuser’s flags as they were.
User call stack preparation
The subroutine that follows is called once for each of the three user programs we are preparing to run. We already dealt with passing parameters to the user programs in the “main superuser code” above, but we need to set up the call stack for each. The two main steps are:
- initialize the user program’s call stack (see
CALI
for explanation) - place the user program’s (code memory) start address on its call stack
After these two steps, the user program is ready to run, but has not started yet. Here is the code for these two steps:
; For the non-superuser (already selected by 'USER'), place the address ; 'program' on its stack. Don't really run the user yet. prepare.to.run.user:: unsigned program ; input: starting address of program keep program ; Initialize the return address stack. ; The documentation for CALI describes how this works. setup cali stack.ready ; This is a branch! ; Arriving here means either (a) the user underflowed its call stack, ; or (b) the user is branching here to purposely terminate. In a real ; operating system, the user should be removed from the schedule and ; its resources should be deallocated. But lacking a real operating ; system for this example, I've put an infinite loop of YIELD ; instructions so that other programs have an opportunity to run. stop.user: nop yield jump stop.user ; When we get here, the return address stack has been initialized. stack.ready: priv ; Get back out of SETUP mode. ; TEMPORARY MODIFICATION TO CODE MEMORY ; ------------------------------------- ; Objective: Need address of 'program' on user's return address stack. ; Limitation: The only way to write 'program' onto this stack is to ; execute a CALL instruction from location 'program' - 1. ; Solution: A CALL instruction will be temporarily written at ; 'program' - 1, and we'll branch to it. ; First, we save the instruction we're about to overwrite with CALL. unsigned prior.place prior.inst prior.place = program - 1 rcm1 prior.place rcm2 prior.inst ; Second, we need to create the 36-bit CALL instruction. ; It has the C (CALL) opcode and a branch address. unsigned instr unsigned address.to.call instr = opcode c address.to.call = :trampoline instr = instr | address.to.call ; Now write the CALL instruction at location 'program' - 1. wcm prior.place = instr ; Switch to user mode and jump to 'program' - 1. ; We'll need to write this address to a user register, and use ; JANY to jump there. poke prior.place = prior.place npriv jany prior.place ; YOU ARE NOW IN THE TWILIGHT ZONE. ; --------------------------------- ; More specifically, you've branched to 'program' - 1, which does not ; appear in this source code, where there is a CALL instruction to ; branch to the following line: trampoline: priv ; Let's review what just happened. We reached here via a CALL instruction, ; and we're now in superuser (PRIV) mode. From the superuser's perspective, ; we didn't call here, we only branched here, because the CALL was done ; in user (NPRIV) mode and did not touch the superuser's return address ; stack. As for the user's stack, it now contains the starting address ; for the user program. Magic! ; RESTORATION OF CODE MEMORY ; -------------------------- ; In the course of this mischief, we overwrote the instruction at ; address 'program' - 1, which likely didn't belong to us. Put it back ; as it used to be. wcm prior.place = prior.inst ; We have finished preparing this user program to run. This RETURN ; corresponds to "call prepare.to.run.user" above and marks the end ; of this scope. return
User program 1
What follows is the first of two test programs that are run in this cooperative multitasking example. Only one instance of this program is started; two instances of the second test program are executed at the same time.
Because the three instances YIELD
at difficult-to-predict times based on their seed and mask settings, I can’t dependably predict the order they will finish in without a lengthy test that makes the order (usually) predictable. For this reason, the three instances compute the same result so that the surrounding test script isn’t affected by the order they finish in. Unfortunately, this means that if two scripts never output a result for some reason, the third script can “loop around” the 36-bit counter and supply correct results for them. Why don’t I fix this? Because the electrical simulation is so slow that the time required to produce a false “pass” in this manner would measure in decades to centuries.
Note that this program’s first instruction is a call to the glossary that follows the program. The glossary loads constant registers with their correct values. This need was discussed above under “load immediate values” for the main superuser code.
Here is the code for user program 1:
(( This is a short test program for cooperative multitasking. Using 0 as the key and initial value of 'word', it successively uses MIX (a substitution-permutation network instruction) to modify 'word'. This program does not terminate, because the superuser "scaffolding" around it doesn't maintain a process list and therefore doesn't have a way to allow this program to terminate. But it does have a 'goal' of how many MIX instructions to apply before verifying that 'word' has the correct value. When that goal is reached, a HALT is executed so the simulator script can check. Here are a few correct values of 'word' at various 'goal' settings: goal word ---- ----------- 20 31358733809 100 42533060964 250 49274620803 1000 19859066012 This program has two complications added: 1. It periodically YIELDs the CPU, based on a faux pseudorandom number generator implemented using XIM (another substitution- permutation network instruction). By "faux," what I mean is that this PRNG uses XIM all by itself to create a stream of numbers, which gives us NO ASSURANCE AT ALL as to the generator's period. So don't cut-paste this code into anything else. For a description of a much better PRNG, see the dissertation at pages 127-131. WARNING: This XIM-only PRNG is *so horrible* that there might be seed values that never cause the CPU to yield, causing this regression test to catastrophically fail. In any event, provided an initial seed named 'prng' and a bitmask 'mask'. Whenever all bits identified by 'mask' are zeros in 'prng', the program will YIELD. So for various values of 'mask', the CPU will yield: mask probability of CPU yielding ------ --------------------------- 1`b about 50% of the time 11`b about 25% of the time 111`b about 12.5% of the time 1111`b about 6.25% of the time 2. I stuck some "unnecessary" JUMP instructions in the vicinity of the YIELD to explore for incompatibilities between JUMP and YIELD. Test coverage either wasn't thought out or wasn't adequately documented, although no problems were anticipated or found. An improper jump would likely HALT with 'word' at an unexpected value, causing the simulator script to indicate a mistake. I would explain here why I think YIELD is compatible with the entire instruction set (except XANY), but it would be hand-waving, which is probably worse than not offering an explanation yet. )) program.1:: call glossary ; Initialize constant registers for this user. unsigned goal ; input: how many MIX operations to compute unsigned mask ; input: yield CPU when PRNG and mask == 0 unsigned prng ; input: faux PRNG (no LFSR), must seed this register keep goal mask prng unsigned maybe ; Only the Z(ero) flag from this register is used. unsigned n ; count of how many MIX operations so far unsigned word ; current state latest MIX n = 0 word = 0 loop: ; At random times, branch around the following YIELD. prng = prng xim 0 maybe = prng and mask jump != do.not.yield ; If we didn't branch, request that the CPU yield. ; Two more instructions will be executed prior to the actual yield. yield ; Try to break the YIELD with JUMP instructions and vice versa. jump with.me.1 halt with.me.1: jump with.me.2 halt with.me.2: ; Whether we yielded or not, see if we reached our goal for ; how many MIX instructions to do. do.not.yield: cmp goal - n ; Compare n with goal. jump != not.at.goal ; We're at our goal, so report the current 'word' back to the .ns script. ; After checking if 'word' is correct, the script will resume the CPU. word = word halt ; Whether or not that was our goal, do the next MIX instruction in our ; sequence, and increment the count of how many MIXes have been done. not.at.goal: word = word mix 0 n = n + 1 jump loop
User program 2
Here is the code for the remaining test program for this cooperative multitasking example:
(( This is a modified version of 'program.1'. It does exactly the same computation, but rather than add JUMP instructions in the vicinity of YIELD, it sets or clears the Z(ero) flag depending on the present PRNG state, and tests to ensure the Z flag, whether it's 0 or 1, is restored to its correct value when the program resumes after being suspended. Because the N(egative), Z(ero), T(emporal), and R(ange) flags are electrical clones in terms of their handling, I infer if Z(ero) is saved and restored as anticipated, it's probable that the remaining three flags are also saved and restored as designed. See the comments above 'program.1' for the base design of this program. )) program.2:: call glossary ; Initialize constant registers for this user. unsigned goal ; input: how many MIX operations to compute unsigned mask ; input: yield CPU when PRNG and mask == 0 unsigned prng ; input: faux PRNG (no LFSR), must seed this register keep goal mask prng unsigned maybe ; Only the Z(ero) flag from this register is used. unsigned n ; count of how many MIX operations so far unsigned word ; current state latest MIX unsigned z.have ; Bit 35 is cleared iff the Z(ero) flag is set. unsigned z.want ; value for z.have in the event of correct operation n = 0 word = 0 loop: ; At random times, branch around the following YIELD. prng = prng xim 0 maybe = prng and mask jump != do.not.yield ; If we didn't branch, request that the CPU yield. ; Two more instructions will be executed prior to the actual yield. yield ; This was already fetched. Any instruction except XANY works fine here. nop ; Before yielding to the superuser, arbitrarily set the Z(ero) flag. z.want = prng and 400000_000000`o ; ----- THIS IS WHERE THE YIELD ACTUALLY SUSPENDS EXECUTION ; Find out if Z(ero) changed (very bad!) while the program was asleep. jump == is.zero.now z.have = 400000_000000`o jump reckon is.zero.now: z.have = 0 reckon: cmp z.have - z.want jump == is.good flag.test.failed: word = 0 halt ; Send an incorrect result to the test script. jump flag.test.failed is.good: ; Whether we yielded or not, see if we reached our goal for ; how many MIX instructions to do. do.not.yield: cmp goal - n ; Compare n with goal. jump != not.at.goal ; We're at our goal, so report the current 'word' back to the .ns script. ; After checking if 'word' is correct, the script will resume the CPU. word = word halt ; Whether or not that was our goal, do the next MIX instruction in our ; sequence, and increment the count of how many MIXes have been done. not.at.goal: word = word mix 0 n = n + 1 jump loop
Glossary label
; This is the end of the source code. The assembler appends the ; glossary scope for constants here. glossary::