Unrestricted memory instructions (privileged)
Opcode | P/U | Category | Description |
ADDRDM | priv | unrestricted memory | add then read data memory |
RCM1 | priv | unrestricted memory | read code memory part 1 |
RCM2 | priv | unrestricted memory | read code memory part 2 |
RDM | priv | unrestricted memory | read data memory |
RPT | priv | unrestricted memory | read page table |
RWDM | priv | unrestricted memory | read and write data memory |
WCM | priv | unrestricted memory | write code memory |
WDM | priv | unrestricted memory | write data memory |
WDM2 | priv | unrestricted memory | write data memory twice |
WPT | priv | unrestricted memory | write page table |
Preface
As mentioned earlier, of the ten unrestricted memory instructions, five are simply privileged versions of nonprivileged instructions. That is, their privileged version (left column) bypasses the page table, but otherwise does what their nonprivileged counterpart (middle column) does.
Opcode | Acts like | Description |
ADDRDM | ADDLD | add and read data memory* |
RDM | LD | read data memory |
RWDM | LDSTO | read and write data memory |
WDM | STO | write data memory |
WDM2 | STO2 | write data memory twice |
*Minor differences exist due to page table side effects.
The five remaining memory instructions read and write the code memory and the page table. These have no nonprivileged counterparts, because user programs have no access to these memory types.
Opcode | Description |
RCM1 | read code memory 1 |
RCM2 | read code memory 2 |
RPT | read page table |
WCM | write code memory |
WPT | write page table |
(Due to timing constraints and high net contention, reading an instruction from code memory requires an uninterrupted two-instruction sequence.)
ADDRDM
Add then read data memory
Syntax |
dest = base addrdm offset |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N | bit 35 of the result is set |
Z | all result bits are zero |
T | flag does not change |
R | flag does not change |
ADDRDM (add then read data memory) approximates a base + offset scheme for reading from data memory, except that the addition does not carry between 6-bit tribbles. The advantage of ADDRDM is that, where this addition quirk is known to be harmless, a base + offset read from memory can be done in a single instruction instead of two.
ADDRDM adds the tribbles of offset pairwise with the tribbles of base with wraparound, meaning there is no carry into bits 6, 12, 18, 24, or 30. Wraparound occurs because the alpha RAMs doing the addition act simultaneously and cannot intercommunicate. The six 6-bit sums form a physical address from which a word is fetched from data memory. The fetched word is written to register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.
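The address arithmetic described above can be modeled in a few lines of Python. This is only a sketch of the tribble-wise addition as the text describes it, not a description of the hardware; the function name `tribble_add` is hypothetical.

```python
def tribble_add(base, offset):
    """Add two 36-bit words as six independent 6-bit tribbles.

    Each tribble sum wraps modulo 64, so no carry ever reaches
    bits 6, 12, 18, 24, or 30.
    """
    result = 0
    for shift in range(0, 36, 6):
        a = (base >> shift) & 0o77
        b = (offset >> shift) & 0o77
        result |= ((a + b) & 0o77) << shift
    return result

# No tribble overflows, so the result matches ordinary addition.
print(oct(tribble_add(0o1000, 0o12)))  # 0o1012

# The low tribble overflows and the carry into bit 6 is lost.
print(oct(tribble_add(0o77, 0o1)))     # 0o0, not 0o100
```

Because each tribble is summed independently, swapping the two operands gives the same address in this model.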
Unlike with ADDLD, the base and offset registers for ADDRDM are interchangeable, because the page table does not participate.
Safety for ADDRDM comes via the use of “ADDRDM-compatible” pointers, which are pointers to structures that either (i) do not cross 64-word boundaries, or (ii) are aligned on and fully within a power-of-two boundary. Unlike ADDLD, which is masked by the page table from bit 12 onward, any power-of-two segment size works with ADDRDM. The convenience of ADDRDM’s safety is that compatibility is established when memory for structures is allocated, rather than when ADDRDM is used. Thus all that is necessary to make a pointer ADDRDM-compatible is to use an ADDRDM-compatible allocator. It is sufficient but not necessary to use an ADDLD-compatible allocator, because ADDLD-compatible pointers are more restricted than ADDRDM-compatible pointers.
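As a rough illustration of the two compatibility rules, the following Python sketch checks whether a structure of a given size at a given base address satisfies either rule. It assumes the structure begins at the pointer, and it reads rule (ii) as "the base is aligned on the smallest power of two that holds the structure"; the function name `is_addrdm_compatible` is hypothetical and not part of any Dauug|36 tooling.

```python
def is_addrdm_compatible(base, size):
    """Return True if offsets 0..size-1 from base are safe for ADDRDM.

    Rule (i): the structure lies within a single 64-word block.
    Rule (ii): the base is aligned on a power of two >= size
               (an interpretation of the rule, stated as an assumption).
    """
    if size < 1:
        raise ValueError("size must be at least 1 word")
    # Rule (i): first and last word share the same 64-word block.
    if base // 64 == (base + size - 1) // 64:
        return True
    # Rule (ii): smallest power of two that can hold the structure.
    block = 1 << (size - 1).bit_length()
    return base % block == 0

print(is_addrdm_compatible(60, 4))     # True: words 60..63 share one block
print(is_addrdm_compatible(62, 4))     # False: crosses address 64 unaligned
print(is_addrdm_compatible(128, 100))  # True: aligned on a 128-word boundary
```

In both accepted cases, no tribble of base + offset overflows, so the carry-free addition yields the same address that ordinary addition would.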
Because ADDRDM-compatible allocators fragment the free memory pool to satisfy alignment constraints, choosing ADDRDM can increase a program’s data memory consumption. For block sizes of 1 through 10 words, the overhead is less than 7% when all blocks are of the same size. Some block sizes, such as 33 words, will have overhead approaching 100%, although this overhead could be reclaimed in part by partitioning free blocks according to their size.
Why there is no ADDWDM instruction
Dauug|36 supports a maximum of two operands per instruction, but ADDWDM (add then write data memory) would require three operands: base, offset, and a word to write. This exceeds what the architecture can do in one instruction.
RCM1
Read code memory part 1
Syntax |
rcm1 addr |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
The two-instruction RCM sequence fetches a word from code memory. This is in a very congested area of the architecture’s cycle timing, and there is a great distance to travel through the pipeline. For this reason, code memory reads must be broken into a contiguous two-instruction sequence.
RCM1 reads from the code memory address stored in register addr, and leaves the value read in ff m, a 36-bit-wide flip-flop that is ordinarily used to move immediate values from code memory to registers. Any usage of ff m at this point will overwrite the value read from code memory, so it’s important that RCM2 be the next instruction executed. This can only be assured if the multitasking timer is disabled.
Conflicts
All instructions, except any instruction that immediately follows RCM1, are copied into ff m as they are fetched so that immediate values can be extracted. Due to timing constraints, this copy must occur prior to the instruction being decoded, so it is not known when ff m is loaded whether its contents will be required. There are two consequences to this.
First, if RCM1 and RCM2 are separated by even a single instruction (other than another RCM1 given the same address), the code memory will not be read successfully.
Second, any immediate instruction that immediately follows RCM1 will fail. Here is an example:
    rcm1 addr
    x = 0
    rcm2 inst
After this code executes, inst is equal to 0, and x contains the lower 18 bits of the instruction at code address addr. The upper 18 bits of x are 0.
RCM2
Read code memory part 2
Syntax |
rcm2 dest |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N | bit 35 of the result is set |
Z | all result bits are zero |
T | flag does not change |
R | flag does not change |
The two-instruction RCM sequence fetches a word from code memory. This is in a very congested area of the architecture’s cycle timing, and there is a great distance to travel through the pipeline. For this reason, code memory reads must be broken into a contiguous two-instruction sequence.
RCM2 copies the 36-bit word in ff m, a 36-bit-wide flip-flop that is ordinarily used to move immediate values from code memory to registers, to register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.
When RCM2 immediately follows RCM1 with no possibility of interruption between, this two-instruction sequence reliably reads words from code memory. Here is how this could look:
    unsigned addr inst
    addr = 123456
    ; Disable the multitasking timer. This has further implications.
    priv
    ; Now it's safe to read.
    rcm1 addr
    rcm2 inst
    ; 'inst' now has a copy of the instruction at code address 'addr'.
See also Cooperative multitasking or Preemptive multitasking for an actual application of RCM1, RCM2, and WCM.
RDM
Read data memory
Syntax |
dest = rdm addr |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N | bit 35 of the result is set |
Z | all result bits are zero |
T | flag does not change |
R | flag does not change |
RDM (read data memory) fetches a word from physical data memory address addr and stores the result in register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.
RPT
Read page table
Syntax |
phys = rpt virt |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
RPT (read page table) queries the base address of the physical memory block that is mapped to the virtual memory block base address virt. The result is written to register phys. Because the block size is 4096 words, the 12 least significant bits of virt are ignored and should in principle be zeros, and the 12 least significant bits of phys will definitely be zeros.
Note that in this architecture, every virtual address block electrically maps to a physical address block whether intentional or not. This is another way of saying that every LD or STO instruction will access physical memory somewhere, so the operating system needs to make sure that somewhere is permissible for the user in question.
The user whose page table is queried by RPT depends on the eight user bits that are presented to the page table RAM when the instruction is executed, which are controlled by several Identity-modifying instructions. Proper use would generally be by the superuser in SETUP mode like this:
    setup
    user 105
    phys_block = rpt 0003_0000`o
    priv
In this example, the physical address for user 105’s virtual page 3 is being looked up. The SETUP causes the user’s page table to be read instead of the superuser’s, but we have to drop out of setup mode using PRIV right away, because SETUP also switches the call stack to the user’s, and a CALL, RETURN, or REVERT would taint it.
The page table’s width (the number of bits in phys_block that actually mean something) can vary between machines; however, good electrical design requires that all 36 output bits actually be driven by something, even though page table RAM can’t offer enough outputs to fill everything. As of 19 June 2023, the 12 least significant bits come from the alpha RAMs and will be zero, and 18 bits come from the page table. This leaves 6 bits that come from Somewhere Else, perhaps a flip-flop somewhere, and their value has no relation to the page table and may vary from time to time or machine to machine. Note, too, that these six “martian” bits are not tribble-aligned, because bits 34 and 35 of all physical addresses have special meaning. The bottom line is, the operating system must probe for the page table dimensions when the system starts, and interpret information provided by RPT within the context of those dimensions. This can be as simple as having the operating system compute a mask to zero any uninvolved bits:
    setup
    user 105
    phys_block = rpt 0003_0000`o
    phys_block = phys_block & RPT_MASK_FROM_OS
    priv
As of 19 June 2023, the netlist in current testing would result in RPT_MASK_FROM_OS being 6017_7777_0000`o. This is to say, bits 28–33 are uninvolved and would likely contain spurious data, and bits 0–11 represent offset bits that should read as zero even without masking.
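To make the mask easier to read, here is a small Python sketch that lists which bits the quoted mask keeps and applies it to a made-up RPT result. The constant is the octal value given in the text; the helper names and the sample raw value are hypothetical.

```python
# Octal mask quoted in the text for the netlist of 19 June 2023.
RPT_MASK_FROM_OS = 0o601777770000

def bits_set(word):
    """List the positions (0 = least significant) of the 1 bits in a 36-bit word."""
    return [bit for bit in range(36) if (word >> bit) & 1]

def mask_rpt_result(raw):
    """Zero every bit of an RPT result that the page table does not drive."""
    return raw & RPT_MASK_FROM_OS

print(bits_set(RPT_MASK_FROM_OS))
# Bits 12 through 27 plus bits 34 and 35: 18 meaningful bits in all.

# Hypothetical raw result: physical page 64 with stray data in bits 28, 30, 32.
raw = 0o1000000 | (0o25 << 28)
print(oct(mask_rpt_result(raw)))  # 0o1000000
```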
RWDM
Read and write data memory
Syntax |
dest = addr rwdm tval |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N | bit 35 of the result is set |
Z | all result bits are zero |
T | flag does not change |
R | flag does not change |
RWDM (read and write data memory) atomically fetches the word from physical data memory address addr and stores the result in register dest, while storing the transposed value of register tval to the same physical address addr. The N and Z flags are set as if dest is a signed register. T and R do not change.
RWDM effectively rotates tval into memory with transposition, and what was in memory rotates to dest. It is permissible to use the same register for dest and tval, in which case RWDM becomes a register-memory swap with transposition. (Technically addr could also use the same register, although that scenario seems unlikely to me.)
The point of RWDM is not to save time, but to provide an atomic operation that can implement semaphores in shared memory. Transposition of the right operand is electrically unavoidable due to the instruction being limited to four CPU cycles, but it doesn’t matter much. In a simple semaphore, the right operand would be 0 or 1, which will not change value when transposed. In cases where transposition matters, TXOR can be added to write the intended value this way:
    tval = 0 txor val
    dest = addr rwdm tval
TXOR’s presence as a separate instruction does not break the atomicity of the RWDM.
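The following Python sketch models why pre-transposing with TXOR makes RWDM store the intended value. It rests on two assumptions that this section does not spell out: that "transposition" means transposing the word viewed as a 6×6 bit matrix, and that `0 txor val` yields the transpose of val. The names `transpose36` and `rwdm` are hypothetical model functions, not assembler features.

```python
def transpose36(word):
    """Transpose a 36-bit word viewed as a 6x6 bit matrix (assumed meaning).

    Bit 6*r + c moves to bit 6*c + r.
    """
    out = 0
    for r in range(6):
        for c in range(6):
            if (word >> (6 * r + c)) & 1:
                out |= 1 << (6 * c + r)
    return out

def rwdm(memory, addr, tval):
    """Model of RWDM: return the old word and store the transpose of tval."""
    old = memory.get(addr, 0)
    memory[addr] = transpose36(tval)
    return old

memory = {0o100: 7}
val = 0o123456
tval = transpose36(val)           # models: tval = 0 txor val
dest = rwdm(memory, 0o100, tval)  # models: dest = addr rwdm tval
print(dest, oct(memory[0o100]))   # old word 7, and val stored untransposed
```

Under this model, 0 and 1 are their own transposes, which agrees with the text's remark that simple semaphore values are unaffected.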
Code to obtain a lock on a semaphore by atomically writing a 1 over a 0 at location addr would look like the following. There is no problem writing a 1 over someone else’s 1, but RWDM will let us know this happened so we don’t claim the semaphore.
    waiting:
        tval = addr rwdm 1
        jump == ready       ; If tval == 0, we obtained the semaphore.
        yield
        nop                 ; YIELD does not take effect immediately.
        jump waiting        ; Needn't replace other user's 1 with our 1.
    ready:
        ; Critical section goes here.
    done:
        ; When it's time to release the semaphore, write 0 to addr.
        ; Since the semaphore is ours, we don't need to read what it was.
        sto addr = 0
Do not use RWDM on write-protected memory locations. The write will have no effect while the read still returns the old value, so your program will leave the semaphore unlocked while proceeding as if the lock had been acquired.
WCM
Write code memory
Syntax |
wcm addr = inst |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
WCM (write code memory) writes the instruction in register inst into code memory at address addr. No flags are affected.
WCM and security
WCM is the key gatekeeper for separation between programs, because there are only two paths to executing a privileged instruction. The first and principal path is for WCM to write a privileged instruction to code memory, where it can be fetched and executed. Any instruction WCM does not write into code memory will not be there to be fetched.
A secondary path to executing a privileged instruction is to fetch one from a register using the XANY instruction. This, however, can only occur if at some point WCM writes an XANY instruction to code memory to be executed. So in this case too, WCM is the gatekeeper for separation.
More about WCM and security
You may worry that once a privileged instruction is in code memory, any program can execute it. This isn’t the case, because instructions in code memory are dormant until they are fetched. For a user to fetch an instruction, the user’s instruction pointer must contain the address of the instruction, and this address can only come from JUMP and CALL instructions, which live in (you guessed it) code memory. So here too, WCM is the gatekeeper that keeps code addresses that user programs shouldn’t have access to safely out of user programs.
(The above is almost the whole truth. The instruction pointer can also come from the instruction pointer incrementer, the JANY privileged instruction, and the firmware loader when the system first boots. Operating systems can easily account for these cases and preclude problems from occurring, here again exerting control by supervising what WCM writes into the code memory.)
WDM
Write data memory
Syntax |
wdm dest = val |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
WDM (write data memory) copies the data in register val to data memory at the physical address contained in register dest. No flags are modified.
It is not an error if dest is a write-protected physical address; however, in this situation WDM will have no effect. (If you’re curious about write protection, every word of Dauug|36 physical data memory is accessible at two addresses that differ only at bit 35. The address with bit 35 set can be read from, but not written to. Privileged programs can easily overcome write protection by clearing bit 35, but user programs are stuck with how bit 35 is set in the page table.)
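The two-address view of write protection can be pictured with a toy Python model. This is only a sketch of the behavior described in the parenthetical above; the class `DataMemory` and its method names are hypothetical, and real hardware details such as RAM selection by bit 34 are left out.

```python
WRITE_PROTECT = 1 << 35  # bit 35 marks the read-only alias of a word

class DataMemory:
    """Toy model: each word is visible at two addresses differing in bit 35."""

    def __init__(self):
        self.words = {}

    def rdm(self, addr):
        # Both aliases read the same underlying word.
        return self.words.get(addr & ~WRITE_PROTECT, 0)

    def wdm(self, addr, val):
        # A write through the protected alias silently does nothing.
        if addr & WRITE_PROTECT:
            return
        self.words[addr] = val

mem = DataMemory()
mem.wdm(0o1234, 42)                  # ordinary write succeeds
mem.wdm(0o1234 | WRITE_PROTECT, 99)  # ignored: address is write-protected
print(mem.rdm(0o1234 | WRITE_PROTECT))  # 42
```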
WDM2
Write data memory twice
Syntax |
wdm2 dests = tval |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
WDM2 (write data memory twice) transposes and stores the word in register tval to two memory locations determined by dests, based on the following table. No flags are changed.
dests mod 4 | addresses written to |
0 | dests, dests + 1 |
1 | dests, dests + 1 |
2 | dests, dests + 1 |
3 | dests, dests − 3 |
The reason this instruction cycles addresses modulo 4 is that WDM2 operates the data RAM in burst mode, and it’s the RAM itself that modifies the address for the second write.
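The address table can be restated as a short Python function, which also makes it easy to check that stepping by two from an aligned base touches every word of a page exactly once. The function name `wdm2_addresses` is hypothetical; this is a model of the table, not of the RAM's burst logic.

```python
def wdm2_addresses(dests):
    """Return the two addresses WDM2 writes, per the table above.

    The second address wraps within the aligned group of four words.
    """
    second = dests - 3 if dests % 4 == 3 else dests + 1
    return dests, second

print(wdm2_addresses(8))   # (8, 9)
print(wdm2_addresses(11))  # (11, 8)

# Filling a 4096-word page: one WDM2 per even address covers every word once.
base = 64 * 4096
written = [a for d in range(base, base + 4096, 2) for a in wdm2_addresses(d)]
print(sorted(written) == list(range(base, base + 4096)))  # True
```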
The reason the value to write is transposed is that it has to be introduced via the ALU’s beta layer as a right operand. This isn’t much inconvenience, because most uses of WDM2 are for filling memory with 0, which is its own transpose.
The purpose of WDM2 is to speed memset loops, particularly when an operating system needs to erase a 4096-word memory page for privacy before a user program is allowed to access it. This is especially helpful during electrical simulations of Dauug|36 running an operating system, because user page memsets are among the most time-consuming tasks that an operating system needs to perform.
WPT
Write page table
Syntax |
wpt virt = phys |
Register | Signedness |
All | ignored |
1 opcode only |
No flags changed |
WPT (write page table) maps a virtual memory block with base address virt to a physical memory block with base address phys. Because the block size is 4096 words, the 12 least significant bits of virt and phys are ignored and should in principle be zeros.
The user whose page table is altered depends on the eight user bits that are presented to the page table RAM when the instruction is executed, which are controlled by several Identity-modifying instructions. Proper use would generally be by the superuser in SETUP mode like this:
    setup
    user 105
    wpt 0003_0000`o = 0100_0000`o
    priv
In this example, virtual page 3 is being mapped to physical page 64 (100`o) for user 105. The SETUP causes the user’s page table to be written instead of the superuser’s, but we have to drop out of setup mode using PRIV right away, because SETUP also switches the call stack to the user’s, and a CALL, RETURN, or REVERT would taint it.
The number of bits in a virtual or physical address depends on the sizes of the page table, data 0, and data 1 SRAMs. The data RAM sizes may not match, and either data RAM may not even be installed. The operating system should probe for these three sizes by testing for address wraparound, and WPT should conform to these sizes.
Note also that for physical addresses, bit 34 selects between up to two data RAM chips, and bit 35 enables write protection. These bits remain in their original positions for WPT, so to do the above example with a write-protected page, the WPT line would read:
    wpt 0003_0000`o = 4000_0100_0000`o
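For readers who want to check the octal operands, this Python sketch builds them from page numbers. It assumes only what the text states: 4096-word pages and bit 35 as the write-protect bit. The helper names are hypothetical.

```python
PAGE_WORDS = 4096        # block size stated in the text
WRITE_PROTECT = 1 << 35  # bit 35 enables write protection

def page_base(page_number):
    """Base address of a page: the page number shifted past the 12 offset bits."""
    return page_number * PAGE_WORDS

virt = page_base(3)
phys = page_base(64)
print(oct(virt))                  # 0o30000
print(oct(phys))                  # 0o1000000
print(oct(phys | WRITE_PROTECT))  # 0o400001000000
```

Bit 35 is the top bit of the leading octal digit, so a write-protected physical address in this model always begins with an octal 4 or higher when written with all twelve digits.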
It is critical that all locations of any user’s page table are initialized and point to physical memory that user is allowed to access, whether or not the virtual page is intended for use. This is to prevent the user from accessing memory that belongs to another. It is permissible to map all unused virtual pages for all users to a single physical page, provided it is write protected and contains no confidential information.