The Dauug House
Dauug|36 minicomputer documentation

Unrestricted memory instructions (privileged)

Opcode   P/U    Category              Description
ADDRDM   priv   unrestricted memory   add then read data memory
RCM1     priv   unrestricted memory   read code memory part 1
RCM2     priv   unrestricted memory   read code memory part 2
RDM      priv   unrestricted memory   read data memory
RPT      priv   unrestricted memory   read page table
RWDM     priv   unrestricted memory   read and write data memory
WCM      priv   unrestricted memory   write code memory
WDM      priv   unrestricted memory   write data memory
WDM2     priv   unrestricted memory   write data memory twice
WPT      priv   unrestricted memory   write page table

Preface

As mentioned earlier, five of the ten unrestricted memory instructions are simply privileged versions of nonprivileged instructions. Each privileged version (left column) bypasses the page table, but otherwise does what its nonprivileged counterpart (middle column) does.

Opcode   Acts like   Description
ADDRDM   ADDLD       add then read data memory*
RDM      LD          read data memory
RWDM     LDSTO       read and write data memory
WDM      STO         write data memory
WDM2     STO2        write data memory twice

*Minor differences exist due to page table side effects.

The five remaining memory instructions read and write the code memory and the page table. These have no nonprivileged counterparts, because user programs have no access to these memory types.

Opcode   Description
RCM1     read code memory part 1
RCM2     read code memory part 2
RPT      read page table
WCM      write code memory
WPT      write page table

(Due to timing constraints and high net contention, reading an instruction from code memory requires an uninterrupted two-instruction sequence.)

ADDRDM Add then read data memory

Syntax
dest = base addrdm offset
Register Signedness
All ignored
1 opcode only
Flag Set if and only if
N bit 35 of the result is set
Z all result bits are zero
T flag does not change
R flag does not change

ADDRDM (add then read data memory) approximates a base + offset scheme for reading from data memory, except that the addition does not propagate carries between tribbles. The advantage of using ADDRDM is that in cases where this addition quirk is known to be harmless, a base + offset read from memory can be done in a single instruction instead of two.

ADDRDM adds the tribbles of offset pairwise with the tribbles of base with wraparound, meaning there is no carry into bits 6, 12, 18, 24, or 30. Wraparound occurs because the alpha RAMs doing the addition act simultaneously and cannot intercommunicate. The six 6-bit sums form a physical address where a word is fetched from data memory. This result is written to register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.
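The tribble-wise addition described above can be modeled in a few lines of Python. This is only a sketch of the arithmetic, not of the hardware; the function name addrdm_address is made up for illustration.

```python
TRIBBLE_BITS = 6      # a tribble is 6 bits wide
TRIBBLE_MASK = 0o77
WORD_TRIBBLES = 6     # six tribbles make one 36-bit word


def addrdm_address(base: int, offset: int) -> int:
    """Add two 36-bit words tribble by tribble, discarding every carry."""
    address = 0
    for i in range(WORD_TRIBBLES):
        shift = i * TRIBBLE_BITS
        b = (base >> shift) & TRIBBLE_MASK
        o = (offset >> shift) & TRIBBLE_MASK
        # Each 6-bit sum wraps around on its own; nothing carries upward.
        address |= ((b + o) & TRIBBLE_MASK) << shift
    return address


# No tribble overflows, so the result matches ordinary addition.
print(oct(addrdm_address(0o1000, 0o17)))  # 0o1017
# The low tribble overflows (0o77 + 1) and the carry into bit 6 is lost.
print(oct(addrdm_address(0o1077, 0o1)))   # 0o1000
```

The second call shows the quirk: ordinary addition would give 0o1100, but the lost carry sends the read to 0o1000 instead.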

Unlike with ADDLD, the base and offset registers for ADDRDM are interchangeable, because the page table does not participate.

Safety for ADDRDM comes via the use of “ADDRDM-compatible” pointers, which are pointers to structures that either (i) do not cross 64-word boundaries, or (ii) are aligned on and fully within a power-of-two boundary. Unlike ADDLD, which is masked by the page table from bit 12 onward, any power-of-two segment size works with ADDRDM. The convenience of ADDRDM’s safety is that compatibility is established when memory for structures is allocated, rather than when ADDRDM is used. Thus all that is necessary to make a pointer ADDRDM-compatible is to use an ADDRDM-compatible allocator. It is sufficient but not necessary to use an ADDLD-compatible allocator, because ADDLD-compatible pointers are more restricted than ADDRDM-compatible pointers.

Because ADDRDM-compatible allocators fragment the free memory pool to satisfy alignment constraints, choosing ADDRDM can increase a program’s data memory consumption. For block sizes of 1 through 10 words, the overhead is less than 7% when all blocks are of the same size. Some block sizes, such as 33 words, will have overhead approaching 100%, although this overhead could be reclaimed in part by partitioning free blocks according to their size.
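The overhead figures above can be reproduced with a simple packing model. The model is an assumption made for illustration, not a description of any actual allocator: equal-sized blocks of at most 64 words are packed into 64-word chunks, and whatever is left at the end of each chunk is wasted.

```python
CHUNK_WORDS = 64  # blocks may not cross a 64-word boundary


def overhead(block_words: int) -> float:
    """Wasted words per useful word when packing equal-sized blocks."""
    if not 1 <= block_words <= CHUNK_WORDS:
        raise ValueError("this model only covers blocks of 1 to 64 words")
    blocks_per_chunk = CHUNK_WORDS // block_words
    used = blocks_per_chunk * block_words
    return (CHUNK_WORDS - used) / used


for n in (3, 5, 10, 33):
    print(n, f"{overhead(n):.1%}")
```

Under this model the worst case for sizes 1 through 10 is 4 wasted words per 60 used (sizes 5, 6, and 10), and a 33-word block wastes 31 of every 64 words.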

Why there is no ADDWDM instruction

Dauug|36 supports a maximum of two operands per instruction, but ADDWDM (add then write data memory) would require three operands—base, offset, and a word to write. This exceeds what the architecture can do in one instruction.

RCM1 Read code memory part 1

Syntax
rcm1 addr
Register Signedness
All ignored
1 opcode only
No flags changed

The two-instruction RCM sequence fetches a word from code memory. This is in a very congested area of the architecture’s cycle timing, and there is a great distance to travel through the pipeline. For this reason, code memory reads must be broken into a contiguous two-instruction sequence.

RCM1 reads from the code memory address stored in register addr, and leaves the value read in ff m, a 36-bit-wide flip-flop that is ordinarily used to move immediate values from code memory to registers. Any usage of ff m at this point will overwrite the value read from code memory, so it’s important that RCM2 be the next instruction executed. This can only be assured if the multitasking timer is disabled.

Conflicts

All instructions—except any instruction that immediately follows RCM1—are copied into ff m as they are fetched, in order that immediate values can be extracted. Due to timing constraints, this copy must occur prior to the instruction being decoded, so it is not known when ff m is loaded whether its contents will be required. There are two consequences to this.

First, if RCM1 and RCM2 are separated by even a single instruction (other than another RCM1 given the same address), the code memory will not be read successfully.

Second, any immediate instruction that immediately follows RCM1 will fail. Here is an example:

rcm1 addr
x = 0
rcm2 inst

After this code executes, inst is equal to 0, and x contains the lower 18 bits of the instruction at code address addr. The upper 18 bits of x are 0.

RCM2 Read code memory part 2

Syntax
rcm2 dest
Register Signedness
All ignored
1 opcode only
Flag Set if and only if
N bit 35 of the result is set
Z all result bits are zero
T flag does not change
R flag does not change

The two-instruction RCM sequence fetches a word from code memory. This is in a very congested area of the architecture’s cycle timing, and there is a great distance to travel through the pipeline. For this reason, code memory reads must be broken into a contiguous two-instruction sequence.

RCM2 copies the 36-bit word in ff m, a 36-bit-wide flip-flop that is ordinarily used to move immediate values from code memory to registers, to register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.

When RCM2 immediately follows RCM1 with no possibility of interruption between, this two-instruction sequence reliably reads words from code memory. Here is how this could look:

unsigned addr inst

addr = 123456

; Disable the multitasking timer. This has further implications.
priv

; Now it's safe to read.
rcm1 addr
rcm2 inst
; 'inst' now has a copy of the instruction at code address 'addr'.

See also Cooperative multitasking or Preemptive multitasking for an actual application of RCM1, RCM2, and WCM.

RDM Read data memory

Syntax
dest = rdm addr
Register Signedness
All ignored
1 opcode only
Flag Set if and only if
N bit 35 of the result is set
Z all result bits are zero
T flag does not change
R flag does not change

RDM (read data memory) fetches a word from physical data memory address addr and stores the result in register dest. The N and Z flags are set as if dest is a signed register. T and R do not change.

RPT Read page table

Syntax
phys = rpt virt
Register Signedness
All ignored
1 opcode only
No flags changed

RPT (read page table) queries the base address of the physical memory block that is mapped to the virtual memory block base address virt. The result is written to register phys. Because the block size is 4096 words, the 12 least significant bits of virt are ignored and should in principle be zeros, and the 12 least significant bits of phys will definitely be zeros.

Note that in this architecture, every virtual address block electrically maps to a physical address block whether intentional or not. This is another way of saying that every LD or STO instruction will access physical memory somewhere, so the operating system needs to make sure that somewhere is permissible for the user in question.

The user whose page table is queried by RPT depends on the eight user bits that are presented to the page table RAM when the instruction is executed, which are controlled by several Identity-modifying instructions. Proper use would generally be by the superuser in SETUP mode like this:

setup
user 105
phys_block = rpt 0003_0000`o
priv

In this example, the physical address for user 105’s virtual page 3 is being looked up. The SETUP causes the user’s page table to be read instead of the superuser’s, but we have to drop out of setup mode using PRIV right away, because SETUP also switches the call stack to the user’s, and a CALL, RETURN, or REVERT would taint it.

The page table’s width—the number of bits in phys_block that actually mean something—can vary between machines; however, good electrical design requires that all 36 output bits actually be driven by something, even though page table RAM can’t offer enough outputs to fill everything. As of 19 June 2023, the 12 least significant bits come from the alpha RAMs and will be zero, and 18 bits come from the page table. This leaves 6 bits that come from Somewhere Else, perhaps a flip-flop somewhere, and their value has no relation to the page table and may vary from time to time or machine to machine. Note, too, that these six “martian” bits are not tribble-aligned, because bits 34 and 35 of all physical addresses have special meaning. The bottom line is, the operating system must probe for the page table dimensions when the system starts, and interpret information provided by RPT within the context of those dimensions. This can be as simple as having the operating system compute a mask to zero any uninvolved bits:

setup
user 105
phys_block = rpt 0003_0000`o
phys_block = phys_block & RPT_MASK_FROM_OS
priv

As of 19 June 2023, the netlist in current testing would result in RPT_MASK_FROM_OS being 6017_7777_0000`o. That is, bits 28–33 are uninvolved and would likely contain spurious data, and bits 0–11 represent offset bits that should read as zero even without masking.
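To see where that mask value comes from, the following Python sketch rebuilds it from the bit ranges just described. It assumes the 19 June 2023 layout only (bits 34 and 35 plus bits 12 through 27 are meaningful); the helper bit_range and the sample raw value are hypothetical.

```python
def bit_range(lo: int, hi: int) -> int:
    """Mask with bits lo through hi (inclusive) set."""
    return ((1 << (hi - lo + 1)) - 1) << lo


# Meaningful bits: the two special bits 34 and 35, plus bits 12..27.
RPT_MASK = bit_range(34, 35) | bit_range(12, 27)
print(oct(RPT_MASK))                      # 0o601777770000

# A made-up raw RPT result with spurious data in bits 28..33.
raw = 0o0100_0000 | bit_range(28, 33)
print(oct(raw & RPT_MASK))                # 0o1000000
```

An operating system that probes the page table dimensions at startup would compute its own mask in the same spirit, rather than hard-coding this one.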

RWDM Read and write data memory

Syntax
dest = addr rwdm tval
Register Signedness
All ignored
1 opcode only
Flag Set if and only if
N bit 35 of the result is set
Z all result bits are zero
T flag does not change
R flag does not change

RWDM (read and write data memory) atomically fetches the word from physical data memory address addr and stores the result in register dest, while storing the transposed value of register tval to the same physical address addr. The N and Z flags are set as if dest is a signed register. T and R do not change.

RWDM effectively rotates tval into memory with transposition, and what was in memory rotates to dest. It is permissible to use the same register for dest and tval, in which case RWDM becomes a register-memory swap with transposition. (Technically addr could also use the same register, although that scenario seems unlikely to me.)

The point of RWDM is not to save time, but to provide an atomic operation that can implement semaphores in shared memory. Transposition of the right operand is electrically unavoidable due to the instruction being limited to four CPU cycles, but it doesn’t matter much. In a simple semaphore, the right operand would be 0 or 1, which will not change value when transposed. In cases where transposition matters, TXOR can be added to write the intended value this way:

tval = 0 txor val
dest = addr rwdm tval

TXOR’s presence as a separate instruction does not break the atomicity of the RWDM.
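The effect of transposition can be explored with a small Python model. This sketch assumes that transposition treats the 36-bit word as a 6×6 bit matrix (six tribbles of six bits) and swaps rows with columns; that interpretation is an assumption here, so check it against the architecture's definition before relying on it.

```python
def transpose(word: int) -> int:
    """Transpose a 36-bit word viewed as a 6x6 bit matrix (assumed model)."""
    result = 0
    for row in range(6):          # row = tribble index
        for col in range(6):      # col = bit index within the tribble
            if (word >> (6 * row + col)) & 1:
                result |= 1 << (6 * col + row)
    return result


# 0 and 1 are unchanged, which is why simple semaphores are unaffected.
print(transpose(0), transpose(1))           # 0 1
# Other values generally move: bit 1 lands on bit 6.
print(oct(transpose(2)))                    # 0o100
# Transposing twice restores the value, so pre-transposing cancels out.
print(transpose(transpose(0o123456)) == 0o123456)   # True
```

Because the operation is its own inverse under this model, transposing val ahead of time means the second transposition done by RWDM stores the intended value.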

Code to obtain a lock on a semaphore by atomically writing a 1 over a 0 at location addr would look like the following. There is no problem writing a 1 over someone else’s 1, but RWDM will let us know this happened so we don’t claim the semaphore.

waiting:
    tval = addr rwdm 1
    jump == ready           ; If tval == 0, we obtained the semaphore.
    yield
    nop                     ; YIELD does not take effect immediately.
    jump waiting            ; Needn't replace other user's 1 with our 1.

ready:
    ; Critical section goes here.

done:
    ; When it's time to release the semaphore, write 0 to addr.
    ; Since the semaphore is ours, we don't need to read what it was.
    sto addr = 0

Do not use RWDM on write-protected memory locations. The write will silently have no effect, so the semaphore stays unlocked in memory while your program reads the 0 and proceeds as if the lock had been acquired.

WCM Write code memory

Syntax
wcm addr = inst
Register Signedness
All ignored
1 opcode only
No flags changed

WCM (write code memory) writes the instruction in register inst into code memory at address addr. No flags are affected.

WCM and security

WCM is the key gatekeeper to separation between programs, because there are only two paths to executing a privileged instruction. The first and principal path is for WCM to write a privileged instruction to code memory, where it can be fetched and executed. Any instruction WCM does not write into code memory will not be there to be fetched.

A secondary path to executing a privileged instruction is to fetch one from a register using the XANY instruction. This, however, can only occur if at some point WCM writes an XANY instruction to code memory to be executed. So in this case too, WCM is the gatekeeper for separation.

More about WCM and security

You may worry that once a privileged instruction is in code memory, any program can execute it. This isn’t the case, because instructions in code memory are dormant until they are fetched. For a user to fetch an instruction, the user’s instruction pointer must contain the address of the instruction, and this address can only come from JUMP and CALL instructions, which live in—you guessed it—code memory. So here too, WCM is used as a gatekeeper to keep code addresses that user programs shouldn’t have access to safely out of user programs.

(The above is almost the whole truth. The instruction pointer can also come from the instruction pointer incrementer, the JANY privileged instruction, and the firmware loader when the system first boots. Operating systems can easily account for these cases and preclude problems from occurring, here again exerting control by supervising what WCM writes into the code memory.)

WDM Write data memory

Syntax
wdm dest = val
Register Signedness
All ignored
1 opcode only
No flags changed

WDM (write data memory) copies the data in register val to data memory at the physical address contained in register dest. No flags are modified.

It is not an error if dest is a write-protected physical address; however, in this situation WDM will have no effect. (If you’re curious about write protection, every word of Dauug|36 physical data memory is accessible at two addresses that differ only at bit 35. The address with bit 35 set can be read from, but not written to. Privileged programs can easily overcome write protection by clearing bit 35, but user programs are stuck with how bit 35 is set in the page table.)
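The two-address view of write protection can be illustrated with a toy Python model. It is a simplification built only from the description above: bit 35 is treated as a pure write-protect flag, every other address bit selects the word, and unwritten words are assumed to read as 0. The class and method names are hypothetical.

```python
WRITE_PROTECT = 1 << 35        # bit 35 of a physical address
WORD_MASK = (1 << 36) - 1      # data words are 36 bits wide


class DataMemory:
    """Toy model: each word is visible at two addresses differing in bit 35."""

    def __init__(self) -> None:
        self.words = {}

    def rdm(self, addr: int) -> int:
        # Reads work through either alias of the word.
        return self.words.get(addr & ~WRITE_PROTECT, 0)

    def wdm(self, addr: int, val: int) -> None:
        # Writes through the protected alias silently do nothing.
        if addr & WRITE_PROTECT:
            return
        self.words[addr] = val & WORD_MASK


mem = DataMemory()
mem.wdm(0o100, 5)
mem.wdm(0o100 | WRITE_PROTECT, 9)          # no effect, and no error
print(mem.rdm(0o100), mem.rdm(0o100 | WRITE_PROTECT))   # 5 5
```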

WDM2 Write data memory twice

Syntax
wdm2 dests = tval
Register Signedness
All ignored
1 opcode only
No flags changed

WDM2 (write data memory twice) transposes and stores the word in register tval to two memory locations determined by dests, based on the following table. No flags are changed.

dests mod 4   Addresses written to
0             dests, dests + 1
1             dests, dests + 1
2             dests, dests + 1
3             dests, dests − 3

The reason this instruction cycles addresses modulo 4 is that WDM2 operates the data RAM in burst mode, and it’s the RAM itself that modifies the address for the second write.

The reason the value to write is transposed is that it has to be introduced via the ALU’s beta layer as a right operand. This isn’t much inconvenience, because most uses of WDM2 are for filling memory with 0, which is its own transpose.

The purpose of WDM2 is to speed memset loops, particularly when an operating system needs to erase a 4096-word memory page for privacy before a user program is allowed to access it. This is especially helpful during electrical simulations of Dauug|36 running an operating system, because user page memsets are among the most time-consuming tasks that an operating system needs to perform.
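The address rule in the table above, and its use for clearing a page, can be sketched in Python. This models only which addresses get written; the function name and the page base address are made up for the example.

```python
PAGE_WORDS = 4096


def wdm2_addresses(dests: int) -> tuple:
    """Return the two addresses written by WDM2 for a given dests."""
    # The RAM's burst counter advances only the two low address bits.
    second = (dests & ~3) | ((dests + 1) & 3)
    return dests, second


for d in (0o100, 0o101, 0o102, 0o103):
    print(d % 4, [oct(a) for a in wdm2_addresses(d)])

# Clearing a page: step dests by 2 from a page-aligned base, so that
# dests mod 4 is always 0 or 2 and the pair is (dests, dests + 1).
base = 0o0100_0000
written = set()
for dests in range(base, base + PAGE_WORDS, 2):
    written.update(wdm2_addresses(dests))
print(len(written))   # 4096 words covered by 2048 WDM2 instructions
```

Starting from an aligned base avoids the wraparound case entirely, which keeps the loop a plain stride of two.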

WPT Write page table

Syntax
wpt virt = phys
Register Signedness
All ignored
1 opcode only
No flags changed

WPT (write page table) maps a virtual memory block with base address virt to a physical memory block with base address phys. Because the block size is 4096 words, the 12 least significant bits of virt and phys are ignored and should in principle be zeros.

The user whose page table is altered depends on the eight user bits that are presented to the page table RAM when the instruction is executed, which are controlled by several Identity-modifying instructions. Proper use would generally be by the superuser in SETUP mode like this:

setup
user 105
wpt 0003_0000`o = 0100_0000`o
priv

In this example, virtual page 3 is being mapped to physical page 64 (100`o) for user 105. The SETUP causes the user’s page table to be written instead of the superuser’s, but we have to drop out of setup mode using PRIV right away, because SETUP also switches the call stack to the user’s, and a CALL, RETURN, or REVERT would taint it.

The number of bits in a virtual or physical address depends on the sizes of the page table, data 0, and data 1 SRAMs. The data RAM sizes may not match, and either data RAM may not even be installed. The operating system should probe for these three sizes by testing for address wraparound, and WPT should conform to these sizes.

Note also that for physical addresses, bit 34 selects between up to two data RAM chips, and bit 35 enables write protection. These bits remain in their original positions for WPT, so to do the above example with a write-protected page, the WPT line would read:

wpt 0003_0000`o = 4000_0100_0000`o
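The address arithmetic in these WPT examples can be checked with a short Python sketch. It relies only on the 4096-word block size and the meanings of bits 34 and 35 given above; the function name page_number is hypothetical.

```python
PAGE_BITS = 12                 # 4096-word blocks
RAM_SELECT = 1 << 34           # bit 34 selects the data RAM chip
WRITE_PROTECT = 1 << 35        # bit 35 enables write protection


def page_number(addr: int) -> int:
    """Page index of an address, ignoring the two special bits."""
    return (addr & ~(RAM_SELECT | WRITE_PROTECT)) >> PAGE_BITS


virt = 0o0003_0000
phys = 0o0100_0000
print(page_number(virt))             # 3
print(page_number(phys))             # 64, which is 100 in octal

# Bit 35 in octal is a 4 followed by eleven zeros.
print(oct(phys | WRITE_PROTECT))     # 0o400001000000
```

Setting bit 35 leaves the page number untouched, so the protected and unprotected mappings refer to the same physical page.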

It is critical that all locations of any user’s page table are initialized and point to physical memory that user is allowed to access, whether or not the virtual page is intended for use. This is to prevent the user from accessing memory that belongs to another. It is permissible to map all unused virtual pages for all users to a single physical page, provided it is write protected and contains no confidential information.


Marc W. Abel
Computer Science and Engineering
College of Engineering and Computer Science
marc.abel@wright.edu
Without secure hardware, there is no secure software.
937-775-3016