Future instructions

The biggest news here is JREL (jump relative to page) and CREL (call relative to page), which will enable register-directed JUMPs and function pointer CALLs without risk of privilege escalation. Added 21 March 2024.

New instructions that only involve firmware changes

Most new instructions require only firmware, assembler, and documentation to implement. They don’t require any netlist (circuit board) changes, so there isn’t a lot of schedule pressure to finish them. Priority should be given to new instructions that call for netlist changes.

Division instructions

Specific instructions for division have not been enumerated, but the ALU: divide page describes their purpose and some of the capabilities that will be needed.

Signage-stratified immediate instructions

The Bugs page briefly discusses problems with IMB (immediate both), IMH (immediate high), and IMN (immediate negative), each of which needs signed and unsigned variants. Completing these will also allow the broken t_speed.a regression test to be fixed.

`SMO` (so many ones) stacked unary

For some reason, the stacked unary function that computes 2^L − 1, where L is the left operand, was never implemented. It should be stratified by signage.
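A behavioral sketch of what SMO might compute on 36-bit words. The function names and the choice to raise `ValueError` for out-of-range operands are hypothetical; the text says only that SMO should be stratified by signage, so the signed range limit below is one plausible interpretation, not a settled design.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1  # all-ones 36-bit word


def smo_unsigned(left: int) -> int:
    """Return 2**left - 1 as an unsigned 36-bit value (left in 0..36)."""
    if not 0 <= left <= WORD_BITS:
        raise ValueError("left operand out of range for unsigned SMO")
    return (1 << left) - 1


def smo_signed(left: int) -> int:
    """Return 2**left - 1 as a signed 36-bit value.

    Assumption: the largest positive signed 36-bit value is 2**35 - 1,
    so left is limited to 0..35. A negative left has no integer result
    and is rejected.
    """
    if not 0 <= left <= WORD_BITS - 1:
        raise ValueError("left operand out of range for signed SMO")
    return (1 << left) - 1
```

Under these assumptions, `smo_unsigned(36)` yields the all-ones word, while the signed variant stops at L = 35 because 2^36 − 1 does not fit in a signed 36-bit word.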

`ROR` (rotate right)

To date, control words for all shifts and rotations are expressed from a left rotation perspective. This saved a small amount of ALU memory at a time when much less ALU memory was anticipated, but it increases register pressure, because the control word that shifts in one direction can’t be reused to reverse the shift. Perhaps it’s time to add ROR (rotate right) and change the underlying semantics of the ASR and LSR opcodes, as well as of the planned macros for preparing the control words.
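The register-pressure point can be seen in a small model of 36-bit rotations: with only a left rotation, undoing `rol(x, n)` needs a second amount, 36 − n, whereas a right rotation reverses it with the same n. This sketch models only the rotation arithmetic; it does not reproduce the ALU’s actual control-word format.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1


def rol(x: int, n: int) -> int:
    """Rotate a 36-bit word left by n positions."""
    n %= WORD_BITS
    x &= WORD_MASK
    return ((x << n) | (x >> (WORD_BITS - n))) & WORD_MASK


def ror(x: int, n: int) -> int:
    """Rotate a 36-bit word right by n positions."""
    n %= WORD_BITS
    x &= WORD_MASK
    return ((x >> n) | (x << (WORD_BITS - n))) & WORD_MASK


x = 0o123456701234
# Left-only world: reversing a rotation by 5 needs a different amount (31).
assert rol(rol(x, 5), WORD_BITS - 5) == x
# With ROR, the same amount (5) reverses the rotation.
assert ror(rol(x, 5), 5) == x
```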

This will be a breaking change for nearly all programs, so it should be given a relatively high priority.

Loop control instructions

Loops with known boundaries presently iterate by adding or subtracting one, checking the boundary (if it isn’t zero), and jumping conditionally. Instructions can be added that combine the add or subtract with the check, essentially “increment with wrap until” and “decrement with wrap through” (suited to half-open intervals). They will shorten the iteration step from three instructions to two.
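A minimal sketch of how the two combined instructions could drive loops over a half-open interval [lo, hi). The names `inc_until` and `dec_through` are hypothetical, as are their flag semantics: here “until” reports when the incremented value reaches the exclusive bound, and “through” reports when the value before decrementing was the inclusive bound, with 36-bit wraparound so that decrementing past 0 is harmless.

```python
WORD_MASK = (1 << 36) - 1


def inc_until(counter: int, bound: int) -> tuple[int, bool]:
    """Hypothetical 'increment with wrap until': add one modulo 2**36 and
    report whether the new value has reached the (exclusive) bound."""
    counter = (counter + 1) & WORD_MASK
    return counter, counter == bound


def dec_through(counter: int, bound: int) -> tuple[int, bool]:
    """Hypothetical 'decrement with wrap through': report whether the value
    before decrementing was the (inclusive) bound, then subtract one
    modulo 2**36."""
    done = counter == bound
    counter = (counter - 1) & WORD_MASK
    return counter, done


def ascending(lo: int, hi: int) -> list[int]:
    """Visit lo..hi-1 (assumes lo < hi; body runs at least once)."""
    visited, i = [], lo
    while True:
        visited.append(i)            # loop body
        i, done = inc_until(i, hi)   # step 1: combined add and check
        if done:                     # step 2: conditional jump
            return visited


def descending(lo: int, hi: int) -> list[int]:
    """Visit hi-1 down to lo (assumes lo < hi)."""
    visited, i = [], hi - 1
    while True:
        visited.append(i)            # loop body
        i, done = dec_through(i, lo)  # step 1: combined subtract and check
        if done:                      # step 2: conditional jump
            return visited
```

For example, `ascending(3, 7)` visits 3, 4, 5, 6 and `descending(3, 7)` visits 6, 5, 4, 3, each with a two-step iteration.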

Indexed bit operations

Firmware can be added to set, clear, or test a specific bit within a word in one instruction. The right operand will be the word, and the left operand will indicate which bit (0–35).

The beta layer will replicate the left operand to all tribbles, and the gamma layer will perform the requested operation. This will increase gamma’s firmware size by three single-width slots, which is significant but doable.
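A behavioral model of the indexed bit operations, assuming a tribble is three bits, so a 36-bit word holds twelve of them. Following the description above, the bit index (left operand) is replicated to every tribble position, and each position decides locally whether the selected bit is one of its three. The function name, the operation strings, and returning 1 or 0 for a test are illustrative assumptions, not the planned firmware layout.

```python
TRIBBLE_BITS = 3
TRIBBLES = 12            # 12 tribbles * 3 bits = 36-bit word
VALID_OPS = {"set", "clear", "test"}


def indexed_bit_op(op: str, index: int, word: int) -> int:
    """Apply op to bit `index` (0-35) of a 36-bit word.

    'set' and 'clear' return the new word; 'test' returns 1 if the bit
    is set, else 0.
    """
    if op not in VALID_OPS:
        raise ValueError(f"unknown op {op!r}")
    if not 0 <= index < TRIBBLES * TRIBBLE_BITS:
        raise ValueError("bit index must be 0-35")
    result = 0
    found = 0
    for position in range(TRIBBLES):            # one lane per tribble
        lane_index = index                      # beta: replicate left operand
        tribble = (word >> (position * TRIBBLE_BITS)) & 0b111
        if lane_index // TRIBBLE_BITS == position:   # gamma: bit is in this tribble?
            mask = 1 << (lane_index % TRIBBLE_BITS)
            if op == "set":
                tribble |= mask
            elif op == "clear":
                tribble &= ~mask & 0b111
            else:  # "test"
                found = 1 if tribble & mask else 0
        result |= tribble << (position * TRIBBLE_BITS)
    return found if op == "test" else result
```

Exactly one lane matches any valid index, so the other eleven tribbles pass through unchanged.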

`CMPB`, `CMPH`, `CMPL`, `CMPN` (compare both, high, low, negative)

Some kinds of programs have a great number of constants. As an example, consider a program that emulates the Dauug|36 CPU instruction set by having a “case statement” with dozens to hundreds of opcode constants. These programs may run out of registers because so many registers are dedicated to constant storage. Is there a workaround? Additional firmware support can help, although I have not implemented it yet.

Like all ALU instructions, the CMP (compare) instructions are encoded with a 9-bit opcode, 9-bit destination register, 9-bit left operand, and 9-bit right operand. But CMP doesn’t write the result anywhere, so the 9-bit destination field isn’t used. A new family of compare-immediate instructions can be introduced that encode an immediate value in 18 bits (the destination and left operand fields) and compare that value with the register indicated by the right operand. The comparison could only check for equality and inequality, because the constant wouldn’t be unpacked from ff m until the ALU’s gamma layer, so relations such as > and <= could not be tested.
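A minimal sketch of how a compare-immediate word might be packed and evaluated. It assumes the fields run, from most to least significant, opcode (bits 35–27), destination (26–18), left (17–9), and right (8–0), so the 18-bit immediate occupies bits 26–9. The opcode value is made up, and only a zero-extending “low” variant is modeled; how the both, high, and negative variants expand the immediate is not specified here.

```python
FIELD_BITS = 9
FIELD_MASK = (1 << FIELD_BITS) - 1
IMM_MASK = (1 << 18) - 1
HYPOTHETICAL_CMPL_OPCODE = 0o123  # made-up value; no opcode has been assigned


def encode_cmp_imm(opcode: int, immediate: int, right_reg: int) -> int:
    """Pack a 9-bit opcode, an 18-bit immediate spanning the destination
    and left-operand fields, and a 9-bit right-operand register number."""
    if not (0 <= opcode <= FIELD_MASK and 0 <= immediate <= IMM_MASK
            and 0 <= right_reg <= FIELD_MASK):
        raise ValueError("field out of range")
    return (opcode << 27) | (immediate << 9) | right_reg


def decode_cmp_imm(word: int) -> tuple[int, int, int]:
    """Inverse of encode_cmp_imm: return (opcode, immediate, right_reg)."""
    return (word >> 27) & FIELD_MASK, (word >> 9) & IMM_MASK, word & FIELD_MASK


def cmpl_zero_flag(word: int, registers: list[int]) -> bool:
    """Model a 'low' variant: zero-extend the immediate and set Z on
    equality. Only == and != can be decided, not > or <=."""
    _, immediate, right_reg = decode_cmp_imm(word)
    return registers[right_reg] == immediate


word = encode_cmp_imm(HYPOTHETICAL_CMPL_OPCODE, 25, 7)  # cmp 25 == r7
```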

The syntax for these new instructions might look like

cmp 25 == x

in which case the Z(ero) flag is set if x is equal to 25. Otherwise, Z is cleared. The mandatory == reminds the programmer that only equality is being tested, and distinguishes this instruction from ordinary CMP, whose syntax requires a - between its operands.

The assembler could easily, as a convenience, also accept

cmp x == 25

and swap the operands so that the immediate value 25 is on the left. The swap is necessary because the ALU does not treat its left and right operands symmetrically: given the swizzling the immediate value requires, the variable must be presented as the right operand.
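The assembler convenience could be as simple as the parser sketched below, which accepts either operand order and always returns the immediate first. The function name, the regular expression, and the restriction to unsigned decimal immediates and identifier-style register names are assumptions for illustration; the real assembler’s grammar is not described here.

```python
import re

IMM_MAX = (1 << 18) - 1  # immediate occupies the 18-bit destination + left fields
CMP_EQ = re.compile(r"^\s*cmp\s+(\w+)\s*==\s*(\w+)\s*$", re.ASCII)


def parse_cmp_eq(line: str) -> tuple[int, str]:
    """Parse 'cmp <imm> == <reg>' or 'cmp <reg> == <imm>' and return
    (immediate, register) with the immediate always on the left."""
    m = CMP_EQ.match(line)
    if m is None:
        raise ValueError(f"not a compare-immediate: {line!r}")
    a, b = m.groups()
    if a.isdigit() == b.isdigit():
        raise ValueError("exactly one operand must be a decimal immediate")
    # Swap if the programmer wrote the immediate on the right.
    imm_text, reg = (a, b) if a.isdigit() else (b, a)
    value = int(imm_text)
    if value > IMM_MAX:
        raise ValueError("immediate does not fit in 18 bits")
    return value, reg
```

Both `parse_cmp_eq("cmp 25 == x")` and `parse_cmp_eq("cmp x == 25")` return `(25, "x")`, so code generation only ever sees the canonical order.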

`STUN` (stacked unary) standardization

Presently there are eleven STUN opcodes with varying control signals, named STUN.A through STUN.K. None of their assignments has been committed to, and there isn’t yet even a list recording which opcodes and control signals each stacked unary operation uses. Until these assignments are fixed, stacked unary code won’t be portable across firmware releases.

New instructions that require netlist changes

The instructions in this section cannot run on the existing Dauug|36 netlist; they require hardware changes. This makes them a higher priority to implement, because we don’t want to build any physical machines they cannot run on. By implementing these instructions (with their hardware modifications) prior to physical prototyping and manufacturing, compatibility with all models will be assured.

`JREL` (jump relative to page)

I had the idea for JREL on 21 March 2024. It’s a nonprivileged version of JANY (jump anywhere) that leverages how the operating system splits code memory into pages to avoid fragmentation. According to the Memory structures page, the optimal code memory page size is 512 words. This turns out to be ideal for JREL, which is a register-directed jump relative to the code page given in the instruction.

The Dauug|36 code address space is 27 bits. The 36-bit instruction word for JREL partitions into 9- and 18-bit fields as follows:

+++++++++ +++++++++ +++++++++ +++++++++
opcode    code      code      right
          page      page      operand

The upper 18 bits of the jump destination come from the instruction word via ff j, and the lower 9 bits come from the right register file via ff a. The right operand field of the instruction word selects which register supplies them.

The magic of JREL that makes it nonprivileged is that, assuming the OS allots code memory in 512-word pages, every possible value of the lower 9 bits addresses memory belonging to the same program. Osmin only needs to check that the page branched to is within the current program’s code address space, a check that is already implemented in the OS and requires no changes to work with JREL.
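A minimal sketch of JREL address formation and why a page-only check suffices: whatever the register holds, only its low 9 bits reach the address, so the target always lies in the page named by the instruction. The function names and the `owned_pages` set are hypothetical stand-ins for the OS bookkeeping.

```python
PAGE_OFFSET_BITS = 9                       # 512-word code pages
PAGE_MASK = (1 << 18) - 1                  # 18-bit page field from the instruction
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


def jrel_target(page_field: int, register_value: int) -> int:
    """Form a 27-bit code address: upper 18 bits from the instruction's
    page field (via ff j), lower 9 bits from the register (via ff a)."""
    return ((page_field & PAGE_MASK) << PAGE_OFFSET_BITS) | (register_value & OFFSET_MASK)


def page_is_owned(target: int, owned_pages: set[int]) -> bool:
    """OS-style check: only the page number matters, never the offset."""
    return (target >> PAGE_OFFSET_BITS) in owned_pages


# Any register value stays inside page 5, so checking page 5 once is enough.
assert all(jrel_target(5, r) >> PAGE_OFFSET_BITS == 5 for r in range(0, 5000, 7))
```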

The benefit of JREL is that it opens the architecture for the first time to nonprivileged register-based jumps. The architecture’s security claim of “no branch to addresses not hardcoded in CALL or JUMP” is slightly broadened to “no branch to code memory pages not hardcoded in CALL or JUMP.” JREL will be able to branch to any of 512 locations with O(1) time complexity. These locations will ordinarily contain JUMP instructions to fixed points.

JREL can be hardened against out-of-bounds registers either by ensuring that all 512 destinations JUMP to a controlled final destination, or by clamping the register in advance with MAX, MIN, and/or a bit mask operation.
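The two hardening strategies can be modeled with a 512-slot dispatch table, where each slot stands for a JUMP to a fixed destination. Handler names such as `"bad_opcode"` are placeholders, and the helper functions are illustrative rather than part of any toolchain. Note that without hardening, an out-of-range value such as 514 silently lands on slot 2, because only the low 9 bits are used.

```python
TABLE_SIZE = 512  # one code page of JUMP instructions


def build_table(handlers: list[str], fallback: str) -> list[str]:
    """Option 1: fill all 512 slots so every offset lands on a controlled
    JUMP; unused slots jump to the fallback handler."""
    if len(handlers) > TABLE_SIZE:
        raise ValueError("too many handlers for one 512-word page")
    return handlers + [fallback] * (TABLE_SIZE - len(handlers))


def clamp_offset(value: int, count: int) -> int:
    """Option 2a: clamp into 0..count-1 (MAX with 0, then MIN with count-1)."""
    return min(max(value, 0), count - 1)


def mask_offset(value: int, count: int) -> int:
    """Option 2b: bit mask, valid only when count is a power of two."""
    if count <= 0 or count & (count - 1):
        raise ValueError("masking requires a power-of-two count")
    return value & (count - 1)


def dispatch(table: list[str], value: int) -> str:
    """Model JREL itself: the low 9 bits select a slot in O(1)."""
    return table[value & (TABLE_SIZE - 1)]
```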

The hardware change needed is that ff a and ff j will require partitioning. Unfortunately, this partitioning is a worst case, because the 9-bit granularity of the instruction fields collides with the dual 8-bit flip-flops in the SN74AUC16374 ICs. This will add another 48-pin component to the circuit board, increasing the space needed by about 0.625%. The instruction pointer lockout and the semantics of its control decoder bits also need modification, although I think this can be done without a new control signal.

A benign drawback to JREL is that allowing it as a nonprivileged instruction will force the operating system to use a multiple of 512 words as the code memory page size. Any non-integer multiple of 512 words would enable JREL to branch to instructions not owned by the current program, opening a path to privilege escalation. Because I already determined there isn’t a performance benefit to a code page size that JREL would rule out, I plan to accept this restriction and add JREL to the architecture.
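A small check makes the page-size restriction concrete. Assuming OS pages are aligned at address 0, a 512-word JREL window must fall inside a single OS page; if it straddles two, the second page may belong to another program. The helper names and the number of windows scanned are arbitrary choices for this sketch.

```python
JREL_WINDOW = 512


def window_is_contained(os_page_size: int, window_index: int) -> bool:
    """True if JREL window number `window_index` lies entirely inside one
    OS page of `os_page_size` words (pages assumed aligned at 0)."""
    start = window_index * JREL_WINDOW
    end = start + JREL_WINDOW - 1
    return start // os_page_size == end // os_page_size


def first_straddling_window(os_page_size: int, windows_to_check: int = 4096):
    """Return the first window that spans two OS pages, or None."""
    for w in range(windows_to_check):
        if not window_is_contained(os_page_size, w):
            return w
    return None
```

With 512- or 1024-word pages no window straddles, but 256-word pages fail at window 0 and 768-word pages (1.5 × 512) fail at window 1, matching the claim that any non-integer multiple of 512 is unsafe.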

`CREL` (call relative to page)

CREL, the function call equivalent of JREL, opens the architecture for the first time to nonprivileged function calls through “pointers” (page offsets). The principal difference between JREL and CREL is that CREL will push a return address and flags onto the stack. CREL piggybacks on JREL’s circuit board change without further modifications.

Because the subroutine’s return address should be the instruction immediately following CREL, the called-to locations should contain JUMPs rather than CALLs, so that they do not disturb the stack.
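A toy interpreter shows why the table slots must be JUMPs. The instruction tuples, addresses, and `run` helper are invented for illustration, and pushing flags is omitted. With a JUMP in the slot, the function returns to the instruction after CREL; with a CALL in the slot, it returns into the table instead and leaves CREL’s return address stranded on the stack.

```python
def run(program: dict[int, tuple], start: int, max_steps: int = 100):
    """Execute a toy instruction stream; return (trace of addresses, stack)."""
    stack: list[int] = []
    trace: list[int] = []
    pc = start
    for _ in range(max_steps):
        trace.append(pc)
        op, *args = program[pc]
        if op == "crel":                 # push return address, branch into page
            page, offset = args          # offset stands in for the register value
            stack.append(pc + 1)
            pc = (page << 9) | (offset & 0x1FF)
        elif op == "call":
            stack.append(pc + 1)
            pc = args[0]
        elif op == "jump":
            pc = args[0]
        elif op == "ret":
            pc = stack.pop()
        elif op == "halt":
            return trace, stack
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("step limit reached")


# Page 4 starts at address 2048; slot 2 (address 2050) dispatches to 3000.
good = {100: ("crel", 4, 2), 101: ("halt",), 2050: ("jump", 3000), 3000: ("ret",)}
bad = {100: ("crel", 4, 2), 101: ("halt",), 2050: ("call", 3000),
       2051: ("halt",), 3000: ("ret",)}
```

Running `good` visits 100, 2050, 3000, 101 and ends with an empty stack; running `bad` ends at 2051 inside the table with 101 still on the stack.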
