# Bit-rearranging instructions

Opcode | P/U | Category | Description |

`PAIT` |
user | ALU: rearrange | permute across and inside tribbles |

`PAT` |
user | ALU: rearrange | permute across tribbles |

`PIT` |
user | ALU: rearrange | permute inside tribbles |

`ROL` |
user | ALU: shift-rotate | rotate left |

`ROR` |
user | ALU: shift-rotate | rotate right (proposed instruction) |

`SWIZ` |
user | ALU: rearrange | swizzle |

`TXOR` |
user | ALU: rearrange | transposing XOR |

The bit-rearranging instructions reposition bits without changing them. Except for `SWIZ`

(and `TXOR`

in the unusual case of a nonzero left operand), all 36 input bits appear exactly once in the output. `SWIZ`

, on the other hand, is able to replicate specific input bits multiple times, with the result that some input bits will not survive to the output.

Every bit-rearranging instruction uses exactly the same flag semantics, namely, `N`

and `Z`

are set as if the destination is a signed register. `T`

and `R`

do not change.

`PAIT`

Permute across and inside tribbles

Syntax |

`c = a pait b` |

Register | Signedness |

All | ignored |

1 opcode only |

Flag | Set if and only if |

`N` |
bit 35 of the result is set |

`Z` |
all result bits are zero |

`T` |
flag does not change |

`R` |
flag does not change |

`PAIT`

applies `PIT`

to the result of `PAT`

under the constraint that the right operands are identical for both permutations. This is helpful under the unusual circumstance where the ALU can roll these two operations into this single instruction. For instance:

c = a pait 131313131313‘o

mirrors the bits of a 36-bit word.

`PAT`

Permute across tribbles

Syntax |

`c = a pat b` |

Register | Signedness |

All | ignored |

1 opcode only |

Flag | Set if and only if |

`N` |
bit 35 of the result is set |

`Z` |
all result bits are zero |

`T` |
flag does not change |

`R` |
flag does not change |

The `PAT`

(permute across tribbles) instruction reorders (more specifically, transposes) the bits of `a`

according to the following relationship. If the input bit positions are numbered in base 36 as:

zyxwvu tsrqpo nmlkji hgfedc ba9876 543210

then the transposed bit order is:

ztnhb5 ysmga4 xrlf93 wqke82 vpjd71 uoic60

This self-inverse transposition actually happens twice with *every* Dauug|36 ALU instruction as words enter and leave the beta layer. The transposition is easier to see if the 36-bit word is drawn as a square matrix, where

z y x w v u t s r q p o n m l k j i h g f e d c b a 9 8 7 6 5 4 3 2 1 0

becomes

z t n h b 5 y s m g a 4 x r l f 9 3 w q k e 8 2 v p j d 7 1 u o i c 6 0

This is simply a reflection through the main diagonal. An alternative way of viewing this is to stay with the untransposed matrix

z y x w v u t s r q p o n m l k j i h g f e d c b a 9 8 7 6 5 4 3 2 1 0

and say that whereas `PIT`

rearranges each row, `PAT`

rearranges each column.

A fourth explanation of this transposition appears in the description of `TXOR`

at the bottom of this page.

In `PAT`

, the tribbles of `a`

(^{⊤}`a`

transposed) undergo permutation operations selected by the corresponding tribbles of `b`

. The permuted outcome is then transposed back and written to `c`

. Due to the 6-bit encoding limit of 64 operations, only 64 of the possible 6! = 720 permutations of six bits can be supported by this instruction. The 64 permutations I chose to support appear below, and all 720 permutations can be reached via 2-instruction compositions of these 64.

Oct. | Perm. | Derivation |

00 | 012345 | identity |

01 | 123450 | rotated identity |

02 | 234501 | rotated identity |

03 | 345012 | rotated identity |

04 | 450123 | rotated identity |

05 | 501234 | rotated identity |

06 | 432105 | reflected 05 |

07 | 321054 | reflected 04 |

10 | 210543 | reflected 03 |

11 | 105432 | reflected 02 |

12 | 054321 | reflected 01 |

13 | 543210 | reflected identity |

14 | 102345 | pair swap |

15 | 210345 | pair swap |

16 | 312045 | pair swap |

17 | 412305 | pair swap |

20 | 512340 | pair swap |

21 | 021345 | pair swap |

22 | 032145 | pair swap |

23 | 042315 | pair swap |

24 | 052341 | pair swap |

25 | 013245 | pair swap |

26 | 014325 | pair swap |

27 | 015342 | pair swap |

30 | 012435 | pair swap |

31 | 012543 | pair swap |

32 | 012354 | pair swap |

33 | 452301 | movement in pairs |

34 | 014523 | movement in pairs |

35 | 230145 | movement in pairs |

36 | 024135 | 2x3, 3x2 transposes |

37 | 031425 | 2x3, 3x2 transposes |

40 | 031524 | count out of circle by 3s |

41 | 142035 | count out of circle by 3s |

42 | 253140 | count out of circle by 3s |

43 | 304251 | count out of circle by 3s |

44 | 415302 | count out of circle by 3s |

45 | 520413 | count out of circle by 3s |

46 | 035124 | count out by 3s backward |

47 | 140253 | count out by 3s backward |

50 | 251304 | count out by 3s backward |

51 | 302415 | count out by 3s backward |

52 | 413520 | count out by 3s backward |

53 | 524031 | count out by 3s backward |

54 | 102354 | 2-pair rotation |

55 | 315042 | 2-pair rotation |

56 | 542310 | 2-pair rotation |

57 | 513240 | 2-pair rotation |

60 | 021435 | 3-pair rotation |

61 | 034125 | 3-pair rotation |

62 | 043215 | 3-pair rotation |

63 | 103254 | 3-pair rotation |

64 | 240513 | 3-pair rotation |

65 | 453201 | 3-pair rotation |

66 | 521430 | 3-pair rotation |

67 | 534120 | 3-pair rotation |

70 | 120534 | symmetric half rotation |

71 | 201453 | symmetric half rotation |

72 | 034512 | algorithmically selected |

73 | 035421 | algorithmically selected |

74 | 105243 | algorithmically selected |

75 | 130542 | algorithmically selected |

76 | 254310 | algorithmically selected |

77 | 510432 | algorithmically selected |

The above table notates tribble permutations as a 6-digit shorthand. Each digit shows the original position of each bit, numbered from the left starting with zero, in the permuted outcome. Thus if the permutation `234501`

is applied to the tribble `010011‘b`

, the result is `001101‘b`

.

Explained another way, `PAT`

does permutations between tribbles, where tribble `i`

of `b`

specifies a permutation among the `i`

th bits of each tribble of `a`

. `PIT`

and `PAT`

can be used in combination to produce any permutation of 36 bits in 5 instructions or fewer. An extended discussion of how this works appears on pages 116–125 in the dissertation.

As computing operands for `PAIT`

, `PAT`

, and `PIT`

, the eventual plan is to have the assembler provide this service automatically. This is not an immediate priority, since (i) it doesn’t affect hardware design or dissemination, and (ii) a requirement to arbitrarily permute words does not appear often in software for critical infrastructure. There is already an algorithm written in C that uses `PAT`

and `PIT`

(but does not try to optimize by using `PAIT`

); find it at `permutation_test_vector()`

in `perms.c`

in the Dauug|36 source tree.

`PIT`

Permute inside tribbles

Syntax |

`c = a pit b` |

Register | Signedness |

All | ignored |

1 opcode only |

Flag | Set if and only if |

`N` |
bit 35 of the result is set |

`Z` |
all result bits are zero |

`T` |
flag does not change |

`R` |
flag does not change |

Each tribble of `a`

undergoes a permutation operation selected by the corresponding tribble of `b`

, with the result written to `c`

. Due to the 6-bit encoding limit of 64 operations, only 64 of the possible 6! = 720 permutations of six bits can be supported by this instruction. These 64 permutations are listed in the above discussion of `PAT`

.

All 720 permutations of 6 bits can be reached via 2-instruction compositions of these 64, and all 371,993,326,789,901,217,467,999,448,150,835,200,000,000 (almost 372 American duodecillion) permutations of 36 bits are reachable via combinations of not more than five CPU instructions taken from `PAT`

and `PIT`

.

`ROL`

Rotate left

I believe that if `ROL`

is used with control words that have non-uniform subwords, it can implement certain permutations using fewer instructions than `PIT`

, `PAT`

, and `PAIT`

support. I have not investigated or enumerated these permutations.

`ROR`

Rotate right

As of 21 March 2024, a `ROR`

instruction is only being contemplated. If it is implemented, it will have equivalent permutation capabilities to `ROL`

(above) but use a different control word scheme.

`SWIZ`

Swizzle

Syntax |

`c = a swiz cw` |

Register | Signedness |

All | ignored |

1 opcode only |

Flag | Set if and only if |

`N` |
bit 35 of the result is set |

`Z` |
all result bits are zero |

`T` |
flag does not change |

`R` |
flag does not change |

`SWIZ`

forms tribbles of `c`

(^{⊤}`c`

transposed) using the corresponding tribbles of (untransposed) `cw`

as swizzle function identifiers operating on the corresponding tribbles of `a^t`

. `c`

is then untransposed and written to ^{⊤}`c`

. The 64 swizzle functions that can be selected via `cw`

are as follows:

Slot | Operation |

0 | copy from tribble 0 |

1 | copy from tribble 1 |

2 | copy from tribble 2 |

3 | copy from tribble 3 |

4 | copy from tribble 4 |

5 | copy from tribble 5 |

6–63 | reserved |

See `PAT`

for a synopsis of the transposing operation from `a`

to `a`

. ^{⊤}`N`

and `Z`

are set as if the destination is a signed register. `T`

and `R`

do not change, because no range checking is done. A routine use for SWIZ is to select a tribble from a register and replicate it to all of the other tribbles. For example, if `a`

= `013467_902356`o`

and `cw`

= `020202_020202`o`

, the result will be `909090_909090`o`

.

`TXOR`

Transposing XOR

Syntax |

`c = a txor b` |

Register | Signedness |

All | ignored |

1 opcode only |

Flag | Set if and only if |

`N` |
bit 35 of the result is set |

`Z` |
all result bits are zero |

`T` |
flag does not change |

`R` |
flag does not change |

The bits of `b`

are transposed to compute `b`

by relocating the ^{⊤}*i*th bit of tribble *j* to the *j*th bit of tribble *i* for all *i*, *j* ∈ {0...5}. This is the same transposition as described in `PAT`

. The bitwise exclusive-OR of `a`

with `b`

is then written to destination ^{⊤}`c`

. `N`

and `Z`

are set as if the destination is a signed register. `T`

and `R`

do not change.

`TXOR`

can be used to obtain the transpose of a register. To do this, use 0 for `a`

.