Bit-rearranging instructions
Opcode | P/U | Category | Description |
PAIT |
user | ALU: rearrange | permute across and inside tribbles |
PAT |
user | ALU: rearrange | permute across tribbles |
PIT |
user | ALU: rearrange | permute inside tribbles |
ROL |
user | ALU: shift-rotate | rotate left |
ROR |
user | ALU: shift-rotate | rotate right (proposed instruction) |
SWIZ |
user | ALU: rearrange | swizzle |
TXOR |
user | ALU: rearrange | transposing XOR |
The bit-rearranging instructions reposition bits without changing them. Except for SWIZ
(and TXOR
in the unusual case of a nonzero left operand), all 36 input bits appear exactly once in the output. SWIZ
, on the other hand, is able to replicate specific input bits multiple times, with the result that some input bits will not survive to the output.
Every bit-rearranging instruction uses exactly the same flag semantics, namely, N
and Z
are set as if the destination is a signed register. T
and R
do not change.
PAIT
Permute across and inside tribbles
Syntax |
c = a pait b |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N |
bit 35 of the result is set |
Z |
all result bits are zero |
T |
flag does not change |
R |
flag does not change |
PAIT
applies PIT
to the result of PAT
under the constraint that the right operands are identical for both permutations. This is helpful under the unusual circumstance where the ALU can roll these two operations into this single instruction. For instance:
c = a pait 131313131313‘o
mirrors the bits of a 36-bit word.
PAT
Permute across tribbles
Syntax |
c = a pat b |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N |
bit 35 of the result is set |
Z |
all result bits are zero |
T |
flag does not change |
R |
flag does not change |
The PAT
(permute across tribbles) instruction reorders (more specifically, transposes) the bits of a
according to the following relationship. If the input bit positions are numbered in base 36 as:
zyxwvu tsrqpo nmlkji hgfedc ba9876 543210
then the transposed bit order is:
ztnhb5 ysmga4 xrlf93 wqke82 vpjd71 uoic60
This self-inverse transposition actually happens twice with every Dauug|36 ALU instruction as words enter and leave the beta layer. The transposition is easier to see if the 36-bit word is drawn as a square matrix, where
z y x w v u t s r q p o n m l k j i h g f e d c b a 9 8 7 6 5 4 3 2 1 0
becomes
z t n h b 5 y s m g a 4 x r l f 9 3 w q k e 8 2 v p j d 7 1 u o i c 6 0
This is simply a reflection through the main diagonal. An alternative way of viewing this is to stay with the untransposed matrix
z y x w v u t s r q p o n m l k j i h g f e d c b a 9 8 7 6 5 4 3 2 1 0
and say that whereas PIT
rearranges each row, PAT
rearranges each column.
A fourth explanation of this transposition appears in the description of TXOR
at the bottom of this page.
In PAT
, the tribbles of a⊤
(a
transposed) undergo permutation operations selected by the corresponding tribbles of b
. The permuted outcome is then transposed back and written to c
. Due to the 6-bit encoding limit of 64 operations, only 64 of the possible 6! = 720 permutations of six bits can be supported by this instruction. The 64 permutations I chose to support appear below, and all 720 permutations can be reached via 2-instruction compositions of these 64.
Oct. | Perm. | Derivation |
00 | 012345 | identity |
01 | 123450 | rotated identity |
02 | 234501 | rotated identity |
03 | 345012 | rotated identity |
04 | 450123 | rotated identity |
05 | 501234 | rotated identity |
06 | 432105 | reflected 05 |
07 | 321054 | reflected 04 |
10 | 210543 | reflected 03 |
11 | 105432 | reflected 02 |
12 | 054321 | reflected 01 |
13 | 543210 | reflected identity |
14 | 102345 | pair swap |
15 | 210345 | pair swap |
16 | 312045 | pair swap |
17 | 412305 | pair swap |
20 | 512340 | pair swap |
21 | 021345 | pair swap |
22 | 032145 | pair swap |
23 | 042315 | pair swap |
24 | 052341 | pair swap |
25 | 013245 | pair swap |
26 | 014325 | pair swap |
27 | 015342 | pair swap |
30 | 012435 | pair swap |
31 | 012543 | pair swap |
32 | 012354 | pair swap |
33 | 452301 | movement in pairs |
34 | 014523 | movement in pairs |
35 | 230145 | movement in pairs |
36 | 024135 | 2x3, 3x2 transposes |
37 | 031425 | 2x3, 3x2 transposes |
40 | 031524 | count out of circle by 3s |
41 | 142035 | count out of circle by 3s |
42 | 253140 | count out of circle by 3s |
43 | 304251 | count out of circle by 3s |
44 | 415302 | count out of circle by 3s |
45 | 520413 | count out of circle by 3s |
46 | 035124 | count out by 3s backward |
47 | 140253 | count out by 3s backward |
50 | 251304 | count out by 3s backward |
51 | 302415 | count out by 3s backward |
52 | 413520 | count out by 3s backward |
53 | 524031 | count out by 3s backward |
54 | 102354 | 2-pair rotation |
55 | 315042 | 2-pair rotation |
56 | 542310 | 2-pair rotation |
57 | 513240 | 2-pair rotation |
60 | 021435 | 3-pair rotation |
61 | 034125 | 3-pair rotation |
62 | 043215 | 3-pair rotation |
63 | 103254 | 3-pair rotation |
64 | 240513 | 3-pair rotation |
65 | 453201 | 3-pair rotation |
66 | 521430 | 3-pair rotation |
67 | 534120 | 3-pair rotation |
70 | 120534 | symmetric half rotation |
71 | 201453 | symmetric half rotation |
72 | 034512 | algorithmically selected |
73 | 035421 | algorithmically selected |
74 | 105243 | algorithmically selected |
75 | 130542 | algorithmically selected |
76 | 254310 | algorithmically selected |
77 | 510432 | algorithmically selected |
The above table notates tribble permutations as a 6-digit shorthand. Each digit shows the original position of each bit, numbered from the left starting with zero, in the permuted outcome. Thus if the permutation 234501
is applied to the tribble 010011‘b
, the result is 001101‘b
.
Explained another way, PAT
does permutations between tribbles, where tribble i
of b
specifies a permutation among the i
th bits of each tribble of a
. PIT
and PAT
can be used in combination to produce any permutation of 36 bits in 5 instructions or fewer. An extended discussion of how this works appears on pages 116–125 in the dissertation.
As computing operands for PAIT
, PAT
, and PIT
, the eventual plan is to have the assembler provide this service automatically. This is not an immediate priority, since (i) it doesn’t affect hardware design or dissemination, and (ii) a requirement to arbitrarily permute words does not appear often in software for critical infrastructure. There is already an algorithm written in C that uses PAT
and PIT
(but does not try to optimize by using PAIT
); find it at permutation_test_vector()
in perms.c
in the Dauug|36 source tree.
PIT
Permute inside tribbles
Syntax |
c = a pit b |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N |
bit 35 of the result is set |
Z |
all result bits are zero |
T |
flag does not change |
R |
flag does not change |
Each tribble of a
undergoes a permutation operation selected by the corresponding tribble of b
, with the result written to c
. Due to the 6-bit encoding limit of 64 operations, only 64 of the possible 6! = 720 permutations of six bits can be supported by this instruction. These 64 permutations are listed in the above discussion of PAT
.
All 720 permutations of 6 bits can be reached via 2-instruction compositions of these 64, and all 371,993,326,789,901,217,467,999,448,150,835,200,000,000 (almost 372 American duodecillion) permutations of 36 bits are reachable via combinations of not more than five CPU instructions taken from PAT
and PIT
.
ROL
Rotate left
I believe that if ROL
is used with control words that have non-uniform subwords, it can implement certain permutations using fewer instructions than PIT
, PAT
, and PAIT
support. I have not investigated or enumerated these permutations.
ROR
Rotate right
As of 21 March 2024, a ROR
instruction is only being contemplated. If it is implemented, it will have equivalent permutation capabilities to ROL
(above) but use a different control word scheme.
SWIZ
Swizzle
Syntax |
c = a swiz cw |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N |
bit 35 of the result is set |
Z |
all result bits are zero |
T |
flag does not change |
R |
flag does not change |
SWIZ
forms tribbles of c⊤
(c
transposed) using the corresponding tribbles of (untransposed) cw
as swizzle function identifiers operating on the corresponding tribbles of a^t
. c⊤
is then untransposed and written to c
. The 64 swizzle functions that can be selected via cw
are as follows:
Slot | Operation |
0 | copy from tribble 0 |
1 | copy from tribble 1 |
2 | copy from tribble 2 |
3 | copy from tribble 3 |
4 | copy from tribble 4 |
5 | copy from tribble 5 |
6–63 | reserved |
See PAT
for a synopsis of the transposing operation from a
to a⊤
. N
and Z
are set as if the destination is a signed register. T
and R
do not change, because no range checking is done. A routine use for SWIZ is to select a tribble from a register and replicate it to all of the other tribbles. For example, if a
= 013467_902356`o
and cw
= 020202_020202`o
, the result will be 909090_909090`o
.
TXOR
Transposing XOR
Syntax |
c = a txor b |
Register | Signedness |
All | ignored |
1 opcode only |
Flag | Set if and only if |
N |
bit 35 of the result is set |
Z |
all result bits are zero |
T |
flag does not change |
R |
flag does not change |
The bits of b
are transposed to compute b⊤
by relocating the ith bit of tribble j to the jth bit of tribble i for all i, j ∈ {0...5}. This is the same transposition as described in PAT
. The bitwise exclusive-OR of a
with b⊤
is then written to destination c
. N
and Z
are set as if the destination is a signed register. T
and R
do not change.
TXOR
can be used to obtain the transpose of a register. To do this, use 0 for a
.