Tables
The content management system needs convenient support for tables, due to the vast detail this documentation needs to present. There are also 59 more tables in the dissertation.
Basic markup for tables
The basic markup for a table looks like this:
t  ---------------------------  ---------------  -----------  ---------------
                                Category I       Category II  Category III
   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no
The first line, called the column start line, defines the number of columns and the positions at which they capture text. The table must be indented at least three spaces, and the first three characters of the column start line must be a lowercase `t` followed by two spaces.
The output will look like:
 | Category I | Category II | Category III |
Origin | purposeful | unexpected | malicious |
Example | arithmetic wrap | RowHammer | hidden backdoor |
Software workaround? | yes | no | no |
VLSI architect can fix? | yes | yes | no |
Supply chain owner can fix? | yes | yes | no |
Further tidying is not discussed in this section; it’s covered under advanced markup. (For now, you can insert ordinary boldface and italics markup. Just make sure that markup stays within the assigned columns.)
Benign variants of basic table markup
The markup may contain blank lines to improve legibility of the markup version. These blank lines will not affect the HTML output. So an equivalent variant on the above markup is:
t  ---------------------------  ---------------  -----------  ---------------

                                Category I       Category II  Category III

   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor

   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no
The markup may also contain exact repetitions of the column start line, except that the `t` tag must be replaced by a space. This again is for legibility and style of the markup. The HTML output is not affected. So another equivalent variant is:
t  ---------------------------  ---------------  -----------  ---------------
                                Category I       Category II  Category III
   ---------------------------  ---------------  -----------  ---------------
   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no
   ---------------------------  ---------------  -----------  ---------------
The column start line identifies the first character position of each column; the last position of a column is immediately prior to the first position of the next column. The rightmost column is not clamped to a last position, but is as wide as its widest text. Thus the above table could have been stated as follows with the same interpretation:
t  ------                       ----------       ------       -----------
                                Category I       Category II  Category III
   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no
What the CMS tabulates in the column start line are those columns of the line that (i) are not blank, and (ii) are not the same symbol as their left neighbor (if one exists). So here are two more representations that have precisely the same result:
t  AAAAAAAAAAAAAAAAAAAAAAAAAAAAA{{{{{{{{{{{{{{{{{5555555555555eeeeeeeeeeeeeee
                                Category I       Category II  Category III
   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no

t  .                            .                .            .
                                Category I       Category II  Category III
   Origin                       purposeful       unexpected   malicious
   Example                      arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?         yes              no           no
   VLSI architect can fix?      yes              yes          no
   Supply chain owner can fix?  yes              yes          no
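The column-start rule can be sketched in a few lines of Python. This is only an illustration of the rule as described on this page, not the CMS’s actual code; the function names `column_starts` and `split_by_position` are hypothetical, and the sketch assumes that the tag character in position 0 is never itself a column and that cell text is trimmed of surrounding whitespace.

```python
def column_starts(start_line: str) -> list[int]:
    """Return the 0-based positions at which columns begin.

    A position starts a column when it is (i) not blank and
    (ii) not the same symbol as its left neighbor.
    Assumption: position 0 holds the 't' tag and is treated as a space.
    """
    line = " " + start_line[1:]
    return [
        i for i, ch in enumerate(line)
        if ch != " " and (i == 0 or line[i - 1] != ch)
    ]


def split_by_position(row: str, starts: list[int]) -> list[str]:
    """Cut one data line into cells at the given start positions.

    Each column ends just before the next one starts; the rightmost
    column is not clamped and runs to the end of the line.
    """
    cells = []
    for n, begin in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(row)
        cells.append(row[begin:end].strip())
    return cells


# Three column start lines that describe the same four columns.
dashes = "t  " + "-" * 27 + "  " + "-" * 15 + "  " + "-" * 11 + "  " + "-" * 15
mixed = "t  " + "A" * 29 + "{" * 17 + "5" * 13 + "e" * 15
dots = "t  ." + " " * 28 + "." + " " * 16 + "." + " " * 12 + "."

starts = column_starts(dashes)
header = " " * 32 + "Category I" + " " * 7 + "Category II  Category III"
example = "   Example" + " " * 22 + "arithmetic wrap  RowHammer    hidden backdoor"
print(starts)
print(split_by_position(header, starts))
print(split_by_position(example, starts))
```

All three start lines yield the same start positions, which is why the variants above are interchangeable.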
This “basic markup” establishes that a table with six rows (counting the row of category names) and four columns is present. It also establishes, in the manner you would probably expect, what text belongs to which cells. But this markup does not address matters of column alignment, possible headings, etc. Different markup does that.
Compressing table markup with delimiters
Sometimes we find ourselves in a corner. Maybe a column is too small for just one or two cells:
t  ------                ----------       ------       -----------
                         Category I       Category II  Category III
   ------                ----------       ------       -----------
   Origin                purposeful       unexpected   malicious
   Example               arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?  yes              no           no
   VLSI architect can fix? · yes · yes · no
   Supply chain owner can fix? · yes · yes · no
   ------                ----------       ------       -----------
Those `·` characters are Unicode “middle dots” (0xB7), and are used by default to override position-derived columns on a line-at-a-time basis. This is a delimited format, somewhat akin to CSV data for spreadsheets. In delimited table format nothing gets quoted, so a delimiter must be chosen that isn’t in any of the text. Leading and trailing whitespace is removed inside delimited fields. The delimiter can be changed via the ordinary digraph process. So the following example, which uses a digraph substitution and omits ignored spaces, is equivalent to all of the above examples:
t  ------                ----------       ------       -----------
                         Category I       Category II  Category III
   ------                ----------       ------       -----------
   Origin                purposeful       unexpected   malicious
   Example               arithmetic wrap  RowHammer    hidden backdoor
   Software workaround?  yes              no           no
   VLSI architect can fix?XyesXyesXno
   Supply chain owner can fix?XyesXyesXno
   ------                ----------       ------       -----------
   ·X
The above ugly mess is not an error. It displays as intended.
The middle dot `·` is a less fragile choice than `X`, because text containing `X` may be added to the table in the future.
The delimited table format can also be used when position-derived columns are too cumbersome, as may be the case for large and/or script-generated tables. Even if all of the table data are delimited, the column start line is still required in order to define the number of columns, but it may be minimal as shown here:
t  - - - -
   Category I·Category II·Category III
   Origin·purposeful·unexpected·malicious
   Example·arithmetic wrap·RowHammer·hidden backdoor
   Software workaround?·yes·no·no
   VLSI architect can fix?·yes·yes·no
   Supply chain owner can fix?·yes·yes·no
Table markup behaves gracefully if the number of columns present is not consistent with the column start line. If a line does not indicate enough columns, blank columns will be extrapolated at the end of the line to fill the table. If a line indicates too many columns, the extra columns will be catenated, with the delimiter still present, with the rightmost column.
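A minimal sketch of that padding and catenation behavior follows. It is written from the description above rather than from the CMS source, so the function name `split_delimited` and the choice to trim fields after joining the overflow are assumptions.

```python
MIDDLE_DOT = "\u00b7"  # the default delimiter described on this page


def split_delimited(row: str, ncols: int, delim: str = MIDDLE_DOT) -> list[str]:
    """Split one delimited line into exactly ncols cells.

    Too few fields: blank cells are appended at the end.
    Too many fields: the extras are joined, delimiter included,
    into the rightmost cell.
    """
    fields = row.split(delim)
    if len(fields) > ncols:
        fields = fields[:ncols - 1] + [delim.join(fields[ncols - 1:])]
    fields = [f.strip() for f in fields]  # whitespace inside fields is dropped
    fields += [""] * (ncols - len(fields))
    return fields


print(split_delimited("VLSI architect can fix? · yes · yes · no", 4))
print(split_delimited("a·b", 4))
print(split_delimited("a·b·c·d·e", 4))
```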
Advanced markup for tables
Challenges
Support for making tables “look nice” is experimental and unsatisfying. The problem is with exporting information to HTML and CSS, because neither handles column formatting smoothly. HTML’s `<colgroup>` and `<col>` tags fail to transfer most attributes into actual table content, likely because (I am not an expert) they are not ancestors of table content, and correcting that would break a lot of semantics that should be invariant. CSS has an `:nth-child` pseudo-class that can pick up `<td>` within `<tr>` and thereby affect a column, but which column is affected has to be hard-coded within the CSS. The variations necessary to properly format a column are unpleasant.
Another concern with trying to work around the limitations of HTML and CSS is that the content management system’s markup design gets polluted by these limitations. So if someone comes along later who seeks to export tables in LaTeX or SVG, the implementation this person has to start from is not clean.
Approach
This CMS generates tables using only `<table>`, `<tr>`, and `<td>` elements. This keeps the implementation simultaneously simple and flexible, subject to the limitations of HTML and CSS. At present, the markup’s only means of controlling table appearance is by labeling these three types of elements with class names. From there, style information in `etc/dauug.css` takes over and completes the formatting. So in either order, the documentation writer has two tasks:
- Ensure that `etc/dauug.css` includes appropriate classes for table appearance.
- Attach their names to `<table>`, `<tr>`, and `<td>` tags.
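To make the approach concrete, here is a rough sketch of an exporter that emits nothing but `<table>`, `<tr>`, and `<td>` tags, each with an optional `class` attribute. The function names and the dictionaries used to carry class names are hypothetical; the real CMS may organize this quite differently.

```python
from html import escape


def tag(name: str, classes) -> str:
    """Build an opening tag, adding a class attribute only when needed."""
    names = " ".join(classes)
    return f'<{name} class="{escape(names)}">' if names else f"<{name}>"


def render_table(rows, table_classes=(), row_classes=None, cell_classes=None) -> str:
    """Render rows (lists of cell text) using only table, tr, and td.

    row_classes maps a row index to class names;
    cell_classes maps a (row, column) pair to class names.
    """
    row_classes = row_classes or {}
    cell_classes = cell_classes or {}
    out = [tag("table", table_classes)]
    for r, row in enumerate(rows):
        out.append(tag("tr", row_classes.get(r, ())))
        for c, text in enumerate(row):
            out.append(tag("td", cell_classes.get((r, c), ())) + escape(text) + "</td>")
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)


html_text = render_table(
    [["+", "0"], ["0", "0"]],
    table_classes=["want-right-just"],
    row_classes={0: ["want-bold"]},
    cell_classes={(0, 0): ["want-15px-wide"]},
)
print(html_text)
```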
For a refresher on cascading style sheets, or CSS, try here.
Here is where various styles can be controlled:
`<table>` can accept `text-align`, `color`, `background-color`, etc., and these styles will propagate through `<tr>` and `<td>` unless overridden. What `<table>` cannot do is control column widths.
`<tr>` can accept `text-align`, `color`, `background-color`, etc., and these styles will propagate through `<td>` unless overridden. `<tr>` can’t manage column widths.
The first row of `<td>` can control column widths via the `width` property. For this to work, the `<table>` requires its `table-layout` property to be `fixed`. This approach is also rumored to speed loading of large tables, because less about the layout has to be calculated from the cells.
`<td>` in general can accept `text-align`, `color`, `background-color`, etc.
Markup language for attaching classes within tables
This CMS has a tiny language for selectively adding class names to table tags. This language has to appear prior to any table(s) it applies to, and remains in effect until changed. I’ll introduce this language by a couple of examples.
Suppose we want to produce this addition table, which has all columns the same width as well as boldface row and column headings:
+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
7 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
8 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
9 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
Some style support within `dauug.css` is necessary. The following is sufficient:
table {
    table-layout: fixed;
    border-spacing: 10px 0;
    margin: 20px auto;
}
*.want-15px-wide { width: 15px; }
*.want-right-just { text-align: right; }
*.want-bold { font-weight: bold; }
Below is the article markup that produces the addition table. The uppercase `T` block, which attaches classes to the table, must appear before the lowercase `t` block, which contains the table data:
T  a1 /table want-right-just /mark
   a1 /row want-bold /mark
   a1 /col want-bold /mark
   a1 z1 /bb want-15px-wide /mark

t  . .. .. .. .. .. .. .. .. .. ..
   +  0  1  2  3  4  5  6  7  8  9
   0  0  1  2  3  4  5  6  7  8  9
   1  1  2  3  4  5  6  7  8  9 10
   2  2  3  4  5  6  7  8  9 10 11
   3  3  4  5  6  7  8  9 10 11 12
   4  4  5  6  7  8  9 10 11 12 13
   5  5  6  7  8  9 10 11 12 13 14
   6  6  7  8  9 10 11 12 13 14 15
   7  7  8  9 10 11 12 13 14 15 16
   8  8  9 10 11 12 13 14 15 16 17
   9  9 10 11 12 13 14 15 16 17 18
The uppercase `T` block contains four “sentences.” For ease of editing they are on four lines, but the line breaks are arbitrary and aren’t seen by the code that processes the script. Every useful sentence in this language has three mandatory steps that happen in this order:
- Cells are identified, either by their position or contents.
- CSS classes are named.
- A `/mark` command attaches classes to `<table>`, `<tr>`, or `<td>` tags.
This example only identifies cells by position. This is done using the “A1” naming convention used by many spreadsheets: columns are identified by letter, starting with A. If there are more than 26 columns, Z is followed by AA, AB, etc. Rows are identified by number, starting with 1. Leading zeros have no effect and won’t break anything, so `A001` is the same as `A1`. Column positions are not case sensitive, so `a1` and `A1` mean the same thing. In this addition table, cell `A1` contains a `+` sign.
The four sentences in the above `T` block do the following:
a1 /table want-right-just /mark
`a1` selects the top left cell.
`/table` modifies this selection: it selects the entire table (that is, the `<table>` tag) if any cells are selected. There is a good reason for tables to be selected in this manner, which we’ll come to later.
`want-right-just` is a CSS class: we want the HTML tag to read `<table class="want-right-just">`.
`/mark` attaches any classes mentioned (in this case, just `want-right-just`) to anything that was selected, which for this sentence is the `<table>` tag. You can verify right now that this occurred by looking at the HTML source for this page in your browser.
a1 /row want-bold /mark
`a1` selects the top left cell.
`/row` modifies the selection: the selected cells are replaced by the rows that contain them. In this example, only one cell, `a1`, is selected, so after `/row` only row `1` (the first `<tr>` tag) is selected.
`want-bold` is a CSS class to attach to the `<tr>`. We could have added more classes here by separating them with spaces, such as:
a1 /row want-bold want-some-color /mark
`/mark` adds the class to the tag as desired.
a1 /col want-bold /mark
`a1` selects the top left cell.
`/col` modifies the selection, but not in the way we wish. We’d like the selected cell (or cells) to be replaced by the columns that contain them, but HTML’s `<colgroup>` and `<col>` tags don’t work like that and are largely worthless. So what `/col` actually does is replace the selection with all of the cells that are in the columns the original selection is in. So rather than selecting one `<colgroup>` tag, we brute-force it to work by selecting eleven `<td>` tags. Unfortunately, this creates other subtle asymmetries between how `/row` and `/col` work. I won’t discuss them here; just remember if you’re debugging that `/row` and `/col` don’t have the same semantics in all corner cases.
`want-bold` is our familiar CSS class that goes on all of those `<td>` tags.
`/mark` adds the class to the tags as desired.
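The asymmetry between `/row` and `/col` is easier to see in a sketch. Below, a selection is modeled as a set of zero-based `(row, column)` pairs; that representation and both function names are assumptions made for illustration, not a description of the CMS internals.

```python
Cell = tuple[int, int]  # (row, column), both zero-based


def rows_of(selected: set[Cell]) -> set[int]:
    """/row: the selection becomes the rows (<tr> tags) containing it."""
    return {r for r, _ in selected}


def column_cells(selected: set[Cell], n_rows: int) -> set[Cell]:
    """/col: the selection becomes every <td> in the selected columns."""
    cols = {c for _, c in selected}
    return {(r, c) for r in range(n_rows) for c in cols}


a1 = {(0, 0)}
print(rows_of(a1))                  # one <tr>
print(len(column_cells(a1, 11)))    # eleven <td> tags in an 11-row table
```

Note that `/row` yields one tag per row, while `/col` yields one tag per cell, which is where the differing corner cases come from.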
a1 z1 /bb want-15px-wide /mark
We’ve saved the trickiest sentence for last.
`a1 z1` selects two cells in the table. But which two? `z1` isn’t actually in the table; the rightmost column is `k`. Unlike a spreadsheet, which doesn’t have a preset size, this sentence will be run on a table with specific dimensions, and row and column specifications are clamped to fall within the table. Rather than figure out that the top right cell is `k1`, I chose to exaggerate by naming `z1`. The script reined this in, and the two cells that are actually selected are `a1` and `k1`.
In addition to clamping rows and columns, the language presumes a default row of `1` and a default column of `a`. You can’t use both defaults in the same cell to mean `a1`, because that would be an empty string, which is ambiguous. But in this example, `a z` is valid notation for `a1 z1`, and this example could have been written that way.
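Here is a small sketch of how a cell reference might be resolved, covering the letter-to-column conversion, the defaults, and the clamping just described. The function name `resolve`, the zero-based return value, and the use of a regular expression are all assumptions for illustration.

```python
import re


def resolve(ref: str, n_rows: int, n_cols: int) -> tuple[int, int]:
    """Turn an A1-style reference into a zero-based (row, column) pair.

    Either part may be omitted (default row 1, default column a),
    but not both. Out-of-range values are clamped to the table.
    """
    m = re.fullmatch(r"([A-Za-z]*)(\d*)", ref)
    if not m or ref == "":
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = m.groups()

    col = 0
    for ch in letters.upper():          # A=1 ... Z=26, AA=27, AB=28, ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    col = col - 1 if letters else 0     # default column a
    row = int(digits) - 1 if digits else 0  # default row 1; leading zeros are harmless

    row = min(max(row, 0), n_rows - 1)
    col = min(max(col, 0), n_cols - 1)
    return row, col


# The addition table has 11 rows and 11 columns (A through K).
print(resolve("z1", 11, 11))    # clamped to the top right cell, k1
print(resolve("A001", 11, 11))  # same cell as a1
print(resolve("z99", 11, 11))   # clamped to the bottom right cell
```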
`/bb` stands for bounding box, and replaces the selected cells with the smallest axis-aligned filled rectangle of cells that contains the original selected cells. In this case, the rectangle is one row high and eleven columns wide. `a1 z1 /bb` refers to all of the cells in the first row. That is to say, not one `<tr>` tag, but eleven `<td>` tags are being marked up. This is important, because we’re about to exploit one of CSS’s obscure corner cases that only works for `<td>`, and only in the first row of a table.
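The bounding box operation itself is simple. As a sketch, again modeling a selection as a set of zero-based `(row, column)` pairs (an assumption, as is the function name):

```python
def bounding_box(selected: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Smallest filled axis-aligned rectangle containing the selection."""
    if not selected:
        return set()
    rows = [r for r, _ in selected]
    cols = [c for _, c in selected]
    return {
        (r, c)
        for r in range(min(rows), max(rows) + 1)
        for c in range(min(cols), max(cols) + 1)
    }


# a1 and k1 expand to the eleven cells of the first row.
first_row = bounding_box({(0, 0), (0, 10)})
print(len(first_row))
```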
`want-15px-wide` is the class we’re attaching, and you can see in the CSS above that it specifies `width: 15px`. When the first-row `<td>` tags with this class are considered in combination with `table-layout: fixed` for the `<table>` tag, the browser will cause every column in the table to have the width of its first `<td>`, which in this table is uniformly 15 pixels.
`/mark` adds the class to the tags as desired.
Selecting cells without /col or /bb
Had you wished to mark only `a1` and `k1` instead of the entire row, you would need to change:
a1 z1 /bb want-15px-wide /mark
to read:
a1 z1 /cell want-15px-wide /mark
This may seem counterintuitive, because after `a1 z1` the cells are already selected. So why have the command `/cell` to select them? It’s there to make a clear separation between cell selection and style names. Without `/cell`, the script would have searched for cells that contain the text `want-15px-wide`, which isn’t what we want. It also wouldn’t have found any styles to apply.
Selecting cells according to contents
Anything that is not in `A1` spreadsheet form is interpreted as a search string. The string can be any length greater than zero, and is case-sensitive. The principal uses of this feature are these:
- Style content within a table that is edited often.
- Eliminate need for counting rows and columns.
- Write style scripts that can work for a variety of tables.
The search process is a strict substring match; there are no extras like regex, case insensitivity, or ability to match whole words. Consider this new markup for our addition table. First some additional CSS:
*.want-orange { color: #b95f28; }
And a new style script in the article, immediately before the table:
T  a1 /table want-right-just /mark
   a1 /row want-bold /mark
   a1 /col want-bold /mark
   a1 z1 /bb want-15px-wide /mark
   ·4 /cell want-orange want-bold /mark
The resulting table is:
+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
7 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
8 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
9 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
Recall that `4` is shorthand for cell `A4`, which actually contains a `2`. We didn’t want that cell. We wanted cells that contain the text `4`. This is the reason the script has a middle dot (Unicode 0xB7) in the cell selection part of the script: `·4` is not in A1 format, so it was taken to represent search text. The rule is that right before search text is compared to cells, all middle dots are replaced with spaces, and then leading and trailing spaces are removed. This allows you to search for cell contents that contain, for example, `two·words`. In our Orange 4 example, the middle dot is still there (`·4`) when we decide it’s search text instead of a cell position, but it’s no longer there (`4`) when cells are searched.
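The search rule can be sketched as follows. The function names are hypothetical, and returning nothing for an empty search string is an assumption on my part (the page only says search strings have length greater than zero).

```python
MIDDLE_DOT = "\u00b7"


def search_text(token: str) -> str:
    """Middle dots become spaces, then surrounding spaces are removed."""
    return token.replace(MIDDLE_DOT, " ").strip()


def find_cells(table: list[list[str]], token: str) -> set[tuple[int, int]]:
    """Strict, case-sensitive substring match over every cell."""
    needle = search_text(token)
    if not needle:
        return set()  # assumption: an empty search matches nothing
    return {
        (r, c)
        for r, row in enumerate(table)
        for c, text in enumerate(row)
        if needle in text
    }


# Rebuild the addition table: a header row, then one row per digit.
addition = [["+"] + [str(j) for j in range(10)]]
for i in range(10):
    addition.append([str(i)] + [str(i + j) for j in range(10)])

hits = find_cells(addition, "·4")
print(sorted(hits))
```

Because the match is a plain substring test, the hits include every `14` as well as every `4`.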
Because `14` contains `4`, we see that those cells are highlighted in orange. What if we wanted just the `4`s? There is no way to select for that, but what we can do is select for `14` and remove the style. Any style name that is prefixed with `-` (hyphen) will be taken away instead of added. Here’s how:
T  a1 /table want-right-just /mark
   a1 /row want-bold /mark
   a1 /col want-bold /mark
   a1 z1 /bb want-15px-wide /mark
   ·4 /cell want-orange want-bold /mark
   ·14 /cell -want-orange -want-bold /mark
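As a sketch of what adding and removing classes amounts to for a single tag (the function name and the list representation are assumptions, and the real `/mark` may track classes differently):

```python
def apply_classes(current: list[str], names: list[str]) -> list[str]:
    """Apply one sentence's class names to a tag's existing classes.

    A name prefixed with '-' is removed; any other name is added
    unless it is already present.
    """
    result = list(current)
    for name in names:
        if name.startswith("-"):
            target = name[1:]
            result = [c for c in result if c != target]
        elif name not in result:
            result.append(name)
    return result


# A cell containing 14 is first marked by the ·4 sentence, then unmarked.
cell = apply_classes([], ["want-orange", "want-bold"])
cell = apply_classes(cell, ["-want-orange", "-want-bold"])
print(cell)
```

Sentences run in order, so the later removal wins for the `14` cells while the plain `4` cells keep their styles.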
The table is now displayed as:
+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
7 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
8 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
9 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
Searching table cells for middle dots
If you have middle dots in cells that you’d like to search for, you can use the digraph method to move the middle dot’s special handling to some other character. Be sure not to choose another special character, or it won’t work. I suggest using `@` or `$`. Just add the digraph substitution line at the end like this:
T  a1 /table want-right-just /mark
   a1 /row want-bold /mark
   a1 /col want-bold /mark
   a1 z1 /bb want-15px-wide /mark
   ·4 /cell want-orange want-bold /mark
   ·$
Of course, this won’t match anything in the table we’ve been looking at. There won’t be any orange, because there are no middle dots in the table.
Another search text example
We are about to:
- Make a table all bold orange, except for one rectangle of cells.
- Use a middle dot to mark the rectangle we’re excluding.
- Hide the middle dot in the final table.
Here is the markup:
T  a1 /table want-right-just /mark
   a1 /row want-bold /mark
   a1 /col want-bold /mark
   a1 z1 /bb want-15px-wide /mark
   a z99 /cell /bb want-orange want-bold /mark
   · /cell /bb -want-orange -want-bold /del /mark
   ·$

t  11---333---555---777---999---bbb
   +  0  1  2  3  4  5  6  7  8  9
   0  0  1  2  3  4  5  6  7  8  9
   1  1  2  3  4  5  6  7  8  9 10
   2  2  3  4  5  6  7  8  9 10 11
   3  3  4  5  6  7  8  9 10 11 12
   4  4  5  6 ·7  8  9 10 11 12 13
   5  5  6  7  8  9 10 11 12 13 14
   6  6  7  8  9 10 11 12 13 14 15
   7  7  8  9 10 11 12 13·14 15 16
   8  8  9 10 11 12 13 14 15 16 17
   9  9 10 11 12 13 14 15 16 17 18
   ·$
Here is the result:
+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
7 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
8 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
9 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
This is tricky in several places. First, note that the middle dot `·` has different special meanings in `T` and `t`, and it has to be digraphed out of both. In `T`, it’s ordinarily a space within a search string. In `t`, it’s ordinarily used as a delimiter to override column separation. So we see `·$` on both blocks, where `$` is safe because it doesn’t appear. (The dots in the column start line of the earlier `t` block are mere periods, although they look a little like middle dots.)
Second, every column in block `t` is three characters wide. We know this from the column start line, which I’ve rewritten to (maybe) show the demarcation better. So the middle dot that’s left of the `7` is in the column with the `7`, but the middle dot right of the `13` is in the column with the `13`.
Third, all cells (`a z99 /cell /bb`) are styled bold orange. It won’t do to paint the whole table (`a /table`) orange, because there wouldn’t be any styles to strip from the `<td>` elements. That would involve different CSS (with a color that isn’t orange and a weight that isn’t bold), which would be a different example.
Fourth, the cells in the marked box (`· /cell /bb`) have their bold orange style revoked.
Fifth, we no longer want the middle dots in the two cells where they appear. The `/del` before the final `/mark` indicates that any matching text (in this example, just the two middle dots) is to be deleted throughout the table.
Table conclusion
This design for table support has some really rough edges, but hopefully its features are enough that Dauug|36 documentation doesn’t have to fall back on monospace, monocolor, monoweight tables. Each of these features is documented on this page to an extent.
The project of writing the CMS and writing the first twelve articles took an estimated ten working days, from 29 May 2023 to 9 June 2023. This included a good start on syntax highlighting, which is much easier than table support in the sense that syntax highlighting doesn’t build on unwieldy specifications designed by others. I believe and hope this effort was worth the time invested, relative to trying to adopt or continue use of some other content management system for this documentation. The test will be when a maintainer who isn’t me tries to add or revise material.