October 19, 2021

Reducing TeX to a minimum

 

Without any doubt, TeX is a powerful typesetting system which has revolutionized the computer industry. Unix combined with TeX introduced digital typography and reduced the cost of creating academic papers and books. The main problem with the TeX system is that its ecosystem has become too complex.
It seems that there is no higher authority able to remove lines of code or entire programs from the so-called TeX distributions. As a result, an endless number of packages, binary files and different TeX compilers have accumulated over the years.
Some smaller attempts have been made to replace LaTeX, namely Lout and SILE. Both are LaTeX-like compilers, but they do not provide the same functionality. Before a replacement for TeX can be created, it is necessary to analyze what the project is about.
The main idea behind TeX, and the reason why the project is successful, is that it implements a list of layout commands. The current LaTeX ecosystem has around 250 different commands, which are extended with parameters and additional packages. Not all of these commands are important, so the next question is which of them form the core of a typesetting system.
I have identified the single most important command: the one that creates boxes. A general \box command defines a frame on the output page. All the other LaTeX commands, for typesetting tables, mathematical equations or paragraphs, have a lower priority.
Suppose a framed box can hold different kinds of content, such as text, images or a table. Then the boxes can be arranged on a US-letter page, and the resulting .pdf file will look like the output of LaTeX. That means, in theory, that the text or tables can be rendered outside of the layout formatter by an external program. There is no need to combine everything in a single program.
A minimal TeX replacement provides a small number of box creation commands and the ability to define columns and the page size. Nothing more is needed to typeset a document. My estimate is that such a minimal program can be written in few lines of code. All the software needs to do is parse a text file, recognize the \box command and render a PDF document as output.
The interesting point is that this simple functionality is enough to create longer books, because the content inside the boxes is fixed.
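To make the idea more concrete, here is a minimal sketch in Python of such a parser. The input syntax \box{x}{y}{width}{height}{text} is purely an assumption made for this illustration and is not real TeX. The function names are hypothetical as well, and the sketch writes PostScript instead of PDF to stay short.

```python
import re

# Hypothetical input syntax (an assumption, not real TeX):
#   \box{x}{y}{width}{height}{text}
BOX_RE = re.compile(r"\\box\{(\d+)\}\{(\d+)\}\{(\d+)\}\{(\d+)\}\{([^{}]*)\}")


def ps_escape(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def boxes_to_postscript(source):
    """Translate every box command found in `source` into PostScript."""
    out = ["%!PS-Adobe-3.0", "/Helvetica findfont 12 scalefont setfont"]
    for x, y, w, h, text in BOX_RE.findall(source):
        # Draw the frame of the box.
        out.append(f"{x} {y} {w} {h} rectstroke")
        # Put the text 5 points inside the lower-left corner of the frame.
        out.append(f"{int(x) + 5} {int(y) + 5} moveto ({ps_escape(text)}) show")
    out.append("showpage")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = r"\box{40}{650}{500}{30}{Hello World!}"
    print(boxes_to_postscript(demo))
```

The sketch only handles one line of text per box and ignores everything that is not a box command. A real tool would also need line breaking, fonts and images.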
Example


From a technical perspective, a LaTeX compiler is a printer driver that works as a pipeline. That means multiple steps are executed one after another, and at the end the output is printed. For writing the LaTeX software itself, the important part is the transition from an input text file to the rendered PDF document.
For the sake of simplicity, the idea is not to create PDF files, which usually requires additional libraries, but PostScript files, which are much easier to generate. The task of a LaTeX typesetting software is then to convert a “.tex” file into a “.ps” file. An example PostScript file is given next. It consists of boxes which are filled with content.
 
%!PS-Adobe-3.0
/Helvetica findfont
12 scalefont setfont

/text1 {
40 650 moveto 
(Hello World!) show
} def

/text2 {
350 400 moveto 
(Text is here)
show
} def

% small 40 x 20 frame at (40, 600), defined but not drawn in the main section
/box2 {
40 600 40 20 rectstroke
} def

/box1 {
2 setlinewidth
40 650 500 30 rectstroke
} def

/columns {
0.5 setlinewidth
40 150 200 400 rectstroke
350 150 200 400 rectstroke
} def


%------main---------
text1
text2
box1
columns

showpage
The problem with the PostScript format is that creating such files manually is a time-consuming task. The reason is that all coordinates are absolute values. That means somebody has to enter that the second column has its bottom-left corner at (350, 150). The logical next step is to write software that determines these values automatically. This task goes beyond the scope of this small introduction.
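As a small first step in that direction, the following Python sketch computes the column frames from a few layout parameters instead of typing them in by hand. The function names and the parameters (margin, gutter, column count) are assumptions made for this illustration. The only fixed fact is that a US-letter page is 612 points wide.

```python
def column_frames(page_width, margin, gutter, count, bottom, height):
    """Return (x, y, width, height) for `count` equal columns between the margins."""
    # Space left for the columns after removing both margins and all gutters.
    usable = page_width - 2 * margin - gutter * (count - 1)
    width = usable / count
    return [(margin + i * (width + gutter), bottom, width, height)
            for i in range(count)]


def frames_to_postscript(frames):
    """Emit one rectstroke line per frame."""
    return "\n".join(f"{x:g} {y:g} {w:g} {h:g} rectstroke" for x, y, w, h in frames)


if __name__ == "__main__":
    # Two columns on a US-letter page (612 pt wide), 40 pt margins, 20 pt gutter.
    frames = column_frames(page_width=612, margin=40, gutter=20,
                           count=2, bottom=150, height=400)
    print(frames_to_postscript(frames))
```

With these example values each column is 256 points wide, and the second column starts at x = 316. Changing the margin or the number of columns updates every coordinate at once, which is the part that is tedious to do by hand.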