April 08, 2023

Creating a LaTeX clone from scratch

The core element of a text rendering engine is a datastructure which holds the boxes on the screen.

id name x y w h
0 pageborder 0 0 210 297
1 textborder 35 30 140 227
2 line 60 30 115 14
3 line 35 49 140 14
4 line 35 68 140 14

The entry id2 shows the first line of a paragraph with a small intendation. All the elements on a page are stored in this single boxtable. The units are not pixel position on the screen, but they are metric millimeters on the physical sheet of paper. For rederning the boxtable to the screen a second table is created which holds the pixel information according to the scaling factor.

What an algorithm like the Knuth plass linebreaking algorithm is doing is to convert a piece of text like "the quick brown fox jumps over the lazy dog" into a boxtable. So the program is not operating with the 2d rendered page on the screen but it is using the internal tabular representation of the boxes.