April 05, 2023

Writing a LaTeX clone from scratch

At first the idea sounds like a project doomed to fail, because everybody knows that LaTeX consists of an enormous amount of code. On the other hand, it would be interesting to write a prototype that reduces a typesetting system to its minimum.

The first thing to know is that an elaborate markup language already exists: Markdown. Markdown is an enhanced plain text format that allows the user to define sections, bullet points and tables. The language is more than capable of serving as the input format for a typesetting system.

The open question is how exactly a Markdown file gets rendered into a picture. Creating a .PNG file is in itself a trivial task, since many Python libraries are available for this purpose, and converting a picture into a PDF is also easy. The more serious problem is how to position characters, lines and paragraphs on the picture.
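To make the positioning problem more concrete, here is a minimal sketch of the simplest possible approach: greedy line breaking followed by assigning each line a pixel position. It assumes a monospaced font, so that a line's width is just its character count, and the function names `break_lines` and `position_lines` are made up for this illustration. A real typesetting system would measure glyph widths and use a smarter line breaking strategy.

```python
def break_lines(text, max_chars):
    """Greedy line breaking: pack words until the next one no longer fits."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        # An overlong single word still gets a line of its own.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def position_lines(lines, margin_left, margin_top, line_height):
    """Give every line the (x, y) pixel position of its top-left corner."""
    return [(margin_left, margin_top + i * line_height, line)
            for i, line in enumerate(lines)]


lines = break_lines("the quick brown fox jumps", max_chars=10)
placed = position_lines(lines, margin_left=20, margin_top=30, line_height=12)
```

Each entry in `placed` is exactly the kind of information a drawing routine needs: where to start, and what to draw there.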

A rough estimate suggests that typesetting is mostly about a list of features that are stored in a long table. Features can be: left margin, bottom margin, font size for text, font size for section headings, line spacing, distance between pictures and so on. In addition, the table needs to store dynamic data like "word space in line 1", "word space in line 2" and so on.

The working thesis is that the PNG image is created by sending queries to the data table and storing information back into it.
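A small sketch of this query-and-store cycle, under some assumptions: the table is a plain Python dictionary, the font is monospaced, and the key names and pixel values are invented for the example. The function reads static features from the table, computes the gap between the words of a justified line, and writes the result back as a dynamic entry.

```python
layout = {
    # Static features in pixels (hypothetical values).
    "page_width": 200,
    "margin_left": 20,
    "margin_right": 20,
    "char_width": 6,  # assumes a monospaced font
}


def justify_line(table, line_no, words):
    """Query static features, compute the word gap, store it in the table."""
    text_width = (table["page_width"]
                  - table["margin_left"] - table["margin_right"])
    ink_width = sum(len(w) for w in words) * table["char_width"]
    gaps = max(len(words) - 1, 1)  # avoid division by zero for one word
    space = (text_width - ink_width) / gaps
    table[f"word_space_line{line_no}"] = space  # dynamic entry
    return space


space = justify_line(layout, 1, ["margin", "font", "spacing"])
```

After the call, the table contains a new key `word_space_line1` next to the static features, so a later drawing step can query it like any other value.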

Let me give an example. Suppose the idea is to draw only the first page of a book, and the page consists of a single filled black rectangle. To do so, the drawing routine needs some information from the layout engine:
- margin of the page
- position of the rectangle
- color of the rectangle
and so on.
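The rectangle example can be sketched in a few lines. This is only an illustration under stated assumptions: the table layout and all values are invented, the page is grayscale, and the rectangle position is interpreted relative to the margin. Instead of an imaging library, the sketch encodes the PNG by hand with the standard `zlib` and `struct` modules so that it runs without dependencies.

```python
import struct
import zlib

layout = {  # hypothetical values, in pixels
    "page": {"width": 120, "height": 160, "margin": 10},
    "rect": {"x": 20, "y": 30, "width": 50, "height": 40, "color": 0},
}


def render_page(table):
    """Return the page as rows of grayscale pixels (255 white, 0 black)."""
    page, rect = table["page"], table["rect"]
    rows = [bytearray([255] * page["width"]) for _ in range(page["height"])]
    # The rectangle position is measured from the top-left margin corner.
    left = page["margin"] + rect["x"]
    top = page["margin"] + rect["y"]
    for y in range(top, min(top + rect["height"], page["height"])):
        for x in range(left, min(left + rect["width"], page["width"])):
            rows[y][x] = rect["color"]
    return rows


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data)))


def encode_png(rows):
    """Encode 8-bit grayscale rows as PNG bytes."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then
    # compression, filter and interlace methods, all 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)  # filter 0 per row
    return (b"\x89PNG\r\n\x1a\n" + png_chunk(b"IHDR", header)
            + png_chunk(b"IDAT", zlib.compress(raw))
            + png_chunk(b"IEND", b""))


rows = render_page(layout)
png_bytes = encode_png(rows)
```

Writing `png_bytes` to a file opened in binary mode gives an image that can be opened in a viewer. The drawing routine itself never contains a number: everything it needs comes out of the table.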

The idea is that every drawing process works on the same principle. That means the data table is the core element of a layout engine. Technically, such a table can be realized as a hierarchical Python data structure, for example nested dictionaries, but it remains unclear how to do so in detail.
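One possible shape for such a hierarchical table, offered as a sketch and not as a worked-out design: nested dictionaries addressed by dotted paths. The helper names `query` and `store`, the key names and the values are all assumptions made for this example.

```python
layout = {
    "page": {"margin": {"left": 20, "bottom": 25}},
    "font": {"size": {"text": 11, "section": 16}},
    "lines": {},  # dynamic data, filled in while typesetting
}


def query(table, path):
    """Look up a value by a dotted path such as 'page.margin.left'."""
    node = table
    for key in path.split("."):
        node = node[key]
    return node


def store(table, path, value):
    """Store a value under a dotted path, creating missing levels."""
    *parents, last = path.split(".")
    node = table
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


store(layout, "lines.1.word_space", 4.5)
left_margin = query(layout, "page.margin.left")
```

The same two functions serve both static features and dynamic data, which fits the idea that drawing is nothing more than reading from and writing to the table.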