January 02, 2022

The mdtopdf tool

There is an interesting up-and-coming GitHub project at https://github.com/mandolyte/mdtopdf that converts a Markdown file into the PDF format.

One might argue that such a converter already exists, since it is built into the pandoc tool. But pandoc relies by default on an external LaTeX installation to produce a PDF, while mdtopdf is written in Go and doesn't need TeX at all. There is a reason why such a tool wasn't available in the decades before: the subject is difficult to understand.

At first glance, a Markdown to PDF converter simply converts format A into format B. But a closer look at the problem shows that it is more complicated. Markdown is, like HTML, a markup language. It consists of text that is enriched with additional information. In contrast, PDF, as a printer-oriented format, doesn't consist of marked-up text but is action oriented. This makes it harder to write a converter.

Put simply, PDF is not a text format like .html but a printer command language. It consists of actions like "drawtext()". So the converter has to read a plain text file as input and generate, as output, a script that is executed on the printer. It is a text-to-program converter, and that is why it is hard to write.
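To make the idea of "actions" more concrete, here is a minimal sketch in Python that turns a string into the text operators used inside a PDF content stream (BT, Tf, Td, Tj, ET). The function names `escape_pdf_string` and `text_to_content_stream`, the font resource name `F1`, and the default coordinates are assumptions chosen for illustration. This is not how mdtopdf works internally, and a complete PDF file needs additional structure (objects, cross-reference table, trailer) that is left out here.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def text_to_content_stream(text: str, x: int = 72, y: int = 720,
                           font: str = "F1", size: int = 12) -> str:
    """Return content-stream operators that draw `text` at position (x, y).

    `font` is a hypothetical resource name; a real PDF must declare it
    in the page's resource dictionary.
    """
    lines = [
        "BT",                               # begin a text object
        f"/{font} {size} Tf",               # select font and size
        f"{x} {y} Td",                      # move to the start position
        f"({escape_pdf_string(text)}) Tj",  # show the string
        "ET",                               # end the text object
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_to_content_stream("hello"))
```

The output is a short sequence of commands rather than a piece of marked-up text, which is the difference the paragraph above describes.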

Let us give an example. In Markdown syntax, bold formatting is written as "**hello**". This is a typical example of markup syntax. The author writes the word and marks how it should be printed. But a printer doesn't work this way. A printer doesn't need content but a command. What a printer can execute is a command like:

    setfont(bold)
    drawtext("hello")

That means the original Markdown file holds textual information, while the printer API works with action commands. Or, to explain it the other way around: a Markdown file can't be executed in a shell because it contains no commands, while a printer command language script doesn't hold information but consists of statements.
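The step from markup to commands can be sketched in a few lines of Python. The example below handles only "**bold**" spans and emits the same pseudo-commands used above, setfont() and drawtext(). The function name `markdown_to_commands` and the "regular" font name are assumptions for illustration, and the sketch ignores escaping, nesting, and every other Markdown feature.

```python
import re

# Matches **bold** spans; everything else is treated as regular text.
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")


def markdown_to_commands(line: str) -> list[str]:
    """Translate one line of text with **bold** spans into printer-style commands."""
    commands = []
    position = 0
    for match in BOLD_PATTERN.finditer(line):
        # Text before the bold span is drawn with the regular font.
        plain = line[position:match.start()]
        if plain:
            commands.append("setfont(regular)")
            commands.append(f'drawtext("{plain}")')
        # The bold span itself switches the font before drawing.
        commands.append("setfont(bold)")
        commands.append(f'drawtext("{match.group(1)}")')
        position = match.end()
    # Any text after the last bold span is drawn with the regular font.
    rest = line[position:]
    if rest:
        commands.append("setfont(regular)")
        commands.append(f'drawtext("{rest}")')
    return commands


if __name__ == "__main__":
    for command in markdown_to_commands("say **hello** now"):
        print(command)
```

The input is a line of content, while the result is a list of statements to be executed in order. Even this toy version has to keep track of state (the current position in the line and the current font), which hints at why a full converter is harder to write than it looks.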