January 02, 2022

Typesetting with a command language

There is an endless amount of software that promises to create high-quality PDF documents. Some of these programs are useful in practice, while others have become obsolete. The main problem is not how to program these tools; the bottleneck is understanding what word processing is in general.

A common assumption is that there are two different approaches: WYSIWYG programs and markup-oriented tools. Programs in each category are widely known, so there is no need to list them here. They are well known because typesetting is done by a large number of users and the PC is well suited for this purpose.

The first step is to understand what the average user does with a word processor. In most cases the starting point is a zip file that contains text files and image files, and the goal is to convert this zip file into a PDF document. The steps in between are handled by the typesetting program.
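The input side can be made concrete with a short Python sketch. It assumes, hypothetically, that text lives in `.txt` or `.md` files and images in `.png` or `.jpg` files, and it uses only the standard `zipfile` module to sort the archive members into the two groups a typesetting program would need. The helper name `sort_zip_content` is illustrative, not part of any existing tool.

```python
import io
import zipfile

# Hypothetical extension lists; real projects may use other formats.
TEXT_EXTENSIONS = (".txt", ".md")
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def sort_zip_content(zip_bytes):
    """Split the members of a zip archive into text files and image files.

    Returns (texts, images): texts maps file name -> decoded text,
    images is a list of image file names in archive order.
    """
    texts = {}
    images = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(TEXT_EXTENSIONS):
                texts[name] = archive.read(name).decode("utf-8")
            elif lower.endswith(IMAGE_EXTENSIONS):
                images.append(name)
    return texts, images


# Build a small archive in memory so the example is self-contained.
# The "image" bytes are a placeholder, not a valid PNG.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("paper.txt", "hello world\nLorem ipsum")
    archive.writestr("screenshot.png", b"\x89PNG placeholder bytes")

texts, images = sort_zip_content(buffer.getvalue())
print(texts)   # {'paper.txt': 'hello world\nLorem ipsum'}
print(images)  # ['screenshot.png']
```

The result is still pure content: nothing in `texts` or `images` tells an output device where or how to draw anything.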

To understand this intermediate task, we have to focus on the technical requirements. The step in between is realized with a printer command language, a programming language used to send commands to a plotting device. This description may seem a bit unusual, because the starting text file is not a program; the user simply wants to print it. But a printer can't handle plain text files; what a printer needs is a command file.
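PostScript is one real, widely used printer command language, and it illustrates the point: before a plain text file can be printed, each line has to be turned into drawing commands. The following is a minimal sketch under simplifying assumptions (one page, Helvetica at 12 pt, a 72 pt left margin, no line wrapping); the function names are hypothetical.

```python
def escape_ps(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_to_postscript(text, font_size=12, left=72, top=720):
    """Turn plain text into a one-page PostScript program.

    PostScript measures in points (1/72 inch) with the origin at the
    bottom-left corner, so each new line moves down by lowering y.
    Long lines are not wrapped and no page breaks are emitted.
    """
    leading = font_size * 1.2  # distance between baselines
    commands = [
        "%!PS",
        f"/Helvetica findfont {font_size} scalefont setfont",
    ]
    y = top
    for line in text.splitlines():
        commands.append(f"{left} {y:g} moveto ({escape_ps(line)}) show")
        y -= leading
    commands.append("showpage")
    return "\n".join(commands)


print(text_to_postscript("hello world\nLorem (ipsum)"))
```

The input is two lines of content; the output is a small program of `moveto`, `show` and `showpage` commands that a PostScript printer can execute.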

In the past, many command languages were created, and the concrete specification is not important here. It is even possible to invent a command language for a display. Suppose there is a pixel map on the screen that is 700x300 pixels. A printer command language is used to draw on this screen, and it is relatively easy to imagine what the commands are. In the simplest case the language consists of commands like drawtext(x,y,text) and drawimage(x,y,image).
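Such an invented command language can be sketched as a Python class. The `Screen` class below is hypothetical: it only checks that each command's anchor point lies on the 700x300 pixel map and records the command in a display list, instead of rasterising real pixels.

```python
from dataclasses import dataclass, field

WIDTH, HEIGHT = 700, 300  # the pixel map from the text


@dataclass
class Screen:
    """A hypothetical display that accepts drawing commands, not content.

    Accepted commands are stored in a display list; a real device would
    turn them into pixels.
    """
    display_list: list = field(default_factory=list)

    def _check(self, x, y):
        # Reject commands whose anchor point falls outside the pixel map.
        if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
            raise ValueError(f"({x}, {y}) is outside the {WIDTH}x{HEIGHT} screen")

    def drawtext(self, x, y, text):
        self._check(x, y)
        self.display_list.append(("text", x, y, text))

    def drawimage(self, x, y, image):
        self._check(x, y)
        self.display_list.append(("image", x, y, image))


screen = Screen()
screen.drawtext(0, 0, "hello world")
screen.drawimage(0, 50, "screenshot.png")
print(screen.display_list)
# [('text', 0, 0, 'hello world'), ('image', 0, 50, 'screenshot.png')]
```

The device never sees a file; it only sees a sequence of calls with coordinates.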

The output device is not a file, like a text file on the computer; it is a computer that accepts commands, not content. Word processing means converting a zip file that contains content into a script in a command language that consists of actions. Let me give an example:

  # command language example
  title="hello world"
  drawtext(0,0,title)

  abstract="Lorem ipsum"
  drawtext(0,20,abstract)

  image="screenshot.png"
  drawimage(0,50,image)

  paragraph="lorem ipsum"
  # placed below the image (y=0 would overwrite the title)
  drawtext(0,250,paragraph)


This mini program has much in common with a Python script, but it prints something to a screen. The interesting point is that such a file doesn't contain normal text, and it is not written in a markup language; it is a sequence of action words formulated in a printer command language. Such a script can be executed on a printer or a graphic display. So the question is how to create such a command language script. One option is to enter the commands manually; another is to use a generator. A generator results in a multi-step process:

1. input: text files, images
2. printer command language
3. output: PDF file
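The three steps can be sketched in a few lines of Python. Everything here is a hypothetical illustration: `generate_script` plays the role of the generator with a naive layout rule (a fixed height per text line and per image), and `run_script` executes the generated command script against a device that only records what it is told to draw.

```python
def generate_script(content, line_height=20, image_height=100):
    """Step 2: turn an ordered list of content items into commands.

    content is a list of (kind, value) pairs where kind is "text" or
    "image". Layout rule (naive): each text item takes line_height
    pixels, each image takes image_height pixels.
    """
    lines = []
    y = 0
    for kind, value in content:
        if kind == "text":
            lines.append(f"drawtext(0,{y},{value!r})")
            y += line_height
        elif kind == "image":
            lines.append(f"drawimage(0,{y},{value!r})")
            y += image_height
        else:
            raise ValueError(f"unknown content kind: {kind}")
    return "\n".join(lines)


def run_script(script):
    """Step 3: execute the command script on a recording output device."""
    output = []
    commands = {
        "drawtext": lambda x, y, text: output.append(("text", x, y, text)),
        "drawimage": lambda x, y, image: output.append(("image", x, y, image)),
    }
    # The script only calls the two commands, so builtins are hidden.
    # This is not a security sandbox; only run scripts you generated.
    exec(script, {"__builtins__": {}, **commands})
    return output


# Step 1: the input content, in reading order.
content = [
    ("text", "hello world"),
    ("text", "Lorem ipsum"),
    ("image", "screenshot.png"),
    ("text", "lorem ipsum"),
]
script = generate_script(content)
print(script)
print(run_script(script))
```

Unlike the hand-written example, the y coordinates here are computed by the generator, so the paragraph can never land on top of the title.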

 

Generating a pdf paper with a command language

Interestingly, even though the PDF format has been known for years and around 100 different programs can create such files, it remains unclear what exactly word processing is about. The reason is that the subject is mostly defined from a user's perspective, which makes it hard to grasp the underlying technical pipeline.

From a user's perspective there are two sorts of programs: WYSIWYG programs and markup languages. Instead of arguing about which of these tools works better, the idea is to describe in general what typesetting and printing are about. Typesetting means converting content into commands. An example of content is a text file or a PNG image; an example of a printer command is "drawtext()".

So what is the difference? It has to do with different understandings of the same subject. The author of a text thinks in categories of content: he writes a text or draws an image, and all of this content is stored in files. A text file contains information, but it can't be executed. In contrast, an output device like a printer or a graphic display has a different perspective on the world: it operates with commands. That means a printer provides an API (application programming interface) to the outside world. Printing something means translating content into commands, and this intermediate step is performed by the word processor.

After this introduction, let us take a look at how to print something from a low-level perspective. The focus is not on the content but on the needs of the output device. In this concrete case the output device is a PDF file, created by the FPDF library in Python.

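    import os
    import fpdf  # provided by the PyFPDF or fpdf2 package

    width = 190  # usable A4 width in mm: 210 mm minus 10 mm left/right margins
    pdf = fpdf.FPDF()  # defaults: A4 page, portrait, millimetres
    pdf.add_page()

    # Headline: 32 pt font in a framed (border=1), centered cell
    title = "lorem ipsum"
    pdf.set_font('Arial', '', 32)
    pdf.multi_cell(width, 12, title, 1, "C")
    pdf.ln()

    # Body text: 12 pt font in a framed, justified cell
    text = """Longer text
    newline
    lorem ipsum."""
    pdf.set_font('Arial', '', 12)
    pdf.multi_cell(width, 6, text, 1, "J")
    pdf.ln()

    # Write the file, then open it in the Evince viewer (assumes evince is installed)
    pdf.output('1.pdf')
    os.system("evince 1.pdf")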


Running this Python script creates a PDF file and shows it on the screen. The interesting point is that the Python script consists of commands. That means the printer doesn't understand a text file; it needs a command to print each line. Suppose the idea is to print a longer document of 20 pages. This is realized by issuing many commands one after another: at the beginning there is a command for printing the headline, then a command for the abstract, then a command for a picture, and so on.
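How a long document turns into a long list of commands can be sketched with a simple page-break rule. The values are assumptions: a usable page height of 277 mm (A4 with 10 mm top and bottom margins) and the 6 mm line height used for the body text in the script above. The command tuples are a hypothetical notation, not fpdf calls. With these numbers 46 lines fit on a page, so 920 lines come out as exactly 20 pages.

```python
PAGE_HEIGHT = 277  # assumed usable A4 height in mm (297 minus 2 x 10 mm)
LINE_HEIGHT = 6    # line height of the body text, in mm


def paginate(lines):
    """Emit a flat list of printer commands for many lines of text.

    A new-page command is inserted whenever the next line would run past
    the bottom of the page, so a long text simply becomes a long list.
    """
    commands = [("add_page",)]
    y = 0
    for line in lines:
        if y + LINE_HEIGHT > PAGE_HEIGHT:
            commands.append(("add_page",))
            y = 0
        commands.append(("text", y, line))
        y += LINE_HEIGHT
    return commands


commands = paginate([f"line {i}" for i in range(920)])
pages = sum(1 for command in commands if command[0] == "add_page")
print(pages)          # 20
print(len(commands))  # 940: 920 text commands plus 20 page commands
```

The printer never needs to know that it is printing a "document"; it just receives one command after another.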

The problem is that most authors are not trained to formulate a text in this way. The author doesn't want to program a printer; he wants to print a text file. So the question is how to convert a ".txt" file into a ".py" script that contains printer commands. This is indeed an interesting problem, and it is what word processing software handles: a word processor takes content from the user and converts it into a command language.

In theory it is possible to convert the Markdown format into a printer command language. Such a tool would work as follows. The author writes text in Markdown, which is a plain-text format. Then the converter is started: it takes the Markdown content and generates a Python script formulated in the printer command language. The script uses the syntax of the fpdf library, mostly the commands multi_cell() and set_font(). These commands are sent to the virtual printer, which generates the PDF file.

The challenge is to convert the Markdown content into an action language. For example, in Markdown syntax a new section is started with "## section". But this markup describes only content; it is not a command. A command that a printer can interpret would look like:
   pdf.set_font('Arial', '', 12)
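A converter of this kind can be sketched in Python. It is a minimal illustration, not an existing tool: it handles only single-line `#`, `##` and `###` headings and plain paragraphs separated by blank lines, and the font sizes, line heights and bold style per heading level are hypothetical choices. The output is the text of an fpdf script, which could be saved as a `.py` file and run where the fpdf library is installed.

```python
# Hypothetical layout per heading level; level 0 means body text.
FONT_SIZES = {1: 24, 2: 18, 3: 14, 0: 12}
LINE_HEIGHTS = {1: 12, 2: 9, 3: 7, 0: 6}


def markdown_to_fpdf(markdown, width=190):
    """Translate a small Markdown subset into the source of an fpdf script.

    Blocks are separated by blank lines. A block starting with 1 to 3 '#'
    followed by a space is a heading; anything else is a paragraph whose
    line breaks are joined with spaces. Lists, emphasis and images are
    not handled.
    """
    out = ["import fpdf", "pdf = fpdf.FPDF()", "pdf.add_page()"]
    for block in markdown.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        hashes = len(block) - len(block.lstrip("#"))
        if 1 <= hashes <= 3 and block[hashes:hashes + 1] == " ":
            level, text = hashes, block[hashes:].strip()
        else:
            level, text = 0, " ".join(block.split())
        style = "B" if level else ""  # bold for headings
        # Content ("## section") becomes actions (set_font, multi_cell).
        out.append(f"pdf.set_font('Arial', {style!r}, {FONT_SIZES[level]})")
        out.append(f"pdf.multi_cell({width}, {LINE_HEIGHTS[level]}, {text!r})")
        out.append("pdf.ln()")
    out.append("pdf.output('1.pdf')")
    return "\n".join(out)


print(markdown_to_fpdf("## section\n\nLorem ipsum\ndolor sit amet."))
```

For the input above, the heading becomes a bold 18 pt `set_font` call followed by a `multi_cell` with the text 'section', and the paragraph becomes a regular 12 pt `set_font` followed by a `multi_cell` with 'Lorem ipsum dolor sit amet.'.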

This is a typical communication problem. The author of a text thinks in categories of content, while the printer needs dedicated commands to operate. The interesting point is that with the correct commands the printer can do anything: it can output a single character or a longer line of text. The only problem is how to convert content into action.