April 10, 2023

The Knuth-Plass algorithm

Over the decades, the LaTeX community has turned into something of an ideology. The premise is that LaTeX is superior to MS Word, and users are told only how to format documents within LaTeX. There is an endless number of definitions of what LaTeX is about, but a more technical explanation is missing.

The main difference between LaTeX and alternatives such as LibreOffice Writer and MS Word is that LaTeX has the Knuth-Plass algorithm built in.[1][2] Implementing the algorithm from scratch is very difficult.[3] A large part of the TeX-related source code is devoted to this single problem.

Suppose MS Word added a simple button in the settings menu that enables the algorithm for a Word document. Justification and the distribution of content over multiple pages would then work on the same principle as in LaTeX. The visual layout would improve, and the gap between MS Word and LaTeX would shrink. All the other features of LaTeX, such as the separation of content and layout and a robust file format, are available in MS Word too. For example, Word has a built-in draft view for entering text without any formatting, and Word stores documents in an open XML format, which is superior to the .tex format used in LaTeX.

The difference between MS Word and TeX can be reduced to this line-breaking and page-layout algorithm. The idea behind the algorithm is to position the boxes on the page more elegantly. Elegant means that the word spacing is even across lines and that pictures are placed at the correct position.
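As a rough illustration of that idea, here is a minimal Python sketch. It is not TeX's implementation: it assumes monospaced text, no hyphenation and no stretchable glue, and it scores a line simply by the square of its leftover space instead of TeX's badness and demerits. The function names are made up for this example. What it does share with the Knuth-Plass approach is the principle of choosing all breaks of a paragraph together, so that the total cost is minimal, instead of filling each line greedily.

```python
def greedy_lines(words, width):
    """First-fit breaking: put as many words on each line as will fit."""
    lines, current, length = [], [], -1
    for word in words:
        if current and length + 1 + len(word) > width:
            lines.append(" ".join(current))
            current, length = [], -1
        current.append(word)
        length += 1 + len(word)
    if current:
        lines.append(" ".join(current))
    return lines


def optimal_lines(words, width):
    """Choose all breaks at once so the summed squared slack is minimal.

    Simplified cost model (an assumption of this sketch): each line except
    the last costs (width - line length) ** 2; the last line is free.
    """
    n = len(words)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)     # nxt[i]: index of the word that starts the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1         # length of words[i:j] joined by single spaces
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    if best[0] == inf:
        raise ValueError("a word is longer than the line width")
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


def raggedness(lines, width):
    """Total squared slack over all lines except the last."""
    return sum((width - len(line)) ** 2 for line in lines[:-1])


words = "aaa bb cc ddddd".split()
print(greedy_lines(words, 6))   # ['aaa bb', 'cc', 'ddddd'], raggedness 16
print(optimal_lines(words, 6))  # ['aaa', 'bb cc', 'ddddd'], raggedness 10
```

The greedy version fills the first line completely and pays for it with a nearly empty second line. The optimizing version accepts a slightly loose first line to get a more even paragraph overall, which is the kind of trade-off described above.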

Surprisingly, even within the LaTeX community the Knuth-Plass algorithm is ignored or discussed only rarely. The number of papers on the subject is small: fewer than 100 papers about this algorithm have been published since the early 1980s. So it is a kind of expert knowledge that is not available to the masses. The interesting point is that, apart from the algorithm, the TeX ecosystem has nothing to offer, or at least nothing more advanced than what is available in MS Word today. Word can export a document to a PDF file with a single mouse click, it can format a document in the "Latin Modern Roman" font, and its support for inserting mathematical equations is excellent. The only real weakness in Word is the calculation of vertical and horizontal white space, which results in low-quality typesetting. Even untrained users will see at first glance whether a two-column text was formatted with MS Word or with LaTeX. That means the Knuth-Plass algorithm produces a visible difference.

The question is not how to indoctrinate happy MS Word users into switching to LaTeX; it is the other way around. The goal is to explain in simple words how TeX's internal line-breaking algorithm works, so that it can be integrated into mainstream applications such as MS Word, fpdf2, LibreOffice and so on.

References
[1] Knuth, Donald E., and Michael F. Plass. "Breaking paragraphs into lines." Software: Practice and Experience 11.11 (1981): 1119-1184.
[2] Plass, Michael Frederick. Optimal pagination techniques for automatic typesetting systems. Stanford University, 1981.
[3] Verna, Didier E. "ETAP: Experimental Typesetting Algorithms Platform." ELS 2022: 15th European Lisp Symposium. 2022.
