October 25, 2023

Writing a compiler for Forth

There is a reason why non-x86 CPU architectures are ignored by mainstream computing: this hardware isn't compiler friendly. To explain this term, we have to sort the possible CPU architectures by their complexity.

The easiest hardware to build is a stack-based Forth CPU. Such a machine can be realized with a low transistor count. A Forth CPU supports only a limited number of assembly instructions. It has no general-purpose registers; its only data structure is the stack. The next logical step in processor design is the RISC architecture, which sits in the middle between stack machines and CISC. A typical example of a RISC machine is the MIPS CPU, which adds a fixed set of general-purpose registers but keeps the instruction set small. At the other end of the scale are full-blown AMD64-compatible x86 processors like the famous Intel Core i series, which is used in mainstream computing and powers more than 1 billion desktop PCs worldwide.
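To make the idea of a machine without registers more concrete, here is a minimal sketch of a stack machine interpreter in Python. The word names (+, -, *, dup, swap, drop) follow common Forth words, but the function `run` and its tiny instruction set are assumptions made for illustration, not a model of any real Forth chip.

```python
def run(program):
    """Execute a whitespace-separated, Forth-like program.

    Hypothetical toy machine: the only storage is a single stack.
    Returns the final contents of that stack as a list.
    """
    stack = []
    for word in program.split():
        if word == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif word == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif word == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif word == "dup":
            # Duplicate the top element.
            stack.append(stack[-1])
        elif word == "swap":
            # Exchange the two topmost elements.
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            # Discard the top element.
            stack.pop()
        else:
            # Anything else is treated as an integer literal and pushed.
            stack.append(int(word))
    return stack


print(run("2 3 + 4 *"))  # (2 + 3) * 4, prints [20]
```

Every instruction takes its operands from the top of the stack and puts its result back there, so no instruction ever has to name a storage location. That is the main reason why such hardware can stay so small.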

The acceptance of RISC CPUs in desktop computing is low, while the market share of Forth CPUs is nearly zero. Both kinds of processor are difficult to program or, to put it more technically, it is difficult to write a C compiler for them. There are some C compilers available for MIPS processors, but they are complicated because a lot of optimization is needed. In contrast, writing a C compiler for AMD64 is much easier, because the underlying hardware provides more high-level assembly instructions.

The best way to program Forth CPUs and also MIPS processors is to type in the assembly instructions manually. This means avoiding any compiler in between, and the user has to think like the CPU in question. It's obvious that most programmers are not interested in such an interaction, because it takes an endless amount of time to write complex software directly in assembly. This situation prevents the rise of RISC and stack-based Forth CPUs.

What mainstream programmers do all the time is write software in C. C is the only important language in modern software development. Nearly all operating systems, like Linux, Windows, macOS and even Haiku, are written in C/C++. The main advantage over assembly instructions is that C code can be written much faster, which makes it possible to create full-blown GUI systems including libraries. The result is that a low-efficiency CPU design like the x64 processor is preferred over advanced chip designs like RISC and stack machines.

A possible way to make non-x86 processors more popular would be the existence of advanced compilers. From a technical perspective every CPU is Turing complete, which means the same algorithm written for a CISC CPU can also be executed on a stack machine. The only bottleneck is that somebody has to create the code first. In modern computing, the automatic compilation process generates the code. So there is a need to write modern C compilers that are able to generate code for targets like Forth CPUs and MIPS CPUs.

From a technical perspective, a compiler is a translator. It takes a C program as input and generates assembly instructions as output. In the case of a stack machine, the needed assembly instructions have a certain format, which is known as Forth or Forth code. A Forth-like stack machine is a minimalist computer which is, of course, controlled by a program. This program needs to be written before useful behavior can be generated.
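The translation step can be sketched in a few lines of Python. The example below is not a C compiler. It only handles integer arithmetic expressions, and it borrows Python's own `ast` parser as a stand-in for a C front end, which works here because expressions such as (2 + 3) * 4 are written the same way in both languages. The function name `to_forth` and the operator table are assumptions made for this sketch.

```python
import ast

# Map Python AST operator nodes to the corresponding Forth words.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_forth(expr):
    """Translate an infix arithmetic expression into postfix, Forth-like code."""

    def emit(node):
        # An integer literal becomes a push of that literal.
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [str(node.value)]
        # A binary operation becomes: code for left, code for right, operator.
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return emit(node.left) + emit(node.right) + [OPS[type(node.op)]]
        raise ValueError("unsupported construct in expression")

    tree = ast.parse(expr, mode="eval")
    return " ".join(emit(tree.body))


print(to_forth("(2 + 3) * 4"))  # prints: 2 3 + 4 *
```

The generator is a plain post-order walk over the expression tree: operands first, operator last. For expressions this is all a stack machine needs, because the order of the emitted words already encodes where every intermediate value lives. A real compiler would still have to deal with variables, control flow and function calls.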

To better understand the pros and cons of stack-based machines, it makes sense to take a closer look at MIPS assembly. RISC-based MIPS CPUs play the role of an in-between. They are not as minimalist as Forth, but they are less complex than x86. MIPS CPUs maintain a stack in memory through a stack pointer register, which allows values to be pushed and popped, so there is a similarity to Forth. In contrast to Forth, MIPS provides further storage capacity in the form of registers and more complex commands. MIPS can be programmed in assembly language and in high-level C as well. Of course, assembly language is more efficient, and especially for embedded systems it is the preferred choice of programmers. On the other hand, C has a faster development cycle, so there is a need to use this high-level language as well.
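The difference between a stack target and a register target becomes visible when code is generated for a register machine. The Python sketch below emits MIPS-style three-operand instructions for an integer arithmetic expression. It is a rough illustration under several assumptions: the function name `to_mips_like` is made up, Python's `ast` parser again stands in for a C front end, the mnemonics li, add, sub and mul are used in the way common MIPS assemblers accept them, and registers are handed out naively from $t0 to $t9 without ever being reused.

```python
import ast

# Map Python AST operator nodes to MIPS-style mnemonics.
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def to_mips_like(expr):
    """Emit MIPS-style code for an expression.

    Returns (list of instruction strings, name of the result register).
    """
    lines = []
    used = 0

    def fresh():
        # Naive allocation: take the next temporary, never free one.
        nonlocal used
        if used > 9:
            raise ValueError("out of temporary registers")
        reg = f"$t{used}"
        used += 1
        return reg

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            reg = fresh()
            lines.append(f"li {reg}, {node.value}")
            return reg
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = emit(node.left)
            right = emit(node.right)
            reg = fresh()
            lines.append(f"{OPS[type(node.op)]} {reg}, {left}, {right}")
            return reg
        raise ValueError("unsupported construct in expression")

    result = emit(ast.parse(expr, mode="eval").body)
    return lines, result


code, result = to_mips_like("(2 + 3) * 4")
print("\n".join(code))
print("result in", result)
```

Unlike Forth code, every instruction here has to name the registers it reads and writes. Deciding which value lives in which register, and what to do when the registers run out, is exactly the kind of bookkeeping that a C compiler takes over from the programmer.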

What we can say for sure is that MIPS assembly and Forth assembly are both examples of low-level languages. Even though Forth advocates claim that Forth is also a high-level language, the claim can be rejected, because Forth is different from C. C is a high-level language because it allows algorithms to be formulated from a non-CPU perspective. A C programmer doesn't need to know how many registers the underlying hardware has or what a push to the stack is about. A C programmer writes the code only in C, and then the compiler generates the machine instructions.