September 02, 2019

Forth is faster than C++


At first glance, C++ is perceived as the queen of all programming languages. A modern C++ compiler generates very fast code, and the current C++ standard is a powerful tool for creating all sorts of software. The only disadvantage of C++ compared to other programming languages like Java or Python is that it is perceived as complicated to learn. But with a bit of training it's easy to become fluent in C++.
There is an alternative to C++ which is located outside the C/C++ universe. This alternative is Forth, and it's the most interesting programming language ever invented. Forth looks interesting because it lets the programmer travel back in time to the 1980s, when a computer had less than 500 KB of RAM and often ran without a full operating system.
The reason why Forth is more powerful than C++ is that it's a minimalistic programming language. The only problem with Forth is that the number of tutorials is low, and the existing ones don't explain well what Forth is. Another problem for newbies is that there is no single standard Forth interpreter to start with, and the usual advice is to write your own interpreter from scratch. Does it make sense to learn a programming language for which you first have to build the interpreter? Sure, because Forth allows the programmer to really understand what computing is about. The interesting feature of Forth is that it's old-school and looks modern at the same time.
Let us take a look at how a modern programming language works, one that was invented after the year 2000. Modern compiler techniques are moving toward just-in-time compilers and virtual machines. This separates the runtime environment from the compiler. Forth works on the same principle. First and foremost, Forth is a virtual machine which runs a scripting language called Forth. Before Forth code can be executed, the VM has to be created first. There are many ways of doing so.
In the jonesforth project, the Forth VM was created in assembly language. But there are many projects in which the VM was programmed in C, Java and even Python. Instead of asking what Forth is, the better idea is to ask what a Forth VM looks like. This explains very well why Forth is the future: a Forth VM is easier to realize than a VM for Python, Java or C#. The average Forth VM fits in less than 50 KB, and some Forth VMs are much smaller.
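To make the question "what does a Forth VM look like" more concrete, here is a minimal sketch in Python. It is not taken from jonesforth or any other project mentioned here. The class name ForthVM, the method run() and the small set of words are assumptions chosen for illustration. The sketch only shows the two core pieces: a data stack and a dictionary that maps word names to actions.

```python
# Minimal sketch of a Forth-style VM: a data stack plus a dictionary of words.
# All names here (ForthVM, run, the chosen word set) are illustrative only.

class ForthVM:
    def __init__(self):
        self.stack = []    # the data stack
        self.output = []   # values printed by the "." word
        # The dictionary maps a word name to a Python callable.
        self.words = {
            "+": lambda: self._binary(lambda a, b: a + b),
            "-": lambda: self._binary(lambda a, b: a - b),
            "*": lambda: self._binary(lambda a, b: a * b),
            "dup": self._dup,
            "swap": self._swap,
            "drop": lambda: self.stack.pop(),
            ".": lambda: self.output.append(self.stack.pop()),
        }

    def _binary(self, op):
        # Pop two values (top of stack is the right operand) and push the result.
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(op(a, b))

    def _dup(self):
        self.stack.append(self.stack[-1])

    def _swap(self):
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]

    def run(self, source):
        # Forth source is a sequence of whitespace-separated tokens.
        for token in source.split():
            word = self.words.get(token.lower())
            if word is not None:
                word()
            else:
                # Anything that is not a known word is treated as an integer;
                # int() raises ValueError for unknown words.
                self.stack.append(int(token))
        return self.stack


vm = ForthVM()
vm.run("2 3 + dup * .")
print(vm.output)  # [25]
```

The whole machine is a loop over tokens, which is why such a VM stays small compared to a VM for Python or Java.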
The average size of a small Forth VM is about 2000 lines of code. The Perlforth implementation has exactly this size, and a Forth VM written in Python needs about the same amount of space, https://github.com/whaleygeek/pyforth/blob/master/src/forth.py Somebody may ask what the idea is behind writing all these Forth interpreters. The main idea is that Forth allows a single user to master the computer. In contrast to a C++ compiler project, which takes many millions of lines, a Forth VM can be realized in a small amount of code.
Forth vs. C++
It makes sense to compare Forth directly with C++. C++ is known to reach nearly the speed of assembly language. A simple Forth VM can't compete with this speed. A Forth VM written in Python runs very slowly, because the VM itself is executed by the Python interpreter and the program inside the Forth VM has to be interpreted as well.
Some Forth projects try to provide the maximum speed. They compile Forth source code into assembly statements, similar to what a C++ compiler would do. Such systems run much faster, but their speed is still lower than that of code from a C++ compiler. A highly optimized Forth compiler which produces a binary file can come close to the speed of a modern C++ compiler. That means, if the binary file produced by the GCC compiler needs around 100 seconds to run, the binary file from the Forth compiler would take around 130 seconds, which is about 30% slower.
The problem is that, from a technical perspective, it's hardly possible to beat a C++ compiler. In the end, code is always translated into machine instructions, and if that translation was done perfectly, there is no room left for improvement. That means that in reality Forth comes close to C++, but it's not faster than C++.
There is a small exception. If not only the raw speed but also soft properties like code size and programming environment are important, Forth is number one. That means a program written in Forth needs less space in RAM than the same program written in C++. Secondly, it's possible to shrink a Forth interpreter into 2000 lines of code, while the same is not possible for a C++ compiler. And last but not least, Forth allows the programmer to reinvent the wheel from scratch. That means, even if the Forth VM runs 20x slower than the code from a C++ compiler, Forth can teach the programmer very much.
From today's perspective Forth is still at an early stage. Even though it was invented in the 1970s, the number of books and projects is low. Until now there is no widely used operating system written in Forth, although a few attempts exist. From the technical perspective it's possible to write an operating system in Forth similar to the early CP/M operating system or the modern MenuetOS. The surprising fact is that the amount of manpower needed for doing so is small. An operating system is mostly a system library for accessing the hardware devices, plus some routines for GUI drawing.
In the future, Forth has the potential to overcome the limitations of C. Like most programming languages, the C ecosystem has a tendency to turn into bloatware. That means it is very easy to create new C source code for all sorts of tasks, and in the end the operating system occupies 2 GB on the hard drive and nobody knows what to do with all the programs, libraries and compilers.
Writing a Forth interpreter
The best starting point for doing so is the Python language. The first version of a Forth VM is created as a learning project. In the second iteration the code gets optimized. In the third iteration it makes sense to convert the Python code into a C++ project. Then the Forth interpreter is modified into a Forth compiler which can produce assembly language. If all the code is working well, the next step is to replace some parts of the C++ project with Forth code and minimize the amount of non-Forth code. At the end, the remaining parts of the C++ project are converted into assembly language. This would be equal to a near perfect Forth compiler.
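As a rough idea of what the first iteration, the learning project in Python, could look like, here is a small sketch of an interpreter that also supports user-defined words. The function name interpret() and the supported word set are made up for this example. It assumes integers only and non-nested definitions of the form ": name ... ;", and it stores a definition simply as source text that gets interpreted again when the word is called.

```python
# Sketch of a "first iteration" Forth interpreter with user-defined words.
# Assumptions: integers only, a handful of primitives, and non-nested
# ": name ... ;" definitions. The name interpret() is hypothetical.

def interpret(source, stack=None, user_words=None):
    stack = [] if stack is None else stack
    user_words = {} if user_words is None else user_words
    tokens = source.split()
    i = 0
    while i < len(tokens):
        token = tokens[i].lower()
        i += 1
        if token == ":":
            # Definition: the next token is the name, the body runs up to ";".
            name = tokens[i].lower()
            end = tokens.index(";", i)  # raises ValueError if ";" is missing
            user_words[name] = " ".join(tokens[i + 1:end])
            i = end + 1
        elif token in user_words:
            # A user-defined word is stored source text, interpreted on call.
            interpret(user_words[token], stack, user_words)
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "dup":
            stack.append(stack[-1])
        else:
            stack.append(int(token))  # unknown words fail with ValueError
    return stack


print(interpret(": square dup * ; 7 square 1 +"))  # [50]
```

Re-interpreting the stored text on every call is the slow but simple choice. A later iteration could translate each definition once into a list of operations, which is the first small step from an interpreter toward a compiler.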
Perhaps we should take a step back and look at the overall picture. The main idea is that the programmer needs a virtual machine to execute a programming language. The virtual machine takes the source code and converts it into assembly code. This opens up two questions:
1. Which kind of language should be interpreted by the virtual machine?
2. How should the virtual machine be built internally?
For didactic reasons it makes sense to simplify things, which leads the project toward a stack-based language similar to Forth. The second question, how to program the virtual machine, depends on the knowledge of the programmer, who has to choose an architecture and a programming language for realizing such a project. So we can say it's not about Forth but about creating a virtual machine from scratch.
Some Forth implementations are available on GitHub. Most of them work well, but all of these yet-another-Forth projects have the same problem: they are not able to build a larger community. That means somebody created the code ten years ago, and nobody cares. The reason is that Forth is hard to combine with existing programming languages. Suppose we have written a super-efficient Forth VM. It was written in C and can execute a Forth program quite well. But how exactly can this functionality be used in a real workflow? The problem is that most of the programming world works with mainstream languages like C, Java and Python, and the newly created Forth VM is not able to call a routine from a C library. This prevents the Forth subroutine from being used to extend an existing program written in C.
To overcome the issue, a simple but effective technique is recommended: a RESTful API. This is an HTTP-based network interface which can connect programs written in different programming languages. If the Forth VM provides a RESTful API, it's able to communicate with other programming languages. That means the Forth program can execute code in a different program, and programs written in C or Java can execute code within the Forth VM.
Or let us switch the point of view. Suppose there is a Forth VM which doesn't contain a RESTful API. How can the Forth code interact with existing projects? The answer is that there is no easy way. The Forth program and the program in Java are different worlds which cannot speak to each other. A Java program can't execute Forth code, and the Forth code can't call a Java subroutine.
In the Gforth system there is a feature for calling an existing C library, similar to what ctypes provides for Python programming. But this interface doesn't work very well, and even in the case of success the Gforth program is only able to execute C code, not a library written in C++ or C#. The minimum standard for integrating Forth into an existing environment is that the Forth VM comes with a RESTful interface.
In the easiest case this is a Forth VM written in Python which has an interface written in Flask. This allows other programs to send data to the Forth VM, which is the precondition for using the newly created Forth VM in a production environment. That means it's possible to write some routines in Forth and combine them with existing code written in a different programming language. From the technical side the goal is to create a Forth VM which includes a RESTful API.
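A minimal sketch of such an HTTP interface could look like the following. To keep the example self-contained it does not use Flask but the http.server module from the Python standard library, so it is a stand-in for the Flask variant described above and not the same thing. The endpoint path /eval, the JSON response shape and the helper names run_forth() and call_forth() are assumptions for illustration, and the tiny run_forth() function only stands in for a real Forth VM.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_forth(source):
    """Tiny stand-in for a Forth VM: integers, + and * only."""
    stack = []
    for token in source.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(token))
    return stack


class ForthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/eval":  # hypothetical endpoint name
            self.send_error(404)
            return
        # The request body is plain Forth source text.
        length = int(self.headers.get("Content-Length", 0))
        source = self.rfile.read(length).decode("utf-8")
        try:
            body, status = {"stack": run_forth(source)}, 200
        except (ValueError, IndexError) as exc:
            body, status = {"error": str(exc)}, 400
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def call_forth(port, source):
    """Client side: any language with an HTTP library could do the same."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/eval",
        data=source.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ForthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(call_forth(server.server_port, "2 3 + 4 *"))  # {'stack': [20]}
    server.shutdown()
    server.server_close()
```

The client function is written in Python here only for convenience. A C or Java program would send the same POST request with its own HTTP library, which is the whole point of putting the VM behind a network interface.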