Robotics and Artificial Intelligence: Teaching stackmachines with a slow Forth language simulator

Forth is not known in mainstream computing, because the reverse polish notation is difficult to learn. To overcome the problem, a Forth language simulator is described which makes it easier for newbies to play around with a multistack-machine. The idea is to reduce the performance down to 1 instruction per second so that the user can observe in a singlestep mode what his program is doing.(video sourcecode) The question which remains open is how to program in Forth itself. The development of stack based algorithm is outside the scope of this paper, here is only a simulator given, which is able to execute existing Forth code.

The following longer article was published first on a different website as a PDF document. Here is the HTML version of the same text.

Title: Teaching stackmachines with a slow Forth language simulator

Author: Manuel Rodriguez

Year: March 30, 2018

1 History
- 1.1 Short history of stackmachines
2 Misc
3 Performance
4 Hardware
5 Examples
- 5.1 Implementation
- 5.2 Testing out SP-Forth
6 Other languages
References

1 History

1.1 Short history of stackmachines

Charles L. Hamblin published in 1957 a paper about stackmachines which was later realized in the KDF.9 computer. It was the first stackmachine who was ever build. In the late 1960s Chuck Moore invented the Forth programming language. Forth was first used as a programming language for existing computers and in the 1980s dedicated Forth chips in hardware were build, for example the RTX2010. The 1990s and later was dominated by private research. :The first Forth user groups were founded and the current development is, that amateurs are using FPGA hardware for implementing a stackmachine.

From a non-Forth perspective the question is, what is so special about a stackmachine? Can this technology be used to make computers more efficient? Stackmachines are usually engineered not from a user perspective but from a hardware point of view. The architecture allows to minimize the transistor count and make the design very clean. The negative aspect of Forth is, that it is not possible to run normal highlevel languages like Pascal, C or C++ on this. So stackmachines are useless for mainstream computing.

If we are observing the Forth literature, many attempts were undertaken in the past to port C compilers to stackmachines. In most cases not very successful. The problem is described in the literature as compiler-friendly CPU. Or better, only a CISC CPU is compiler friendly. That means, mainstream computing hardware is designed for the needs of the programmers and endusers, not for the needs of the machine itself. Another disadvantage of Forth CPUs is, that even somebody is able to convert existing c application to Forth, the algorithm is not very efficient. The reason is, that algorithm like sorting has to be implemented for a stackmachine differently. Usually algorithm are in contrast written for CISC CPUs, they are using variables and not a stack.

Early compilers

Let us go back in the year 1960 in which stackmachines were first researched by pioneers. The idea was, that on one hand there is a lowlevel machine language and on the other hand the need for programming in a high-level language. Today we call the gap between them a compiler, in the 1960s the process was called “automatic programming”. The idea is not to program in Assembly language but in a high-level-language.

Everybody who is familiar with the reverse polish notation on the HP calculator knows, that it is difficult to use. So the reverse polish notation was replaced by something which works easier. Todays calculator are user friendly, that means, the user is not forced to use the RPN notation. In the computer language this innovation was called “Fortran”, for formula translator. It improves the programming of machines. The programmer were not longer forced to type in assembly codes, but can type in their program in normal notation. The result of the development of the first compilers was, that the idea of stackmachines were no longer needed. A Forth like language is similar to Assembly, and with the first compilers there is no need to program in Forth. The result was, that Stackmacines were a niche topic since that time and instead CISC based cpu dominated the business.

That was no mistake, because the reverse polish notation is indeed hard to master and is not helpful if someone is programming huge number of codelines. I would suggest that non-RPN calculators and non-stackmachines fits better to modern needs in the mathematics and programming. But so called register machines haven't replace stackmachines completely. Today, we have both: the obscure RPN-notation and the modern c compiler plus CISC cpu. I do not think that there is need to decide between them, both domains have their advantages.

The literature from that time is not online-available. Only the bibliographic references are stored in the worldcat catalogue. The fulltext paper of early stackmachines are not scanned in. But that is no big problem, because later authors are citing from the papers and a modern paper written in 1980s and later gives the same amount of information like the original papers in the 1960s.

2 Misc

2.1 What is Forth?

It is easy to define Forth. It is a virtual machine like the JVM, which is programmed in assembly. The VM has the structure of a dualstack pushdown-automata and provides commands like “drop, do .” To run a Forth program on a computer a concrete realization of the Forth VM is needed. Most Forth implementations are written in assembly language. But it is also possible to use C and compile this to assembly to get a Forth system.

At least one word to the performance of Forth. A normal Forth likes Jonesforth is not very fast. If the user want's to run a Forth program, the VM maps every Forth word to a machine language command. It is comparable to a BASIC interpreter. The faster approach would be to use a Forth compiler, such compiler are existing, but they are only commercial tools like VFX Forth.

The average Forth VM is slower than what a GCC compiler can provide. The question is: why are the people fascinated from Forth if the language is difficult to program and slower in the execution? I believe, that 99% are using Forth as an education tool. They are not interested in writing real applications with it, instead they are using Forth like the PL/0 language invented by Nikolas Wirth to understand assembly language and compiler development. For the educational purpose Forth is well suited. The reason is, that the user is forced to become familiar with his computer. For implementing a Forth VM from scratch he must understand normal assembly language for the 6502 CPU, the PIC microcontroller or the AMD64 architecture.

That means, that a Forth VM can be compared with a JVM or the GCC compiler. For programming such software from scratch, the user must understand the computer on hardware level. The question is not, what feature Forth has, the question is what type of knowledge the user must bring in to call himself an Forth guru. For example, the famous Chuck Moore can be called an Forth guru not only he knows who to write a Hello world program in Forth, but he is familiar with assembly language on mainstream computers and even has designed his own CPU from scratch, so he is also an expert in chip-manufacturing.

The funny thing about Forth is, that it stands not in contrast with mainstream computing. I would call Forth not a programming language, but a didactic concept. Let us first define what the problem is. The problem is, that there is a computing class with students in the first semester. And they have absolutely no idea what a compiler is, how to design a CPU from scratch or how existing CPUs like the AMD64 system works. How do we get in a short period of time the students to an expert level? They need a hands-on project in which they learn in a short amount of time a maximum of knowledge. And here comes Forth into the game. It is a combination of a teaching-programming language, a teaching operating-system and a teaching CPU. A possible alternative to Forth is, to use existing “educational only” projects, for example we can combine the Z80 CPU (which is used on most universities in computer courses) with the Minix operating system with the above mentioned PL/0 programming language.

Sometimes, Forth is called an easy to learn language. That is wrong. Because the problem is not to learn Forth itself, the problem is everything what is outside of Forth. That means, understanding what a stackmachine is, on a pen&paper computer is easy, but implementing this on a real cpu is hard. Let us imagine an example project in a computer class: “Implement a forth dual-stackmachine on the 6502 CPU with assembly commands according to the jonesforth specification”. It is a clear task-description and it is possible to do so. But it is not easy going, it is an advanced topic of computer science which has to do with many aspects.

The misconception of Forth is, that the language is comparable to C which means, that after some experience the beginner is able to write programs with it. No, he is not. It is not possible to learn Forth with the aim to use it in reality. Instead Forth is some kind of never-ending-curriculum. The idea is not that the learning progress ends someday, the idea is to go deeper into the subject. If the student is able to write a “Hello world” program in Forth, the next task is to write his own Forth VM. And if he has done so, the next task is to make his own Forth CPU.

Let us investigate how Forth is used in reality. Most companies who are developing Microcontrollers are using Forth in daily life. They have programmed their own Forth IDEs or they buy the license of VFX forth. Why are they doing so is simple. Because companies have employees and employees must be teached. The first thing what the new employee in a company has to do is to attend a course about programming and hardware design. And the lingua franca in this courses is what? No, it is not C, because C can be used without understanding the language. Most c programmers now nothing about compiler programming itself. They know the language itself, and thats it. But, a microcontroller company needs employees who are familiar with lowlevel programming, and knowledge in C is not enough.

The problem for what Forth is used, has nothing to do with programming or CPU design itself. It is more a social problem. The question is how to train lots of employees in a short amount of time to a deep understanding of technology. Forth is the answer. Let us take an example. Intel has produced a new chip generation with around 1000 opcodes. The chip is working great. So Intel has no problems, right? No, Intel has a lots of problems, because somebody in the company must understand how the system works. The knowledge about Intel chips is not public available, because it is a commercial product. Only the employees of Intel are aware of it. And the employees are fluctuating. The question is not, how to program the intel cpu, the question is how to teach new employees of Intel in a short amount of time.

Forth itself has nothing to do with Intel cpus, it is a fictional virtual machine invented for no concrete purpose. But Forth is the right choice if around 1000 new employees have to be trained to become experts for Intel assembly language. Implementing a Forth VM on an Intel CPU is the first project, what the students get. They learn all the commands of the hardware. They reverse engineering a system which is already there. And after the project is over, they have acquired the knowledge.

I do not believe, that we see in future any program which was written in Forth, or any CPU which was designed as a stackmachine. What we see is, that Forth is used to increase the knowledge of people. And this people will write normal C compilers and create normal CISC CPUs.

Forth as universal assembler

Sometimes, Forth is described as an universal assembly language, which runs on all computers. A program, written in Forth for the C-64 CPU runs without modification on an x86 CPUs and runs also on a Microcontroller. But this recognition is wrong. Instead, the real portable Assembler is the C language. C is a programming language used in real projects, for example to write the UNIX operating system which is portable between all computers. Instead, Forth can be seen as an educational language. It is not a portable assembler, but a subject in a computer-science class. That means, Forth fits very well in existing university like structures which are teaching knowledge and evaluate if the students have understand it.

2.2 Forth is the future

According to the TIOBE Index, nobody is interested in Forth. Nevertheless is Forth the language of the future. How can fit this together? As an introduction I want to cite the famous Computer Chronicles episode of 1984 https://www.youtube.com/watch?v=D5osk9lrGNg which presented Forth in television. Since this time, more than 30 years are gone, and the episode looks very fresh. I would suggest, that from now in 30 years in the year 2048 the Forth programming language will also be fresh and new. It is a timeless programming language and the need for teaching Forth has something to do with people. That means, in every time there are people who want to know what a stackmachine is, and how a minimal operating system looks like. Forth is timeless like the MMIX CPU. It was constructed for eternity.

Let us imagine how a potential future of Forth will look like. What will not happen is, that Forth can replace C or C++. The reason is, that Forth is difficult to program and slow in execution. Especially, if the Forth Virtual machine was programmed by beginners it is very slow on x86 hardware. But Forth has a famous standing in computer education. That means in a setup in which a female or male professor is explaining to the students what Forth is, and let them work on Forth related project. Forth is some kind of artificial language which is not used in reality, but in computer courses at the university. And i would suggest, that Forth is for that task superior to other artificial languages like Java, PL/0 or Haskell. If the students should learn a language which is not spoken anywhere else, apart from their computercourse, Forth is the perfect choice. It is minimalistic and it is possible to explain with that language every aspect of computer science like compilers, cpu-design, transputer, robotics and algorithm.

The assumption is, that these subjects are relevant in the future, and there is also a need for teaching the domains to students. So Forth will have a wonderful outlook. I would suggest, that good computer courses in the next 100 years all using Forth. On the other hand, Forth is dead for practical applications. It is not possible to use Forth as a replacement for Intel x86 CPUs or as a replacement for C++. Perhaps we can describe the problem on the Minix vs. Linux debate. Minix is a teaching operating system. It is minimalistic and can be used in computer courses. While Linux is a commercial grade operating system which can be used on real hardware in mission critical systems. Writing an academic paper about Linux is a bit difficult, because it is changing all the time. Instead a paper about Minix makes sense, because it is so static and timeless.

2.3 What is a good Forth implementation?

Forth implementation which runs on real hardware are many out there. I personally prefer Gforth, because it is in the Fedora package manager available but any other Forth system may also be working. The main problem with today's Forth implementations is, that they are programmed for real usage. The idea is to start gforth, type in some Words and get a turnkey application. This workflow doesn't fit to the need of the audience. If somebody wants to implement an algorithm or playing around with a microcontroller a normal C-compiler gcc is much better. That means, the code runs faster, and the code is easier to program.

The needed feature set for a Forth system is different from what the user wants to do with C. A forth system must support the learning progress of his user. It should be constructed as a serious game with the aim to provide knowledge. A Forth system is equal to a Forth “computer based training”. In the optimal case it consists of a pdf book plus an interactive simulator for trying out the knowledge. The simulator has not the obligation to run Forth code exactly or with maximum speed, but with visual feedback. That means, it should be a Forth simulator, like an Assembly simulator. The most sophisticated computer based learning software is called “Softsim” and is part of the GA144 cpu. But it goes not far enough. My own Forth system has more similarities to a game.

Figure 1: Forth simulator

The idea is to program the software not against a certain type of cpu, or use some high-end-compiler optimization. Instead the Forth engine is equal to a game engine. It is primarily a visual application which prints out the graphical stack and the user can move the instruction pointer with keys. The simulator is not ready, it is only a prototype. The idea is, that the user can run the game in single step mode or in a automatic mode, which cycles through the entire Forth program. The Forth system does not exist under a operating system or on a cpu, it is written in C++ and can be compared with a minecraft CPU. That means, the op-codes are doing nothing, they are interpreted from the game, but not by a real cpu or by the Linux operating system.

Let us investigate who usually a Forth system is intended for. Most existing Forth systems has the idea, that the user is programming a real machine. For example, he runs the Forth under his x86 PC, on a microcontroller or on FPGA. The idea is perhaps, that the user is a programmer and wants to try out Forth in real life for making something useful with it, for example let a LED blinking or write an algorithm. I think, the user has no interested in using Forth for programming or trying out new hardware. I think, the user want's to understand Forth on a theoretical basis without programming in the language itself.

The reason why the user think so, has to do with complexity. Forth in real life is very complicated. To program a working a Forth system or write sourcecode in Forth the user must have a deep knowledge of operating systems and hardware. And if an error is happening, in all cases the user has a lack of knowledge. For example, he is testing out the new eForth system on a microcontroller because he want's to let the LED blink. The problem is, that something goes wrong and the reason can be anything. Even the simple task of let the LED blink is too demanding for a new student. What he want's is a clean environment which is observable. In the above screenshot of my SFML based Forth,, no hidden cpu, microcontroller or software is there. The system consists only of the GUI what the user sees on the screen. That means, the Forth has really only two stacks and no virtual stacks or something else.

From a technical perspective this GUI software may look like not sophisticated. Because it can nearly nothing and is not comparable to a full blown gforth or the eForth system. But it has a different purpose. The aim is to be an educational tool only.

The question remains, what the user can do with the Forth simulator. Answer: he can use it for educate other people. Forth is not a real programming language like C. If someone needs an operating system or a programming language he is better suited with Linux. The intention of Forth has more to do with higher education and theoretical computer science. It is a subject of academic papers, but not of working systems.

I want to tell also the problems of the Forth simulator. My program is not able to read a VHDL description. So it is not possible to implement a Forth CPU on the hardwarelevel and simulate it. Instead, the execution engine is a fixed block, realized with if-then-statements in C++. Here is enough room for improvements. The problem is, that I do not how, how logicgates can be wired together to implement a complete CPU. This is usually done in the VHDL file, but it is hard to understand.

The idea of using a simulator is not completely new. There are many “Assembly Language simulators” out there, which looks very similar to my Forth simulator. But they are simulating a register machine, not a stackmachine. The “Marie Simulator” for example, is a Java program which simulates a CPU. It is an addon for a book about computer programming.[null2003guide] Another project is the “Little man computer”, which is also a GUI interface for teaching machine language.[yehezkel2001three]

A machine simulator, dedicated to a stackmachine, is described in [khan1997design] The idea is similar to other machine simulators. The program has not the intention to support engineers in debugging real programs, but to teach students with computer-based-training.

2.4 Making something useful with a Forth machine

My own Forth machine simulator runs quite well. The datastack and the returnstack is filled with numbers, and it is visible for the user, how the game works. But one question remains open – for what purpose? Let us first examine, what the Forth machine can not do. It is not possible to run any other language than Forth on the machine. For example C. In theory it is perhaps possible to write a C to Forth compiler, but the system would not working well. Because a c-compiler works better on a register machine. The better idea would be, to program a Forth machine in the standard Forth dialect. And this results into the above question, why should somebody do so?

Currently, I have no answer to it. As far as I can see from the sourcecode, programming in Forth takes longer than programming in C. The reason is, that the programmer has to use the stack, if he want's to do some useful operating. And writing a program which is using the stack is more complicated than using normal variables or objects from a C++ program.

In theory, Forth is a great language for formulate algorithm in a machine independent format, but most algorithm in computer science can be formulated in normal Python easier. That means, there is no advantage for writing a pathplanner in Forth or implement a longer program in it.

There is another potential reason. Writing a Forth program makes sense, if the aim is to improve the simulator. For testing out, if the simulator works correctly, we need some longer sourcecode. Another reason is, that programs written in Forth are smaller, than in other programming language. I would suggest, that learning to program in Forth helps the students for program in other languages like C++.

What else can we do with Forth? Running a Forth program in a simulator is a very slow task. In my own simulator, not more than one instruction per second is executed, and it is not possible to increase the speed very much. In contrast, todays homecomputer are very fast devices with a highspeed 2 ghz clock. What programmers can learn from Forth is to write small programs, which need not much cpu power. The idea is, to reverse the “Moore's law” into a race to the bottom. The question is, how slow can a cpu be made for running all the important programs on it. The idea is to use computers in the size of the Commodore 64 with the same speed and the same low amount of ram for doing useful tasks with it. With normal c programming techniques it is not possible.

Let us stay on the Commodore 64. There was some time ago a project called Contiki, which is an operating system for the C-64 which includes a TCP/IP stack. A nice piece of work. But, programming in 6502 assembly is very complicated, and the sourcecode runs only on cpus with the same instruction set. The better idea would be to program a Contiki like operating system in Forth, and write a Forth machine for any hardware out there. Or let me explain it from the other way around. The 6502 cpu is no longer in production. And running Contiki on the x86 CPU is not possible, because the instruction set is different. Assembly language is the wrong decision, if someone wants to program a small operating system. Forth instead, is also a lowlevel language but can be executed on any machine.

2.5 Is Forth an elitist language?

Bernd Paysan wrote that Forth is a language for the hacker elite:

“So IMHO Forth is an elitary language, designed and written for the best, or gifted, not for the masses.”https://bernd-paysan.de/why-forth.html

And he is right. Tthe number of people who are using Forth is small. Stackoverflow for example has under the tag “[Forth]” not more than 189 questions online.https://stackoverflow.com/questions/tagged/forth

While C++ for example has around 500k questions. Or to explain it the other way around, if a newbie is posting one single question about Forth, for example “Hi, I've installed gforth but my hello world app doesn't work” he has increased the number of worldwide question about the subject with 0.5%!

But why is the Forth community so small? It has to do with the costs. It is difficult for the people to get information about Forth or buy Forth related hardware. Forth can be seen as an obscure programming language which is not widely known. In contrast, the educational programming language “little man computer" is teached on many universities worldwide. Many simulators and user groups are out there which are researching in detail how to program this simple computer.

I do not think, that Forth itself is the problem. Sure, programming in Forth is not easy, but the language is not more complicate than the assembly syntax of LC-3 (another educational language). I think the bottleneck is somewhere else. The number of introductory books is to low and books which are available for example [koopman1989stack] are not outdated. The book was published 27 years ago, it is a great book but can't be recommended anymore.

2.6 Forth ressources

The major problem in the Forth community is to find information. At first, the number of papers is small, and many of them are behind a paywall. Second, the information are often not logical, that means, author 1 contradicts author 2. So the overall knowledge about FORTH is decentralized, and not easy accessible. Overcoming the problem is simple: emulate Forth in a sandbox.

I mean, Forth itself is very complicated. If the user is trying to program a Forth virtual machine for a certain type of processor for example the x86 CPU he increases the chaos. The better way is not to focus on Forth itself, but to program a game, in which a Forth computer is simulated. The idea is to see Forth as an educational programming language, like the Little man computer. The consequence is, that the problems are different. Because there is no longer a Forth system which runs somewhere, but instead there is only the Forth simulator which is written in C++ and the aim is to improve the simulator.

Inside the simulator it is possible to run a forth program, for example “1 2 +”, but this is not the important part. The idea is not to program in Forth, but to program the environment for Forth. The advantage is, that from day 1 the overall system is working, even if the Forth interpreter is not able to do sophisticated things. For example, my own Forth simulator is very slow (1 instruction per second), and can only execute 3 commands. That means, the system is not able to interpret a standard Forth program. But that it is not a problem, because the simulator itself runs stable. That means, the C++ code compiles, something is shown on the gui, and the user can move the instruction pointer.

The nice feature is, that the overall problem is under control, because the C++ language and the aim of writing with it a game is well researched. That means, the user has no longer questions about Forth direct, but he is trying to improve a normal C++ program. That means, it is ok, if he doesn't understand Forth, because he is not interested in programming the language. Instead, Forth is implemented like a Assembly game. It is something which runs inside another program.

The first critics was, that it is a waste of time for writing a graphical simulator, if a normal Forth interpreter runs much better. And indeed, current Forth systems like Jonesforth, Javascript Forth and Gforth are very fast and can execute all Forth statements. The problem with all these programs is, that they are forcing the user in a certain situation. If someone starts the Jonesforth interpreter, all what he can do is enter Forth code. And because most users doesn't know how to program in Forth, they don't like it. From a teaching perspective, it is better to force Forth into a certain position inside a clean environment. That means, Forth is only a textfile which is executed in a much broader system. And the broader system (the simulator) is written not in Forth but in C++ and has a nice GUI.

According to the given literature this concept is very new. In the Forth community itself, no other project was done into that direction. The idea of using simulation games for teaching a language is only used in the mainstream computing, for example the Little man computer project. Let us investigate this project in detail. The assembly community knows two types of users. Hardcare programmers, which are using NASM like assembler for programming real programs. For example, they are writing a new intro, and they are doing so since many years. On the other hand we have Assembler simulations. That are programs which are compared to the NASM in a weaker position. The compiled sourcecode runs slow, and often it is not possible to make any useful things with it. So called “Assembly language simulators” are used by newbies to learn the language. The common feature are, that the system is per default in the singlestep mode, has a graphical interface and additional lots of documentation which explains what Assembly is.

The Forth community has currently only one type of user: the hardcore programmer which is using Forth since 20 years, has programmed his own FPGA and is fluent in programming Forth. That makes it very hard for newbies to entering the community, because the tools which are there use are not very beginner friendly. Even the lowlevel entry Forth “Jonesforth” is a tool for experts. It has no graphical interface, runs Forth code very fast and is used for creating Forth apps from scratch. That means, Jonesforth is not focusing on beginners.

Let us take a look into figure “Forth simulator” which shows the GUI screen. It is not a console app, but visualize the current Forth system. It has a datastack, a returnstack, RAM and an execution unit. Describing the system is simple: it was created with the SFML library in C++, that means it is a game comparable to pacman. The user can send some keystrokes to the simulator and the game answer with a certain behavior. The rules of the game are called Forth, that means it works like a stackmachine with two stacks. From a technical point of view, the system is not so powerful like Gforth or Jonesforth. It is comparable to an Excel chart and is not a real Forth system. The advantage is, that the user is supported in his learning experience. Even without understanding Forth the user can press some buttons and see how the system works.

Figure 2: Forth simulator

Currently, the simulator has only a few actions. The user can press up, down, left and right. This moves the instruction pointer and executes the command. The simulator works in a way, that it shows what is inside the stack. That means, the debugging mode is activated as default. According to the specs, the Gforth system has also a singlestep debugger. But it is not very pleasant. The same is true for the SwiftForth. Both debugger are created with low priority, in the advance that the user knows the language already from the textbook. But this is not true. The user needs a app because he want's to learn Forth.

2.7 Writing software in Forth

According to most papers about Forth, the programming is done with a divide and conquer technique in which a problem is described with subwords. Most newbies are not understanding the idea and ask for an easier way of implementing Forth code. My impression is, that the language which is nearest to Forth is the assembly language. Coding for old 8bit Atari computers and Forth has much in common. Sure, Forth is something which goes way beyond assembly, because it can run on any machine and it is possible to design special cpu only for Forth, but the programming itself is the same.

Let us investigate a small example how a program is created from scratch. Suppose the user wants to program a bubblesort sorting algorithm. The first step what he is doing is to ask google if someone else has implemented it. Maybe he finds a out-of-the-box solution on http://www.rosettacode.org or in one of the Forth forums. Because the community is very small in most cases he will not find any code. That means he must write the Forth code from scratch. The easiest way for doing so is not to separate the problems into words, but asking the assembly language community for help. The reason is, that programming on a lowlevel for a certain cpu type is done both: in assembly and in Forth.

Let us explain who a typical assembly program works. Usually it has no variables but reserve space in memory. Usually, the programmer is using subroutines, and usually the programming is done different from programming in C. I would suggest, that programming in Assembly for the Atari 8bit cpu takes around the same time than programming for the abstract Forth virtual machine. That means, the user must build up the complete program from scratch. Either Forth nor assembly has any predefined libraries or highlevel constructs like objects. Instead it is up to the programmer to program bare metal.

The good news is, that the assembly community has researched over many years how to program large amounts of sourcecode. Usually the learning curve is high, and it takes many weeks to program a game in assembly. And if an error occurs the debugging is time consuming. The same is true for Forth programming. It is like assembly, only that the cpu has fewer registers.

disadvantages

Let us describe the problems in Assembly and Forth together. The main disadvantage is, that it is a lowlevel language. That means the cpu is executing only the given commands and in a certain order. In contrast, C++ sourcecode is structured more like a textfile. That means the exact order of the statements is not very important instead the programmers describes general structures. That means, larger piece of C++ code can be written without feedback from the compiler. Instead Assembly and Forth programming is programmed against the bare metal. That means that every punctuation is important, because otherwise the system won't work. And i would suggest that most of the time the programmer is working in the debug mode, that means he is using a tracer to follow command after command for verifying that the assembler is doing the right thing.

Let us describe a typical bug. A subroutine is not doing what was expected. But where is the error? The assembler programmer has to go step by step through the commands and observe the content of the registers. For example he is recognizing that the accumulator has the wrong value, he traces the problem back to the command in the listing and modifies the sourcecode. The same error repairing mechanism is used by Forth programmers. The only difference is, that they are not documenting what they are doing. But in reality, Forth programmers are only special kinds of assembly programmers.

Now a typical example how equal Assembly and Forth are. In a stackoverflow thread somebody asks for help because he wants to initialize a struct in assembly language.https://stackoverflow.com/questions/23547328/following-struct-pointers

He describes the problem very well and has also a screenshot of his assembler. For c programmers the question seems very complicated, but it is the normal how assembly programming is done. The interesting point is, that the question is valid for Forth too, that means in a Forth programs the problems for initialize memory for a struct are equal.

I mean, either Forth nor assembly have any kind of datastructure, predefined way for accessing memory or standard libraries. What they are doing is working bare metal. That is the reason why programming in assembly takes so long, while programming in C is faster. The difference is, that assembly programmers work on hardware which can be used by a c compiler as an alternative, while Forth programmers have not the option to fallback to C, because on Forth chips modern C compiler aren't working anymore.

Here another problem. .The task is to reverse a string. According to Roessate code the LC-3 assembly language can used for the task.http://www.rosettacode.org/wiki/Reverse_a_string#LC3_Assembly

The listing contains words like SCAN, COPY, COPIED which have subactions for manipulating the registers. The Forth solution which is also given on the website is using also words (pushstr, popstr, reverse). It is nearly the same solution, except that Forth is bit more academic than the primitive LC-3 command set. In both cases, the sourcecode is not very easy to read. It takes for a beginner at least 10 minute to understand it. In contrast the reverse string task can be solved with normal C++ way more easier.

As a conclusion it makes no sense to compare Forth with C/C++. The languages are to different. But it makes sense to compare Forth with X86 assembly and LC-3 assembly. And I would go a step further and suggest that Assembly experts are also experts for Forth.

Let us answering the question how a complete operating system in Forth has to be programmed. Taking existing Linux sourcecode which is written in C and convert this to Forth makes no sense. The languages are very different, it is another culture. But, it makes sense for orientation on existing assembly OS, for example menuet OS. The sourcecode there can easily be transferred to Forth. It is the same programming style, and the same bare-metal mentality. This can be seen very clear, if we are comparing the debugging screen in an Assembly environment with the debugging screen of Forth. In both cases the programmer is going in singlestep through his application and traces back the register / stack values for identifying a mistake.

The example MenuetOS consists of 1.5MB sourcecode. This shows, that it is possible to write larger projects on bare metal. I would suggest, that also a Forth operating system in the same size is possible.

2.8 Forth emulator

Before I want to describe the situation for Forth, a short look into the 8-bit retroscene which is using Assembly as their main language. What happens since the Commodore 64? A lot, If we are searching github for the term 6502 emulator, we will see, that at least 100 projects are out there. Some emulators are written in C, some in C++, Java, Javascript and C#. The motivation behind such projects is the idea to learn something. Often the programmer have read the 6502 manual and implements the function in a high-level programming language like C++. That means, they are building a computergame which has registers and an Accumulator. The Line of Code is remarkable low what a 6502 emulator needs. Some projects have not more than 2000 LoC and they will run assembly code, which was written for the C-64.

The other way around is not very often done. I mean, to write in 6502 assembly a C++ compiler. In theory, this is also possible, but I found no project in that direction. The reason is perhaps, that writing in 6502 assembly language is more complicated then writing in C++.

Now we will come to the situation in Forth. My own project was equal to the above cited 6502 emulators. The idea was to use C++ and simulate Forth. This task was surprisingly easy. The reason is, that C++ is a very powerful language which can simulate everything: Forth, 6502 assembly, Finite state machines, neural networks or whatever. The other way around was not successful. That means, mean knowledge about Forth was zero before the project, and is zero after the project. That means, it is very easy to write a game in which Forth can be executed, but it is hard to write Forth itself. This is the same situation like in the 6502 emulator case.

Let us increase the difficult a bit. How complicated is the task to implement the ANS Forth standard with a C++ program? Not very sophisticated. The specification is available online and writing a C++ program which takes Forth code and give the result back is easy to create. It is a one man project. If somebody has experience in compiler writing he will mastering it in under a year. The result should be a C++ program which is 3650 lines of code and emulates a Forth system.

But how demanding is the other direction? I mean, to use Forth for writing a C++ compiler, or to use Forth for writing a game? It is way more complicated. The reason is, because Forth is complicated like Assembly, and not very good documented. I would guess, that even a computer expert will not have a prototype after a year of programming and it is unclear if his Forth program will ever run.

Let us increase the abstraction level a bit. .The first fact is, that writing cpu-emulators and compilers for languages is surprisingly easy. The number of github projects from that area is high and the number of needed lines of code is small. Easy means, that such projects can be done in a short period of time as a one-man-project and will result in a success. The second remarkable fact is, that the task is usually done with modern programming languages like C#, C++ or sometimes C.

Another interesting fact is, that after writing such an emulator, in most cases the programmer has not learned the underlying system for example the MOS 6502 and can it program fluently, but what they have learned is the tool-language C++ which was used to write the emulator. That was also true for my own project. The knowledge about C++ has increased a little bit, while the knowledge about Forth remains on zero.

Let us take a look at a typical “6502 emulator project”. According to the list http://www.6502.org/tools/emu/ one MOS6502 emulator https://github.com/gianlucag/mos6502 was written in C++ and has passed the “Klaus Dormann's test suite”. I didn't compile the emulator by myself, but i trust the survey. Let's take a deeper look into the github repository. It contains of 2 files. The .cpp file has 1315 Lines of code and the header file has 189 lines of code. So we see a C++ project with 1500 lines of code. According to the average productivity, the programmer has spend 150 days on the project (10 Loc/day), so it can be called a small nice amateur project.

I'm explaining this in detail, because in the theoretical literature the idea of programming a complete CPU from scratch is described as very complex, and the beginner thinks perhaps, that this will takes years of development and will end with a failure. But in reality, emulator writing is something which can be done as a hobby, and it seems that the programmer had a lot of fun with it. The question which remains open is, why did he used C++ for writing an emulator, why did he not used a Commodore 64 assembler for writing a program in Assembly?

With a bit searching, I've found some Emulator projects which were done in Assembly. That means, the programmer is using x86 assembly for writing a 6502 emulator. But, the number of projects is small, and the sourcecode is difficult to read. For example this one https://github.com/scantysnax/bender_x64/blob/master/bender.asm has 2800 lines of code in x86 assembly.

Why?

The reason why modern 6502 emulators are written in C++ and not in 6502 assembly itself has to do with the term legacy. The Commodore 64 assembly language is recognized as assembly. That means, the people are not longer using it for real projects. That means not, that the C64 or the 6502 is obsolete. The opposite is true: many retroevents are taking place and it seems, that today more people have an C-64 since in the 1980s as the machine was sold in reality. The problem is, that the programmers today are only talking about the machine and are not longer part of the community. That means, there are no real C-64 programmers out there. What they are doing is to place the CPU in a basket, simulate it with perfect accuracy and writing books about it. But what they are using in reality is a Intel CPU and the C++ programming language.

Can we do the other way around? I mean, using old Commodore hardware and 6502 assembly language to emulate modern computers? No, it is not possible. Because fixing C++ sourcecode is easier than fixing Assembly sourcecode. And a PC is faster than the 6502 chipset. I wouldn't call the C-64 death or obsolete, instead it is a legacy system. That means it is a closed system which is used for learning and teaching. And I would suggest that the Commodore 64 is the better computer for community building, than an ordinary PC. I do not know any PC or C++ programming club for example. Apple fan clubs are a lot of them, but I never heard of an Intel fanclub.

2.9 List of Forth CPUs, IDEs and books

year	name	author	note
1962	Burroughs 5000
1962	English Electric KDF.9	Charles L. Hamblin
1985	Novix NC4000	Charles H. Moore	Novix Inc
1988	RTX2010	Charles H. Moore	16 bit https://en.wikipedia.org/wiki/RTX2010
1989	RTX32p	Philip Koopman	32bit, 2-chip
1989	RTX4000	Philip Koopman	planned commercial version of RTX32p
1993	F21	Charles H. Moore	500 MIPS , 15000 transistors
1995	MuP21	Chen-Hanson Ting and Charles H. Moore	2 chip system (second VGA)
1996	i21	Charles H. Moore	500 MIPS, NTSC video output
1998	MSL16	Philip H W Leong	similar to MuP21
1999	C18	Charles H. Moore	asynchronous, SEAforth 40C18
2003	b16	Bernd Paysan
2003	JOP	Martin Schoeberl	Java optimized processor
2004	MicroCore	Klaus Schleisiek
2004	FC16 Forthcore	Richard E. Haskell
2006	SEAforth-24	Charles H. Moore	24 core cpu
2007	eP32	Chen-Hanson Ting	together with eForth
2009	GreenArrays F18	Charles H. Moore
2010	J1	James Bowman	Willow Garage
2010	GA144	Charles H. Moore	commercial product
2012	N.I.G.E. Machine	Andrew Read	softcore cpu
2015	Lund CPU	Girish Aramanekoppa Subbarao

Table 1: List of Forth CPUs

google “site:http://www.forth.org/svfig/ cpu”

http://www.ultratechnology.com/chips.htm

Name	price
VFX Forth	1200 US$
SwiftForth	399 US$
8th	130 US$ per year
Gforth	GPL
F-PC Forth system	free

Table 2: List of Forth IDEs

List of FORTH books

Elizabeth.D.Rather: Forth Programmer’s Handbook
by Elizabeth D. Rather: Forth Application Techniques
Leo Brodie: Starting Forth
Leo Brodie: Thinking Forth
ANS/ISO Forth Standard, http://dl.forth.com/sitedocs/dpans94.pdf
Phil Koopman: Stack Computers: The New Wave, http://users.ece.cmu.edu/~koopman/stack_computers/index.html
J.V. Noble: A Beginner's Guide to Forth, http://galileo.phys.virginia.edu/classes/551.jvn.fall01/primer.htm
Learn X in Y minutes , Where X=forth, https://learnxinyminutes.com/docs/forth/
Forth FAQ, http://www.forth.org/faq.html

3 Performance

3.1 Power consumption

5 femtojoule per transistor [beiu2005sub]
Apple A8X, 3000 million transistors, 4.5 Watt ->1,5*10^-9 Watt/transistor
MuP21 Forth cpu, 7000 CMOS transistors, 100 MIPS

3.2 Speed of Forth

So what's the deal, why is SP-Forth so slow? At first, it is not the fault of SP-Forth. Other Forth systems like pforth or gforth are not much faster. The problem is, that the C++ compiler was designed for practical usage and is optimized for speed. The idea is to generate the fastest binary file which is possible. While Forth was designed as a teaching language. The aim is not to write code in Forth and execute this on hardware, the hardware is to use Forth code in theoretical papers without ever run the code on real machines. Forth is used mainly as a proof.

The idea is not to focus on real hardware or real software, but invent everything from scratch. A forth program is not created for an existing computer like the x86 CPU, but the assumption is, that no hardware is there and we must invent a computer from scratch. And the minimal computer has only two stacks: datastack and returnstack. In contrast, the C++ compiler is progrmmed for real environment. The assumption is, that working hardware is out there, and a working operating system is available. For C++ it makes no sense to build around a virtual CPU, but the idea is to use existing hardware for producing software.

Forth programming can be called a permanent reverse reengineering. That means not to use the work previously done but to invent the wheel from scratch. That means, if someone wants to invent a computer from scratch, write his own compiler and invent his own operating system from scratch, than Forth is a here to stay. If the aim is to use existing hardware/software for writing something new, than C++ is better.

Let us explain this in detail. The assumption under which Forth is operating is, that we are living in the 18th century. The first computer wasn't invented yet and all what we have are punch cards and relays. The aim is, to built a simple machine from scratch, program the first compiler, and write a hello world program which runs. This is a job perfect for Forth. It is possible, to build everything from scratch: the logicgates, the compiler, the machine model and even the programming language. The result will looks a like a computer made of wood and has an overall speed of around 1 instruction per second. The memory is limited. Forth is the slowest programming language ever, but it minimizes the amount of understanding of the system.

Homebrew CPUs projects are not created for the purpose of speed or inventing something new. A homebrew CPUs is designed for replicating existing work. The idea is for example, to build a 8 bit homebrew CPU out of TTL chips. Why would somebody do so? The real 8-bit cpu was invented 40 years ago, recreating this from scratch is done with an educational purpose. To understand how the system look like, and to gain knowledge about hardware itself. This is the main domain in which Forth is used. Forth is the natural language for homebrew CPUs. The resulting working system will be run very slowly (because it is an 8 bit cpu) and Forth will decrease the performance further. The advantage of this bonsai plant is, that everything is under glass. That means it is fully documentated and can be seen as art.

A good example for such projects is the “MARK I Forth computer”http://www.aholme.co.uk/Mk1/Architecture.htm The performance of the machine is extraordinary slow, and the Forth program run on this machines solves no real problem. Instead the project is proof of how to build such computers.

Let us investigate the MARK I computer in detail. What is his purpose? It is not possible to run a webserver on the machine, it is also not possible to run Linux on it. So why invested the inventor so much energy in the project? Because he proofed something. A proof is known from universities. The idea is to repeat something which is already known. For example: everybody is aware that 1+1=2. The problem of how to add to number is solved. But instead go further and investigating more complicated things the same problem is asked as an open question. A possible proof for 1+1=2 looks the following.

At first the teacher asks a question: What is 1+1? Somebody gives the right answer: 2. But that is not the answer the teacher want's to hear. The teacher is not interested in math itself, he is iinterested in testing his students. It is important to ask who has given the answer, and what his motivation was behind it. Making a proof means, to deal with problems, for which the answer is known to build an education system.

With the same purpose the Mark I computer was designed. How to program a computer is already known, but repetition is always good. The running Mark I computer can be used in a quiz. The teacher asks a question, for example, “What should i do to let the led blinking?” and the students must answer the question. The idea is not to discuss about computers, but build a school game, in which people are evaluated.

Formal Forth

“Forth provides us with a simple language with a programming environment and debugger. Due to the simple nature of Forth, this can be formalised much more readily than most other languages (Stoddart 1988)” [knaggs1993practical] page 37

It is a bit unfair to compare Forth directly with C. Because both languages are made for different purposes. C was made as a quick&dirty language for real systems. While Forth was created as an intellectual challenge in theoretical computerscience. Forth can be best compared with the MMIX cpu of Donald E. Knuth. The “MMIX Visual Debugger” isn't use for real programming tasks, it is a clean simulator for trying out algorithm in the university. The MMIX cpu uses lots of register. Forth is more restricted and has stacks. This makes Forth superior to the MMIX. It makes sense, to compare Forth with the MMIX, PL/0 or other educational computers. But it makes no sense to compare MMIX with C or Forth with C.

Let us take a look into “The art of computer programming”. The book-series is equal to a game. It builds a knowledgesystem in which questions can be asked which can be answered wrong or true. That means, first the author Knuth explains what sorting is, and then the students can answer a question to the problem. If he is right, he gets a point from Knuth. The book “the art of computer programming” is created as a whole big proof. That means, nothing new can be found in the volume, but the purpose is to gamify existing knowledge. I wouldn't call the book scientific or academic, it is more didactic. The idea is to use the book for teaching students in computer science. Teaching means, that the students can answer the question which are given by the instructors. Teaching means not, that the students are exploring topics outside of the course or find new answers. Donald Knuth is a wonderful teacher. He is very good in proofing something.

Let us take an example excersise from “The art of computer programming”:

“In a random sequence of a million decimal digits, what is the probability that there exactly 100,000 of each possible digit?” [maclaren1970art]

“Write a MIX program analogous to (2) that computes (aX) mod (w-1).” [maclaren1970art]

In the book are much of the above cited exercises and they have all in common. The answer to the problem is known. At least by Knuth or it is printed in the reference section as the correct answer. The question are well suited for testing students, if they have understand the book. That means, Knuth didn't invented algorithm theory, he invented the SAT test (Scholastic Assessment Test). Somebody may call the book from Knuth as mischief, but it is the opposite. It shows, what a mathematical proof is, and it can be used in a teaching environment. Proof means, to ask for something which is well known. It is some kind of riddle which is meaningless by itself, but defines the relationship between a person who has the power to educate the opponent. The difference is, that Donald E. Knuth can let a student fail a test, while the opposite is not possible. That means, Knuth creates the test for an exam, while his students must attend the test.

3.3 Performance

According to the official Forth FAQ a program written in Forth is around 4-8 times slower than the same algorithm written in C.http://www.forth.org/faq/faq1.txt section #3.5

So what's the deal, why are people using this slow language? Let us first try out the c-language in detail. If somebody is writing an algorithm in C he uses a compiler for producing machine code. Modern C compiler are really good in the task, in most cases the compiler generated machine code is as fast as handprogrammed assembly code. That means, a good c compiler let the hardware running on its maximum. Suppose we have written a primenumber algorithm in C and with -O3 compilerswitch the programs needs on a 2 Ghz AMD64 cpu around 100 seconds until it is finished. How we can improve the speed? With Forth it is not possible, and with a better C compiler also not. The problem is the underlying hardware. If we want to run the same program not in 100 seconds but in 1 seconds we need a faster cpu.

And here comes Forth into the game. The idea is to not use small optimizing techniques. For example, a better cpu architecture or a transputer which has many cpus at once is a big optimizing technique. And this is the reason why people are playing around with Forth. The idea is to not using small optimizing technique with the factor 5, but using big optimizing techniques in the factor of 1000. If we have for example a transputer with 1000 cpus at once, our algorithm will much more profit from it, than any c compiler can deliver.

3.4 Multi-core Forth machine

A possible computer consists of a large amount of main memory, for example 100 GB in DDR4-RAM. The memory consists of cells which containing data and program code. On the memory, many Forth CPUs are operating in parallel. Other modules like caches and registers do not exists. Let us go into the details:

The memory has an address space. It starts with $0000 and goes up to $FFFF. For example on address $1000 the main program starts. The first command is “4”. According to the Forth syntax the number is put on the datastack. .The next command on adress $1001 is also a number “2”, it is put on the stack. The next command at adress $1002 is the plus sign “+”. This adds the two numbers on the stack. What the Forth CPU is doing, or better the pointer of the Forth CPU which points to the Memory is simple. He is executing commands.

Now let us increase the complexity and not only use one Forth CPU but two of them in parallel. It is not possible that they are operating on the same piece of code. This would results into the same problem like in normal computing, for example CPU1 overwrites a memoryadress which is needed by CPU2. But it is possible to define separate threads which runs in parallel. In the Linux operating system this is called a background process. That means, CPU1 is executing code from $1000 to $1200 while CPU is executing code from $2000 to $2040. The threads are separated from each other.

Figure 3: Two Forth CPUs on one tape

Left is the datastack which has [5,3] while right is the return stack. The user can move the CPUs with pressing the arrow keys. He can also start the “fetch-decode-execute” cycle which is starting the normal Forth inner next loop.

If the programmer defines 100 threads, then 100 Forth CPUs can run simultaneously on the memory. Because each Forth CPU is minimal from design it has only 2 stacks: datastack and return stack plus some minor features like an instruction pointer. It is a minimal computer which consists of not more than 1000 transistors. This makes it possible to construct many thousands of Forth CPUs which runs in parallel on a large amount of memory. The only bottleneck is the bandwith of the RAM.

The problem is, that RAM can only accessed by one CPU at the same time. For example: on timeindex 0.1 seconds CPU1 is reading a cell, on timeindex 0.2 seconds CPU2 is reading a cell and so forth. The memory controller manages the access from the cpus to the memory. If the bandwith is low, the overall system will be very slow. That means we have around 1000 Forth CPUs but they can not access the memory at the same time. They have to wait.

Another problem is, that normal C sourcecode will not run. It is not possible to convert existing Linux programs into Forth syntax. And even this would be possible, it makes no sense because a sorting algorithm like Bubblesort has to be implemented different on a stackmachine. That means, the programmers have to rewrite all the software from scratch. This results into new operating system, new video codecs and new applications.

3.5 Why performance benchmarks are useless

Most amateur programmers love benchmark. They compare for example the execution speed of a C++ compiler with the speed of Java. The idea is to evaluate a certain technology with an objective judge and to be on the side of the winner. In the case of Forth a benchmark makes no sense. In many papers about new invented Forth software some kind of performance benchmark is given, but it says nothing. The reason is, that the intention of Forth is not to replace existing CISC CPUs or existing programming languages. If somebody want's to create a Forth benchmark then only with the aim to build a very slow Forth system.

My own Forth language simulator has right now a performance of around 1 instruction per second. This performance is reached with an artificial pause, which is executed in the thread. The normal SFML command was used to wait for exact 1000 ms. The result is, that the overall gameloop runs with 30 fps to get the movement smoothly and for reacting to user input, while the execution of the simulated Forth CPU is slowed down to 1 step per second. Otherwise, the user won't see very much, he would start the program and get a millisecond later an error command, because his Forthprogram is wrong. With an artificial slowdown the user can see in stopmotion what the simulated Forth CPU is doing and how the stack is filled with values.

My plan is to reduce the execution speed further, down to only 0.5 steps per second. The reason is, that in a dualcore CPUs two Forth VM are executed in parallel, and if both are running with maximum speed, it is hard for the user to get both in observation.

The intention of a Forth benchmark is not to run a given program at maximum speed. That is the privilege of production ready programming languages like C++.: They are used for writing real software. Instead, Forth is an education programming language, that means a slow execution speed is better. A similar project is the GNUSim8085 software, which is a tool for learning assembly language. The performance of the software is very slow, but this principle is right for learning Forth too.

GnuSim8085 is not the most sophisticated assembler. And most feature of GAS and NASM he didn't have. It is more a teaching environment, that means it is not possible to write real assembly programs with it, but it is a good tool for playing around with Assembly for people who have no knowledge.

Let us investigate how a dedicated Forth CPUs is presented:

“MISC processors will be much cheaper than RISC and CISC processors, and can compete effectively against them on the basis of favorable price/performance ratio.”http://www.ultratechnology.com/mup21.html

According to this statement the MuP21 is superior to normal CPUs because he is faster and needs less energy. This argument is not very relevant, it misleads the user into the direction, that he can use Forth to replace existing computers. If we want to promote Forth, than we should argue that Forth is slower and needs more energy. That means, a standard x86 PC in CISC architecture is able to run with around 10 Gflops, while a Forth CPU is for the factor 1 0 10 slower. The question is not how to shrink the size of the gate to compete with existing technology, the question is how to build a Forth CPU in hardware which is really slow, and how to program a Forth simulator which is even slower.

According to the specs, the MuP21 CPU was produced with 1.2 micron CMOS process. That is too tiny. What's about a Forth CPU build with a transistor size of around 1 cm? If the chip has 1000 transistors, the cpu would have a length of 10 Meter, which needs a special room for it. The advantage is, that the museum visitor can observe the device, walk around and touch it.

3.6 Forth has a big performance problem

Some papers about Forth are discussing the performance issue. The intention is to make clear, that Forth is a fast cpu architecture which is better than other systems. In reality, the performance problem is not about speed but it's the opposite. Perhaps I should give an example.. My new Forth language simulator has two CPUs who are working simultaneously on the same tape. The user types in the command “Run” and both CPUs are start working. It feels a bit like in a realtime strategy game, in which the user controls two units at the same time. The problem is, that it is hard to observe the scene. Even a artificial timelag of 1 seconds is used between two actions, the scene runs too fast. The problem is that it is easy to follow one cpu, but two cpus at the same time with a separate datastack is too much for the normal user.

How to overcome the problem is unclear, but increasing the performance is the wrong way. The better answer has to do with more timelag. Perhaps I should increase the waiting time from 1 second to 10 second, so that the user has time to see what exactly the stackmachine is doing?

What helps a bit is to not run the cpu in parallel but after each other. Not because of technical requirements, but this makes clear, that the user can follow the process. Perhaps I should describe what the worst case scenario is. A bad idea is to remove the artificial waiting loop, because this results into a not controllable behavior: the user is entering “run”, and without any delay, both cpu are at the end of the program. From the performance perspective that is the fastest mode. But the user sees nothing. It is useless to run the simulator in that mode.

3.7 Prefetching for the UNIVAC I tapedrive

Introduction

The rewind time of the tape drive was around 60 seconds, sometimes this is called “access time”, because it takes one minute until a certain adress is reached. If the CPU has no core memory and no cache, he writes direct on the tape. A command sequence like “read #00, read #A0, read #10”, takes each 60 seconds until the tape is adjusted to the adress. So the Univac needs 3*60=180 seconds for the operation.

Main

The UNIVAC I computer was made in 1951. His tapedrive had a rewind time of 60 seconds. That is also called the access time. 6 tapedrive units can be run in parallel. Let's assume we have a batch task. From tape1 we copy something to tape2. From tape2 to tape 5 and so forth. The exact batch is described in the figure 4. Because we need 6 read and 6 write access the overall time is 12 minutes until the job is done. The term prefetching means, to not tolerate that a tape has nothing to do, instead it is started in advance. From the beginning we give every tape a task and this results into a smaller overall time. It can be reduced to 2 minutes. The average access time is no longer 60 seconds, but it is:

120secondsoverall 12tasks =10secondspertask

Figure 4: prefetching for tapedrive

Can this be true that the virtual rewind time is only 10 seconds, while the physical rewind time is 60 seconds? Yes, the term is called pipelining or RAID and means, that every device is under load, and they are working together. More inexpensive tapedrives will decrease the virtual rewind time further. 60 seconds down to 10 seconds is equal to 1 tapedrive up to 6 tapedrive. That means, if our goal is an accesstime of 1000 msec, we need 60 tapedrives in parallel.

4 Hardware

4.1 Parallelcomputing

Sometimes the Inmos Transputer is called a parallel computer, because it combines many cpus on one chip. But in reality, the so called Harvard architecture is also a parallel machine. The signal pathway is used for connecting subcomputers together. Let us describe a small example. Suppose the assembly command “mov ax,bx” should be executed. For doing so, many substeps are needed. If we are running the steps after each other it takes some time. The alternative is to construct many small units which are handling one piece of work and then we are combining these unit with an assembly line. The assembly line on a cpu is called signal pathway. On youtube an animation of a working Intel 8085 is available which shows the principlehttps://www.youtube.com/watch?v=nABqe5SctHY It looks similar to a Simcity game in which cars are driving around. The most dominant feature is, that the signal are independent from each other. They are running at the same time.

Let us describe the situation with a city. At 09:00 AM person1 is driving with his car to school. At the same time at 09:00 AM person2 is driving with his car to the hospital. At 10:00 AM everybody is on his place. The global clock has only recognized that the process has taken 1 hour. But in reality, the time for driving was 2x1=2 hours. Todays cpu's are described by their speed in Gigahertz. 1 Ghz is equal to 10^9 operations per second. A very good cpu reaches 10 instructions per clockcycle.

Queue machines

[husemannstudy2016] introduces queue machines on an example. We have the formula

5*( 2+8 ) ) − ( 4/2 ) and the idea is to calculate the result in parallel. At first, the abstract syntax tree is generated, and this tree is executed in parallel. That means, 2+8 is calculated at the same like 4/2 be different execution units which are normal dualstack pushdown-automata.

Threading

In mainstream computing the idea of threading is well known. For example, the C++ programming language is supporting this feature as default. The reason why threading is used not very often has to to do with the fact, that according to the hardware only a few cpu cores are available. That means, if the programmer is creating 2 threads, this double not only the performance but also the power consumption of his CPU. But threading itself is the answer to every performance bottleneck. It must be used with many cores at the same time.

Let us first investigate how we can decide if a program can be run parallel or not. There are two options, the first one is called automatic pipelining and it is used on the hardware side. That means, a chunk of assembly code is loaded into the cache of the cpu, and the processor is investigating if a task can be run in parallel. This decision is mostly wrong and the possible performance improvement is only minimal. The other option is to let the programmer decide if his program can be run parallel or not. A typical example is a brute-force solver for the traveling salesman problem. The programmer can define 10 threads, and each one is calculating a piece of the problem.

Let us describe a situation in which threading can speed up the performance. At first we need many cpus, for example 1000. And then we let the programmer decided how to use these cpu in parallel. That means, it is not necessary for doing the pipeline step automatically on the abstract syntax tree of the program instead the programmer decides how to map the work to the hardware.

In theory, the mapping can be done automatically, for example with analyzing the bytecode and decide according to rules which chunk of the code can be executed parallel. But, this advanced computing and for parallel computing it is not necessary to do so.

Suppose, we have 1000 different dual-stack pushdown automaton arranged in parallel. A normal sourcecode written in Forth can't be executed by the stackmachine cluster, because it is unclear if a subroutine needs as input the output of another subroutine. But, if the programmer has decided from the beginning, that all routines can be done in parallel, there is no need for waiting of the results of a previous step. That means, the programmer has scheduled the tasks of his “Traveling salesman problem” manual to the computing cores, and they are starting to calculate.

It is correct, that it is difficult or sometimes impossible to run sourcecode in parallel. The problem is, that in the sourcecode is defined:

input (takes 10 seconds)
calc1 (take 20 seconds)
calc2 (takes 60 seconds)
output (takes 10 seconds)

and the “output” routine can only be started, if the previous subroutines are done. And is indeed an interesting question, if we can analyze the sourcecode (or his abstract syntax tree) in way that we can automatically decide which commands can be executed in parallel and which not. But, this question is only a minor problem in the subject of threading. It solves the question what the cpu can do, if the programmer didn't defined manually what codeparts can be threaded.

The more easier way is, if the programmer has already written explicit, that the calc task can be run in parallel:

input
parallel: calc1, calc2
output

Here is the situation clear. We can assign the calc task to different cpus. But, the programmer will only define task as thread ready, if the hardware is able to run the task with higher performance. Todays computers only have 2 cores, and it makes no sense to specify a complex threading hierarchy.

4.2 Why Forth is better than Intel Xeon Phi

The latest iteration in Intel products is called Intel Xeon Phi, “Knight Mill” According to the specs, the processor can deliver 3.5 Teraflops and consumes 300 Watt energy. At first, we must calculate the Gigaflops per Watt value, which is 11.7. Then we can calculate the energy consumption per flops, which is 85 picojoule per flop (1/a*1000). Now we compare this value with the GA144 forth CPU by Greenarray, which needs only 7 picojoule per instruction.http://www.greenarraychips.com/home/documents/greg/PB001-100503-GA144-1-10.pdf So we have as the result:

Intel Xeon Phi, 85 picojoule per flops
GA144 Forth CPU, 7 picojoule per Instruction

The Greenarray chipset is around 12x more energy efficient than the Intel hardware, which is a lot. So we have proven that Forth is superior to Intel engineering.

4.3 Pipelining

Understanding Multiprocessing and Pipelining isn't easy, but it seems, that the parallel executation of many tasks the key is for increasing performance of a CPU. In the figure “pipeline” an example is given, which is well known from other introductions into the topics. The overall tasks consists of steps like “prefetch, fetch, decode and so on”, and the tasks are scheduled to threads. The first surprising fact is, that a thread is not responsible for a timestep, but a thread is used for run the entire task. What does that mean? In C++ we would create a cycle with the command from figure “Thread in C++”. That means, not every subtask is run by a different thread, but we have only one thread, which has the complete game-loop. The idea is, to run many game-loops at the same time with a small offset.

void cycle() {
  prefetch();
  fetch();
  decode();
  execute();
  free();
}
std::thread t1;
t1=std::thread(cycle, this);
t1.detach();

Let us investigate the table a bit in detail. The main problem of the fictional cpu is, that the subtasks have an order. That means, we can execute something only after the value was decoded. It is not possible to change to order, it is a fixed workflow. Let us watch, how the table has solved the problem. The “execute” subtask is highlighted in grey. What we can see is, that the CPU isn't run the subtask at the same time, but with a slow speed after each other. From the perspective of “execute”, there is no pipeline, because in every timestep one subtask is executed.

Figure 6: pipeline

But, the overall figure shows that all the tasks are running in parallel. We see a virtual CPU which is vertical in the table. The virtual CPU goes from bottom to top and contains all the subtasks: prefetch, fetch, decode and so on. And they are executed at the same time. Let us go further into the details. The constraint is, that the workflow has a certain order. That means, the subtasks “prefetch, fetch, decode and so on” can't be run at the same time, but must executed after each other. “Fetch” is only possible if the previous step was done. Every subtask needs a certain amount of time, for example 1 second. If we have 5 subtasks, with each 1 seconds, then it takes 5 seconds for the overall task, right?

Our aim as the manager of the game is to reduce the duration. Let us compare two options. The first one is the sum of the seconds. 6 threads with each 5 seconds needs 6x5=30 seconds. A look in the figure “pipeline” shows us, that it is possible to run the threads only in 10 seconds, because the last action is ready in the timestep 10. What is the trick?

At first we can calculate the raw numbers. 10 seconds overall time consumption is equal to 1.66 seconds per thread (10/6). That means we have increased the speed from 5 seconds for running a task downto 1.66 seconds. The same reduction is done for each subtask. Prefetch needs no longer 1 seconds, but only 0.33 seconds. Can this be true?

According to the table we have 30 subtasks overall. Each of them needs 1 seconds. On the same time, the measured time for executing the subtasks is only 10 seconds. And that is equal to 0.33 seconds x 30 = 10 seconds. The measurement is correct, the duration of a single task has the normal 1 seconds, but it has also 0.33 seconds. The second reduced duration is the result of the above mentioned virtual CPU. It is something which we have constructed by ordering the task in a pipeline. The virtual CPU needs for the prefetch subtask only 0.33 seconds, and the same amount for all the other subtasks. It is a super-processor which is the result of the coworking threads.

workpackage

In the timestep 5 a batch-job is executed by the virtual cpu. The request contains of the following tasks:

prefetch instruction #5
fetch instruction #4
decode instruction #3
execute instruction #2
free instruction #1

Because the subtasks are dealing with different instructions, they can be run simultaneously. The physical cpu gets this batch-menu. The reason why the subtasks are executed is unknown. Prefetching instruction #5 has nothing to do with “fetch instruction #4”.

4.4 A cache for Forth

Classical mainstream computing contains of RAM, second level cache and first level cache. The idea is, that an algorithm – written in C – can access fast to every cell of the memory. For example, an image with 200 kb is stored in RAM and the CPU is deciding to take parts of the data into his first level cache to run a compression algorithm on it. What is the Forth way of doing so?

At first, a Forth CPU has no first level cache. It has only the dualstack and the memory. Access to the RAM is expensive and storing large amount in the datastack is not possible. How does a Forth CPU stores large amount of data? This is unclear, perhaps it is not possible. Let us watch a real Forth CPU like the GA144. The total amount of RAM on the chip is small. Instead the system is optimized for small amount of data. For a normal algorithm this could be possible, but what is in the case of large image files, which are need 200 kb and more of storage. Where do we are storing them?

At first, let us investigate an option to overcome the problem with traditional methods. The need is to store 200 kb near the CPU, so that the CPU can access the data fast. Doing so is possible with a first level cache plus a register based machine. The registers of a CISC machine can be seen as the zero-level-cache. It is the fastest sort of memory direct on the chip. So the answer to the storage problem is not using Forth and switch to a modern register cpu including a cache.

This is what we see in reality, but perhaps it is possible with Forth to give another answer. A stackmachine is per definition not a registermachine, that means, we can not add registers or a cache to it to improve the memory. And using a stackmachine together with a big RAM makes no sense, because the access to the RAM is slow. The Forth way for solving the problem is to use an array of stackmachines. That means in reality. A standard Forth CPU consists of a datastack which is not deeper than 5 bytes. That is the total amount of RAM the system has. If we need more RAM for storing a 200 kb image, we need more Forth CPUs. 40000 single Forth CPU with each 5 bytes = 200 kb RAM. Such a system is called in literature a stackbased “Cellular Automaton Machine”.

This sounds a bit complicated so a short example may clarify the point. Suppose the image we want to store is 200 kb in size. We are divide the image in 1000 parts, each is 200 byte long. The 200 byte are the memory of a Forth machine. Additionally the CPU has 5 bytes for the datastack and 5 bytes for the return stack. So we have 1000 mini computers which have access to a different part of the memory. Each Forth CPU can read/write only 200 bytes. This is some kind of intelligent RAM. The only problem is, how to program the tiny little devices?

4.5 Forth GPU

The topic of a graphic card is ignored in the Forth community. The reason is, that a standard microcontroller doesn't have such a device. But from the perspective of understanding Forth it makes sense to focus in detail on the topic. The application of visualizing pictures has a lot of interesting challenges. At first, the usually graphics card RAM is big in size. A standard display has at least 1 million pixel plus color information (= 1 MB). If the graphics card should render 3d graphics the number of needed pixels is higher. Also the painting of shapes likes lines is very computational intensive. That means, if someone believes to understand computing and want to minimize the hardware, than building up a graphics card from scratch is a good place to start.

[morreale1984general] is one of the few paper who are describing Forth together with graphics processors. Finding such information is difficult, because most papers of the 1980s are on microfiche and not online available and searching for the term “Forth” brings in most cases no useful information. Another example, which uses the GA144 cpu for graphical output on a watch is given on github.https://github.com/mflamer/GA144-watch

5 Examples

5.1 Implementation

Visualizing a dual-stack pushdownautomaton is relativly easy with the SFML game engine. In the following figure the first prototype can be seen. Instead of programming another “Forth like system” on a x86 cpu this project has the purpose to program against a graphic-display. That means, the Forth VM is only visible onscreen as some kind of interactive game. The user can so far control the instruction pointer with left/right button, and he can even push some values to the datastack. It is done with a textcommand. The idea is, that the game can interpret a normal Forth program and run it in singlestep mode.

Figure 7: Stackmachine in SFML

5.2 Testing out SP-Forth

In the following section, I want to describe a concrete Forth system. I've choosen the SP-Forth systemhttp://spf.sourceforge.net/, because it has the feature to compile Forth code into native x86 machine code, which improves the performance. The first step is to download the 744 zip file for the Linux operating system. After unpacking lots of documentation, examples and the SP-Forth executable file is visible. The file “./spf4orig” is 64kb in size and after executing it, the screenshot “SP-Forth” is shown.

Figure 8: SP-Forth

Instead of the Gforth system, SP-Forth needs the command in big letters. A demo program can be:

: MAIN 10 0 DO I . LOOP ;
 Ok
MAIN
0 1 2 3 4 5 6 7 8 9  Ok

A problem with SP-Forth is, that the documentation is only available in Russian. One advantage is, that it supports multitasking out of the box. But let us try out something different. We want to run a program from a file. At first we need the same Hello World program in a file “hi.f”, with a small modification and then we can execute the program together with SP-Forth. The result is the same, the numbers from 0 to zero are printed on the screen.

# hi.f
: MAIN 10 0 DO I . LOOP ;
MAIN
BYE

# commandline
./spf4orig hi.f

A speed comparison between SP-Forth and C++ is always possible, and the SP-Forth system is way more slower.

sp-forth:
: SUM 
0 1000000 0 DO I + LOOP
.
;
: MAIN 1000 0 DO SUM LOOP ;
MAIN
BYE

time ./spf4orig sum.f
real	0m2,435s
user	0m2,425s
sys	0m0,005s


C++:
#include <iostream>
void sum() {
  auto result=0;
  for (auto i=0;i<1000000;i++)
    result+=i;
  std::cout<<resulti<i<" ";
}
int main()
{ 
  for (auto i=0;i<1000;i++)
    sum();
  return 0;
}

g++ -O3 sum.cpp
time ./a.out
real	0m0,137s
user	0m0,132s
sys	0m0,004s

6 Other languages

6.1 BCPL vs Forth

In the late 1960s a major decision was made in history of computing. It was a showdown between the Forth system and the BCPL language. The mainstream in computing decided, that BCPL is a here to stay, while the Forth idea was ignored.

Let us go back in time and investigate what the difference is. BCPL was the predecessor to C, and C evolved into C++. Also BCPL is a programming language dedicated to register-based cpus. In contrast, the Forth VM was oriented on a stackmachine and evolved into factor and joy and modern Forth2012 standard. The contrast between both developments is massive. While C like language are on top of the TIOBE index, is Forth usually on the last place. Both developments can be seen as contradiction.

At first, i want to describe the idea behind Forth. The Forth VM is using an abstract computer, called two stack pushdown automaton. It has a tape, a datastack, a return stack and an execution unit. Programming in Forth is equal to use dataflow programming for the Forth virtual machine. In contrast, the idea of BCPL was to define also a virtual machine, but the BCPL virtual machine is using 5 registers, according to the orginal paper: Accumultor, some index register, programcounter and so forth. The idea of the BCPL virtual machine evolved into todays computer hardware which is called a CISC architecture. Such systems were programmed in C like languages like Python, C, C++ and Java.

Which architecture is more powerful? For answering this question let us describe what the disadvantages are. Suppose we want to program in a C like language with lots of variables and compile the sourcecode to a stackbased processor. It is very difficult to do so. Mapping a high level programming language onto a stackmachine is not possible. Instead what the hardware manufactorers are doing is the other way around. They are modifying the cpu to support highlevel programming languages. That means in reality they are increasing the number of registers to at least 32, and they are increasing the number of machine opcodes for supporting multimedia and other operations.

Now we want describe in which the above cited CISC architecture is not very good. If the aim is to research computer programming and hardware design on a formel level, than are languages like C not the best idea. The reason is, that it is not clear how many registers the cpu has, or what the standard of a language is. A c like programming language can be anything, and a BCPL like virtual maschine too. It is not possible to describe algorithm on a formal level.

What we seen today is both. Computer scientists are loving Forth based stackmachines, because they are so simple and can be seen as a runtime version of a turing-machine. A virtual machine in stackmachine architecture is a general computer and is well suited for analyzing the runtime behavior of an algorithm. It is also possible to create new hardware with it.

On the other hand, C like programming languages and the underlying CISC based hardware is very common in realiy. They can be used for implementing real operating systems like Linux and write software in business environment.

The major difference between BCPL and Forth is, that the BCPL architecture has evolved over time. Today, the C language looks different and the CPU which is executing the commands too. That means, the AMD64 architecture is very different from the 386'er systems from Intel. And todays C++ is another language than the earlier BCPL language. In contrast, Forth like systems looks the same over decades. In the 1970s the Forth virtual machines consists of two stacks and today nothing has changed.

In bad literature, not a stackmachine but the turing machine is described as a universal model for a computer. But a turing machine is less important. The problem with turing machines is, that the program code is separated from the tape. Usually, the tape represents the input and the runable code is located in the head of the turing-machine. A second problem with turing-machines is, that the code in head is not changable, it is a Finite state machine. In contrast, the Forth like VM is the better model. It can also run every algorithm but can programmed with subroutines and variables.

Often there is a misconception out there. The idea is, that it is possible to program in Forth like in C++. The idea is, that Forth is simply another programming language. But it is more. Forth has more in common with a dataflow language for a abstract machine. That means it is not possible to program Forth without interactive. Perhaps an example:

A C++ programmer is oriented on the sourcecode. He writes down 2 pages of sourcecode like he write a text. And after 30 minutes, he presses the run button to see what his program his doing. It is batch workflow, and if the programmer has experiance, the time between every run of the program is longer. That means, the programmers sees only from the sourcecode what his program will do. He uses the compiler only for error checking.

In contrast a Forth programmer works interactivly. What his machine will do is unclear, because new defined Forth words are not predictable. The idea is not to program something, but to prove if a program is correct. Let us describe the difference in detail.

C++ programs are written because the user needs software. He writes for example a game, and if the sourcecode is complete the computer can do something interesting. That means, C++ programming is done only for creating sourcecode. The idea is to increase the number of software.

Forth programming is done with a different goal. Forth is mainly a teaching language. The idea is not to program with Forth a database application or a game, the idea is to use Forth as a tool for educating students about computer science. That means, the typical Forth projects has not produced anly lines of code after 1 year, but around 100 students with a diploma in computer science who are familiar with hardware and algorithms. Instead, C++ is not very well suited to teach students in computerscience. C++ is a language for the practice. It has no clean design, and is driven by learning from doing.

Anti pattern

At the end of the small comparison between BCPL and Forth I want to describe a behavior which results into failure. It is an antipattern which is ignoring the advantages of a language. What the user is not recommended to do is using Forth as a programming language for creating an operating system or a GUI application. Also not recommended is the idea, to use BCPL for teaching computer science at the university. It will fail also.

BCPL can be described as a language for writing computer sourcecode, while Forth is an education tool for teaching students. That means, if a professor at the university, uses Forth to explain what a compiler is, then the professor is really good in his subject. And if a programmer in daily life is using a BCPL like language for creating software, then is also an expert in his domain.

6.2 C++ vs Forth

The difference between Forth and C++ can not be greater. Both languages are very old. The development of C++ started in the 1970s with the BCPL compiler which were later extended with Simula-like features. The stackbased Forth language was developed in the same time. Both languages can be seen as unique, that means they are very powerful. But it is not possible to switch between them, because their philosophy is different.

At first I want to describe the mindset between C++. C++ is a very fast language, which compiles on standard-processor into productive code. The OOP features of C++ makes it easier to program large applications, maintain them and use many programmers in the same project. That makes C++ the number 1 language for game development. A typical game consists of 10000 different sprites, which different behavior which are interacting with the game-engine. C++ handles this complexity and allows the programmer to create sophisticated programs.

The Forth language was designed with a different purpose. Their main application is in science-classes at the university. Forth is a minimalist language. It is possible to explain the working of a complete CPU, an operating system and a compiler with Forth. The idea behind Forth is not to grow program complexität, but to reduce it. If a Forth program has 2 kb in sourcecode, a much better program occupies only the half. The advantage of Forth is, that researcher can try out on a theoretical level many kind of architectures. For example, if they are parsing Forth sourcecode into an abstract syntax tree (AST) they can investige which parts of it can be run in parallel. Creating the abstract syntax tree for a C++ is way more complicated.

The best practice method is to use Forth and C++ together. C++ as a compiler for creating real applications. And Forth as a research plattform to play around with new algorithm, computer architectures and compiler designs. Let me give one example. :Suppose the idea is, to make a performance test of a new parallel computer. If we are ignoring Forth and focus entirely on C++ and Intel CPU we would take a standard-C++ program which has 100 MB sourcecode, a standard Operating System like Linux and a standard X86 Processor which has 5 billion transistors. We are wiring all these compentents together and investige who to run it in parallel. Will this setup works? Never! There are to many components in the loop, it is a mess. The better approach is, to write in Forth a small CPU, program a operating system from scratch and run some applications on it. The overall project has around 2000 lines of Code in Forth and it easy to inspecting every single piece of it. If something goes wrong it is possible to go down to the transistor to search for the failure.

Let us investigate some antipattern. If someone is trying to use Forth for build a production application with 1 million lines of code, he has made a mistake. The project will fail horrible. The reason is, that Forth do not support OOP. Another problem is, that the performance of Forth is poor. Another Antipattern is, to use C++ in a computer course for explaining what a compiler is. C++ is to complicated and the knowledge right now is in 6 months obsolete.

Forth advantage

Let us watch a homebrew CPU project which isn't aware of Forth.http://www.megaprocessor.com/architecture.html is a CPU entirely made of transistors. The system was build for educational reasons, and perhaps the inventor had a lot of fun with it. According to the schematic diagram it is a register based machine, which is equal to modern computers, which also have registers. The result is, that the above cited homebrew project is more complicated than it should be. The instruction set is high, and the wiring to complicated. Forth can solve the issue. Let us investigate the idea in detail.

A minimal computer uses fixed logicgates and it is not programmable. The term in the literature for such device is Finite state machine. That means, the transistors are ordered to a logic circuit and a input signal results into an output signal. Such device can not be programmed, so it is not a real computer. The next step after a wired Finite state machine is not a register based computer, but between them comes the Stackmachine. That is a minimal computer which is described by the Forth community in detail. It is like a FInite state machine, turing capable, but can execute a program. The program can be changed so it is possible to program the device.

The open question until now is: how to build out of transistors a stackmachine which runs Forth? The answer is not given. Most homebrew projects are devoted to register based cpu architecture. But Forth can provide a realy minimalistic device. In that sense, that it is not able to build a computer more minimalistic as Forth. The Megaprocessor consists of 15300 transistors for the CPU and additonally 27000 transistor for the RAM. The overall transistorcount is higher than it should be. The J1 Forth CPU for example, but i do not need to explain the details here.

Let us first describe, what in the Megaprocessor project was done right. The general idea to build a cpu from tranistors in an educational computer is quite good. So the students can understand what a computer is on a very basic level. What was done wrong in the project, is the architecture of the CPU itself. A switch from a register machine to a stackmachine would increase the educational value for the public.

I think the overall question can be answered with the transistor count. The intel 4004 for example had around 2300 transistors. I would suggest, that it is possible to reduce the amount.

Forth disadvantages

After some applause for Forth it is time to describe the disadvantages. Forth was inventend around the year 1970, until now it wasn't used on real softwareprojects like Operating systems, games or network programming. Instead a different architecture was successful in the mainstream, which is called register-based CPU plus C programming language. The reason why has to do with the need for bigger software projects. The major problem in computing is not to improve the hardware, or make it more faster. The major problem is the software crisis. That means, around the 1970s modern computers from this time had enough RAM and enough harddrive capacity. The problem was, that no software were available. And until today, this problem is not solved. That means, a cheap PC brings around 4 GB RAM with it, and 200 GB harddrive, but how to fill the space?

The computing history has answered the problem with three ideas: compiler languages like C, objectoriented programming like C++ and Service-oriented architecture. The idea is to abstract from a concrete machine and separate between hardware and software. Let us investigate how compiler languages like C are implemented on a CPU. Programming such compilers is done on register machines. They have special registers and context-switching possibilities. The concept is called “compiler friendly” and means, that the CPU has to adapt to the compiler with the purpose to support high-level-language programming. A Forth system can't adapt to anything. It is not possible to run a C program on a stackmachine. That is the reason, why Forth is not used in real software projects, especially not in big softwareprojects which are working with object oriented programming.

Is it possible to create large applications will million lines of code in Forth? No. Is it possible to replace large projects with smaller programs? No. The reason why the Linux operating system consists of million lines of code and not only of 10000 is, because Linux is so powerful. What we have seen in computer history is a double strategy. On the one hand, we had register-machines, C, C++ for real life big projects, and on the other hand the Forth language was developed into a teaching language for simplified lesson in cpu architecture.

Let us describe the above cited service oriented architecture in detail. On a technical level it is equal to a REST services on the internet and to a abstract class on a local machine. The basic idea is, to create a layer which hides complexity and provide certain interfaces to the outside world. For example, somebody has written a 10000 lines long program which is doing something. Service oriented architecture means, that he programs on top of the program in addition 400 more lines of code, which have the only reason to provide a clean number of functionnames. The additional 400 lines of code have no direct task, they are from the program itself useless. But there are important if the program is used as a library which works together with other libraries. Such feature is enabled in the C++ framework as default, and it is the reason why the language is used so widespread. And yes, such a feature results into bloatware. That is software which is large, and is doing nothing.

6.3 C Compiler for Forth?

[shannon2006ac] describes a C compiler which targets a stackmachine. The idea is, that writing programs in Forth is difficult and it would make sense to use existing C sourcecode and run a Forth machine. Doing so is not easy, and the paper has failed to realize such a compiler. But let us go into the details.

The compiler which was used is LCC. The reason is, that LCC makes it easy for writing a special backend for a stackmachine. The idea is, to make “stack schedulinig”, that means to use the stack like register and put C variables on the stack. The problem is, that in reality a stack and a register are different things. For example, if a variable is put deep in the stack on position 12, it is not possible to access the variable anymore because according to the definition only the top element of a stack can be retrieved.

The problem is on the side of the c programmer. He writes perhaps a subroutine which uses 5 local variables, for storing temporary information, the result or whatever. The compiler is trying to convert the sourcecode into machine code. On a register machine it is not a problem, because the local variables are stored in the registers which makes it fast to access them. On a stackmachine such registers are not available. What the compiler can do, is to to store the variables in the memory. But, if the CPU wants to access it, it takes to much time.

To make the point a bit clear: It is not possible to write a c-compiler for a stackmachine. It will be always slower. That is the reason why mainstream computing is devoted to register machines, because such machine fits better to c programming and C++ programming. Apropos C++. How difficult it is to map objects to a stackmachine? Really difficult.

What is the alternative? The alternative is not use existing C programcode but writing new sourcecode in Forth. Forth subroutines are using usually no local variables and they run very fast on a stackmachine. There is only a minor problem: the number of sourcecode written in Forth is small.

We can see the lack of C compiler for stackmachines as an advantage, because it results into an environment which is different from mainstream computing. In productive softwaresystem everybody is using C like languages which runs best on x86 CPUs. And for educational purposes the other way around is the gold standard. That means to use a stackmachine together with Forth. The absence of C compiler marks the border. There is a difference between Forth and the rest in computing. [shannon2006ac] has proven, that it is not possible to sit in between.. The user must decide between Forth and C.

6.4 Beginners are chosen Assembly language, experts C++

It is a surprising fact, that people who have no programming experience at at all are preferring Assembly language for their first trial. The expectation might be the difference, because programming in Assembly is called demanding and writing an operating system from scratch is outside the scope of amateurs. But in reality, many of them are doing exactly this. The reason is, that from a certain point of view, Assembly is a beginner friendly language. The ordinary program is very short and is optimized for a certain type of architecture. If someone has invested one week into reading some books about Assembly, he can start writing his first graphics demo and even program his first game.

There is only one problem: Assembly is not used by professionals. The reason is, that they are not interested in writing short programs for a certain CPU type. Instead the profi programmers is writing longer programs with million of lines of code, and he is doing so under existing operating system. This results into a different programming culture. That means, apart from the stereotype C/C++ is the hardest programming language ever invented. Not because it is so difficult to store a number in a variable, but because of the problems, C++ programmers are trying to solve. They are not interested in getting 10% more performance out of a certain loop, instead C++ programmers try to fix a database application or the write a game against a given game engine.

In contrast Assembly programming has very much to do with the language itself. Which instruction set is used, how to iterate through a loop or how to make certain tricks on AMD64 PC. Both types of programmers are interacting with the computer, but Assembly programmers are doing so in an beginner friendly version. That means, they are not solving any real issues, instead they are putting graphics on the screen or they playing around with the interrupt.

References

[beiu2005sub] V Beiu, J Nyathi, and S Aunet. Sub-pico joule switching: High-speed reliable cmos circuits are feasible. In Intl. Conf. Innovations Info. Tech.(IIT05), 2005.
[husemannstudy2016] Jörg Husemann. A study of queue machines. 2016. https://es.cs.uni-kl.de/publications/ datarsg/Huse16.pdf.
[khan1997design] Fitratullah Khan and Sohail Anwar. Design and construction of a pc-based stack machine simulator for undergraduate computer science and engineering courses. In Frontiers in Education Conference, 1997. 27th Annual Conference. Teaching and Learning in an Era of Change. Proceedings., volume 3, pages 1234–1238. IEEE, 1997.
[knaggs1993practical] Peter J Knaggs. Practical and Theoretical Aspects of Forth Software Development. PhD thesis, University of Teesside, 1993.
[koopman1989stack] Philip Koopman. Stack computers: the new wave. 1989.
[maclaren1970art] M Donald MacLaren. The art of computer programming. volume 2: Seminumerical algorithms (donald e. knuth). SIAM Review, 12(2):306–308, 1970.
[morreale1984general] Jay Philip Morreale. A general purpose graphics processor. 1984.
[null2003guide] Linda Null and Julia Lobur. A guide to the marie machine simulator environment, 2003.
[shannon2006ac] Mark Shannon. Ac compiler for stack machines. Masters thesis, University of York, United Kingdom, 2006.
[yehezkel2001three] Cecile Yehezkel, William Yurcik, Murray Pearson, and Dean Armstrong. Three simulator tools for teaching computer architecture: Little man computer, and rtlsim. Journal on Educational Resources in Computing (JERIC), 1(4):60–80, 2001.

EOF

January 27, 2022

Teaching stackmachines with a slow Forth language simulator

Table of contents