January 28, 2022

Top down Artificial Intelligence with evaluation functions for different games

Top down Artificial Intelligence with evaluation functions for different games

Manuel Rodriguez, January 28, 2022

The classical understanding in robot programming is to imagine what the robot has to do next. He is embedded in a game and has to provide the best actions. This kind of bottom up understanding prevents that the game in general is getting understood by machines. The more elaborated attempt in creating artificial Intelligence is working with virtual referees which is also known as top down AI. The main ingredient is an evaluation function to judge about the current state of all the players in the game.

Tags: Artificial Intelligence, bottom up AI, game AI, scoring function

Table of contents

1 Artificial Intelligence Philosophy

1.1 Intelligence equal to winning a game?

In a colloquial sense, intelligence is very well defined. It’s the ability to win a game. Suppose somebody is able to win a game of chess, this is seen as a desired skill. People are playing chess because they want to win the game. The same is true for other games like scrabble or the Rubiks cube puzzle. Winning the game is the only purpose all these games have. And what humans likes is reaching this goal.

The interesting but seldom explained fact is, that “winning a game” is an easy task for computers. In case of chess, the computer has to evaluate the current position and plan some steps ahead into the future and this allows the machine to win every single game. Even if the computer starts with a weak position, he will beat the opponent for sure if there is a path available.

That means especially the highly wanted ability to win a game is provided by a computer as a default. There is no discussion available if computers are able to win a game of chess. They are able to do so and if someone tries to prove the opposite he can try to lay a game against a chess engine. But, if winning a game is so easy, what is about Artificial intelligence?

Suppose computers are able to win a game but they aren’t able to invent such games from scratch. Somebody has to program an evaluation function for chess first then the computer can use its computational performance. It seems that the isolated goal of winning games is not sufficient to describe what artificial intelligence is about. It has to do with creating evaluation function and selecting which seems to be important. For example, some decades ago, the game of chess was chosen as a relevant game for computers. And the chance is high, that in the future other games which are more complex are getting played by computers.

1.2 Artificial Life

Since the mid 1990s, the term Artificial Life was introduced. Most newbies believe, that Artificial Life has to do with creatures in a 2d world who are searching for food. And then the brain of these creatures is improving by itself. The AI life robots will become more intelligent by its own.

The main problem with AI life is, that the term isn't defined precise. It can mean anything or nothing. In most cases, it has to do with Artificial Intelligence in an ecosystem. And the question is how to program robots in such a way.

A possible alternative interpretation is, that AI life doesn't provide the answers but is trying to analyze certain problems. One of them is the relationship between robots and their environment. An interesting but seldom asked question is "Where is intelligence located, in the single individual, in the swarm or in the environment?"

Let us try to analyze the issue from a technical perspective. It's possible to write a simple video game in which some creatures can move and search for food resources So it is some sort of AI life simulator. What the creatures will do is simple to answer: they will do nothing, because they weren't programmed. To make this simulation more lifelike some sort of energy, intelligence or something else is needed.

The first attempt is working bottom up. The thesis is, that intelligence is located in the individual. So each of the lifeforms in the game will need a brain and the brain can make decisions. This principle is equal to behavior based robotics, in which a single robot perceives the environment with a sensor, plans something and then acts in a way that the situation gets better.

The problem with bottom up AI is, that it remains unclear what the objective is. The individual robot has to take a decision, and the goal is also determined by the single robot. That means, the AI problem is referenced back to itself and it remains unclear what the robot has to do next or what the goal is. The more sense making approach is working top down. Not the single robot stands in the focus but the game which is represented by the environment. Because this approach is discussed only seldom in the literature it should be explained in detail.

Top down AI life means, that the robot creatures in a game are less important. The assumption is, that these creatures are acting randomly or are tele-operated by humans. What is important instead is the game mechanics of the environment: Where is food is located in the map, how the robots can act, which action results into a positive reward and so on. Basically spoken, the idea is to design a game which makes fun to play.

If a game was invented with this principle in mind it can be played by:

  • a random generator
  • a human player
  • an algorithm
  • a neural network

The most antagonist approaches in playing a game is a random generator on the one hand who has a low winning probability, and a super-human-AI on the other hand, who will win every single game. For the game engine, both ways are equal. A player in the game is simple somebody who can take decisions and the outcome gets simulated by the game engine. The problem of decision making in a game can be postpone to the external entity.

Let us go back to the subject of Artificial Life. In the past, AI life was mostly defined as simulating characters for example a robot in a maze. But the definition can be flipped into the opposite easily. The new approach is to define that the maze is the artificial life, and the robot in the maze is an external stakeholder who isn't intelligent.

This definition may surprise because a common assumption is, that player in a game are able to solve the game. But this is not precondition. A player can be someone who is only a random generator. If the robot in a maze is controlled by a noise generator he remains a player, even if his action aren't making much sense.

In a previous paper this problem was recognized as the difference between bottom up and top down robotics. Bottom up robotics says, that the single robot needs to be intelligent, while top down robotics explains, that the game engine needs to be intelligent. Instead of arguing which of the perspectives is the correct one, it should be mentioned that both are available and they are defining the same situation different.

There are many arguments, why top down robotics might be the better approach to explain artificial intelligence. The reason is, that from a technical perspective it is pretty easy to win a game, while it is hard to invent new games. Programming a robot that is able to win a game is a trivial case, because all what is needed is to search in the game tree. And computers can do so very well. In contrast, creating a new sort of game engine which acts as an environment is much harder to realize.

Suppose bottom up robotics is a here to stay. Then the overall problem is how to program a single robot, so that this robot will do something useful. The assumption of bottom up robotics is, that this problem is complicated. But is this really the case? Suppose there is a robot on a graph who should find the shortest path to the goal. Nearly all programmers in the world are able to write an algorithm who is able to do so. Such an algorithm (=the source code in Python) will need around 100 lines of code, and gets executed on a desktop pc in under 1 second. So we can say, that winning such a game is not an AI problem, but it is a trivial computer science problem.

In contrast, a problem which is hard to solve is to write a game engine. That means, a piece of software which holds the maze, and which defines which actions are allowed in the maze. This is colloquial labeled a robot simulator. Such a simulator can't be realized in 100 lines of code but it will need much more effort. And if the simulator should provide a more complex game, the needed amount of code lines will grow exponential.

1.3 From bottom up AI to top down models

A computer is a machine which can do anything. But before the machine will solve a problem, it has to be programmed. Programmers are able to create such programs but they need to know what sort of software is needed.

In the domain of AI, there are two major schools available which are bottom up Artificial Intelligence and top down AI. Bottom up AI means, that the intelligence is located within the robot. This paradigm gets ignored next and the section tries to explain only what top down AI is.

The first thing to know is, that a computer can win easily any game if the computer knows the cost function of the game. A cost function is a mathematical model about what the goal is. The cost functions grounds the reality into numerical values. Most users are familiar with cost functions because in video games the current score is shown on the screen. If the player collides with an enemy, the energy level is reduced and this is shown in the status bar .

Suppose , a robot knows in advance at which situation in the game he gets a positive reward and in which a negative reward is given. Then, the robot's computer is able to plan the actions to maximize the overall reward. That means, the robot has won the game if the score is maximized so it is an optimization problem. In such a situation it's pretty easy to define what AI is about. Because maximizing the overall reward is no longer an AI problem but it's a normal computational problem which can be solved with software. There are many algorithms available for this purpose and they can be implemented in a programming language.

The only bottleneck is a situation in which the situation is unknown which means that the cost function is missing. Without a cost function, it is not possible to plan actions and without a plan the robot isn't able to win the game. In such a case an AI problem is available because it remains unclear how to program the robot.

Top down robotics puts a strong emphasize on cost functions. The idea is that intelligence is never located within the robot but intelligence is equal to ground a domain in a cost function. That means, not the decisions of the robot are intelligent but the processing of the cost function. With this paradigm in mind it is possible to locate where the Artificial Intelligence is located. It is the cost function but nothing else.

The art of creating cost functions is usually treated as game design. Game design means to invent a game and define at which moment in the game the player gets rewarded. This is perhaps the most obvious difference to bottom up robotics. Bottom up AI tries to play an existing game, while top down robotics assumes that the game has to be invented first. Games are usually located outside of human players and outside of robots. A game isn't defined in the player's brain but it is equal to the environment. For example, a stairs can form a game and the robot has to walk on these stairs. if bottom up robotics tries to understand embedded intelligence the idea of top down robotics is to define what environmental intelligence is.

The idea of a cost function is to translate between a game and the actions of participants in this game. The cost define what a single player has to do within the game.

1.4 Playing robot soccer with a remote control

In the past, robot soccer was usually played with autonomous robots. The idea was to write a software which allows the robot to play this game. Programs written from different teams have to compete to each other. This sort of task combines Artificial Intelligence with a concrete problem.

There is a small modification available to this problem. Here the robots are no longer controlled by the onboard AI but they are remote controlled. That means, human players are standing outside the arena and pressing buttons which allows the robots to catch the ball. On the first look it is hard to recognize why such a game has something to do with Artificial Intelligence. The reason is, that the software for controlling a remote control robot is little.

But perhaps both ideas can be combined with each other? What is about a robot soccer game in which the robot are controlled by humans, but the referee is controlled by an AI? The referee is hidden in the camera and the task is to detect the ball position and determines which of the players has won. In such a setup AI problems and normal remote controlled robots are available both.1 2

To understand why this combination makes sense let us investigate the bottleneck of robot soccer in the past. Suppose the teams and the match is working with the strict rule that the robots are controlled by software. Then the situation is that the robots are activated, and the team is standing around and has nothing to do. They can observe if the written code works fine but they can't interrupt the game. So their role is passive by nature and it depends on the program if the robot is doing a useful task.

Somebody may argue, that this interaction mode is not a problem but it is the goal of a robot competition. No it is not the goal. The goal is to let robots play soccer and use Artificial Intelligence for this purpose. The open question is how to exactly such a match is realized. The reason why sometimes the competition was modified slightly in a sense that remote controlled robots are allowed is because it makes more fun. If the human player can press buttons they can influence the robots on the play field. The resulting behavior of the robots is more human like. That means, they are acting different.

The only question left open is where is the AI located if the robots are controlled remotely. In a normal case, no artificial Intelligence at all is needed. This makes it complicated to see such game as a serious challenge for computer science. Computer science is about neural network, autonomous robots and algorithm. But in a remote controlled use case such things are missing. In the past both attempts to play the game were seen as opposite: First idea is to build autonomous robots and program them and the second idea is to control the robots with a remote device and avoid a fixed algorithm.

1.5 A virtual robot referee

In the past, Artificial Intelligence was not defined precisely enough. Sometimes it was assumed, that a complex robot is equal to an artificial intelligence. For example if a robot is able to walk with two legs or kick a ball into the goal then the robot is intelligent. Another assumption was, the chess playing computers are intelligent, because they are able to win against any player.

The problem with this understanding of Artificial Intelligence is, that an important aspect of games is ignored. Every game contains of a referee which is judging about the player. It depends on the referee if it makes sense to kick a ball into the goal. If the referee recognizes that the goal was the wrong one, the robot can't be called intelligent anymore.

The interesting situation is, that even advanced robot challenges like Micromouse, line following problems and Robocup are working with no referees or only with human referees. That means, there is a human which has a stop watch and he determines if the robot has done the task. But what if the task of the referee is automate as well? Such an instances is called a virtual referee because it is a software which determines if a robot or a human player has made everything correct. It is a scoring mechanism which judges about the quality of a behavior.

The hypothesis is,that a player in a game is not an example for artificial intelligence but a referee who is realized in software is an Artificial Intelligence. The reason is, that the referee is the highest possible instance in a game. The only way to judge about the referee is by assuming that the game itself is boring. For example that the line following problem in general doesn't make sense.

The amount of literature and projects about virtual referees is low. In the past the problem was simply not recognized as important. Instead all the attention was put on robots who are playing and winning games but not about judging them. Can we train a robot to become a referee? Of course, but this role is different from a normal player. A referee represents the game. He gives scores and punishment to the other players.

1.6 Realizing artificial intelligence with evaluation functions

In the past Artificial Intelligence was mostly discussed from a theoretical perspective. It was asked what intelligence is and how machines can emulate such property. The next logical step is to create a practical demonstration for artificial intelligence and the best approach is a computer which is able to play and win a game. In such a case the topic of AI is discussed more realistic and very important it can be demonstrated which sort of games are too hard to the AI.

In some discussion it is assumed that game AI is state of the art in AI research but it is possible to go a step further and investigate who to create a game playing AI in general which is able to win all the existing games. The interesting situation is that it is possible to describe who an AI can win a game. What the AI has to do is to score the current game state with a numerical value and then use a search algorithm like A* to explore the game tree. This strategy works surprisingly well.

How the A* algorithm is working is well known the bottleneck in the description is the evaluation function. Every game can be evaluated and the interesting situation is that most games doesn't provide a score or the score is delayed. For example in the tetris game the player can make an action which is not recommended without affecting the score on the screen. The player will see only 3 moves later that the move was wrong. What an AI has to do is to create its own scoring function which determines precisely if a certain situation is an improvement or not. This scoring function can be combined with a search algorithm into a winning strategy.

From a more abstract perspective Artificial Intelligence is identical with finding a scoring function for a game. A strong player has a better evaluation function than a weak player. The weak player makes a move and doesn't notice that his situation is worse than before. At the same time this definition allows to identify which problem remains unsolved in AI research. It is an open problem how to find such an evaluation function. What is used today are handcrafted examples which were programmed by humans. What is hard coded in the evaluation function is the heuristic knowledge of an expert which results into a numerical score. This scoring mechanism is the core element of an AI player. The other parts like the search algorithm, the GUI and the move generator are programmed around the scoring function.

Sometimes the problem of creating an evaluation function is discussed under the term “grounding problem” grounding means to convert a world state into a numerical value which can be processed by a computer. The reason is that existing games are formulated on a higher level than computers can understand. The only language computers can process is based on numbers. If a problem was formulated in terms of integer values and floating numbers it can be solved easily on a desktop PC.

Let me give an example. Suppose there is a list of values which have a value form 0 to 100. The computer has to search in the list for the minimum value and print out the entry. Such a task is not an AI problem but it is a classical programming problem. Any programmer and of course any discussion forum in the world knows how to solve this problem fast and efficient in a major programming language. If a scoring function is there, the decision making in a game is reduced to such a minmax search problem. The only obstacle is, that most games doesn't provide such an easy to solve problem, so the game has to be converted first. This transformation is not understood by today's researchers and it is an open problem within Artificial Intelligence.

1.7 A cost aware AI pong player

The classical understanding of Artificial Intelligence is dominated by the bottom up paradigm. What Rodney Brooks has introduced in the late 1980s was never a paradigm shift but it was the default understanding what AI is about. Suppose there is a game of pong and the AI player has to act in this game. Then the player needs to figure out the next action and everything is described from a bottom up perspective. Bottom up means, that the problem is solved from the robot's perspective. For example, the AI player is on the right side of the pong table, there is a ball and then the player decides to move towards the ball.

A typical implementation of a bottom up pong player is working with some if then rules. For example, if the player is near the ball then move to the it. The problem with this approach is that it is not working well enough. The more elaborated attempt to play the game is think from a top down perspective. Top down ignores the action space of a single player and tries to judge about a game from the referee perspective. .A referee has to decide how gets a reward in a game, and a referee has to decide who has won even if the game is not over.

A referee doesn't act in the game but he score the players who are doing something. Top down judgment and creating a virtual referee is the same. This approach allows much better to implement an AI. The AI is not the player on the right side and also not the player on the left side, but the AI is the game engine who is judging from a birds eye perspective towards the game.

2 NP hard problems

2.1 What is the np hard problem in AI?

In the past the so called np-hard problem was sometimes mentioned in the literature. The basic idea is, that all the robotics problems are np hard which can be shown in a proof. It means, that the problems are so complicated that a normal desktop PC can't solve it. For example, the pick&place problem in the 3d space is np hard, driving a car is an np hard problem and controlling a biped robot as well. NP hard means, the state space is exponential in size and it is not possible to solve this space. That means, there is no algorithm available for np hard problems.

Let me give an example for np hard problems. Suppose a robot should grasp a ball and put it into a box. The initial situation is known and the goal situation is defined as a constraints. The movement in between are equal to the state space. For a duration of around 1 minute the robot has to control the arm so that the goal state is reached. In each second the robot can decide between different movements for the joints, and the overall amount of possible action sequence is larger than 1 million, larger 1 billion and larger than 1 trillion. This makes the grasping problem to an np hard problem.

The next step would be to investigate how many action sequences are available in the duration of 1 minute and determine in years how long a supercomputer would need to calculate them all. But it doesn't change the situation drastically. The more interesting situation is how to solve np hard problems?

Solving a problem is done in computer science with an algorithm. An algorithm is a program which can solve a problem. The only way in solving np hard problem is to reformulate them. That means, only non np hard problems can be solved by an algorithm in a small amount of time on a desktop PC. And if the grasping problem with the robot and the ball are np-hard something with the problem is wrong.

So the question is how to transform an NP Hard problem into a non np hard problem? This is done in game design. Game design is not about solving problems, but game design is about inventing game. The game has to be invented in a way, that it has a small state space. A typical sign for this precondition is, that a scoring function is available, that the problem has a hierarchy. For example a reformulation can be, that in step 1 the robot has to approach the ball. And the measurement if this action was successful is done by the distance between the robot arm and the ball.

Unfortunately it is very hard to define in general how to create games which are fitting to a certain domain. The easiest way in doing so is to use existing games which were invented by someone else and investigate if these games are fitting to a domain. For example the question is if the ball grasping problem can be modeled as a graph traversing problem. That means, the robot and the ball are in a graph and there a quest has to be solved. Another possible attempt to increase the abstraction is to describe problems as text adventure. Most text adventures have a surprisingly low state space. The reason is, that in a text adventure the amount of possible actions in each state is low.

What we can say in general that only if a problem was reformulated as a polynomial problem it can be solved by algorithms and computers. In contrast, if the game design wasn't done this way, it makes no sense to think about automated AI players.

3 Practical demonstration

3.1 How to program a cost aware line following robot

In the past, line following robots were programmed with a certain bias. The idea was, that there is a piece of robot hardware which has to be programmed, and then a certain algorithm or software gets implemented on the machine. Typical terms for describing the project have to do with the onboard CPU, the programming language and the sensor readings. So in general it was a robot focused paradigm.

This sort of self understanding is not very efficient because it ignores the environment of the robot. The better idea is to prefer a top down ideology in which the robot gets ignored and the programming is focused on the game itself. At first the robot is controlled manual. Either with a remote control device or by manual moving the robot forward. Then, a cost function is added. That means, during the teleoperation a cost is calculated. If the robot misses the white line on the ground, a penalty is given. This cost function is calculated with the onboard computer of the robot. So the robot doesn't act as an individual player in the game, but he monitors what he is doing. This includes cases in which the own performance is low.

In the third and last step, the cost function is connected to a solver which determines the next action for the robot. But this step is not very important and can be postponed or ignored at all.

The resulting project is much different from line following robots in the past. First thing is, that the robot isn't working autonomously but it is controlled by a human. And second, the robot doesn't follow the line but the robot moves mostly away from the line to check if the internal cost function is able to detect the situation.

3.2 How to model a production line

In a previous paper it was explained in depth that top down models are important in robotics. The model allows to ground a domain into a machine readable optimization problem. The only thing left open is how exactly a model has to be created so that it will fit to a domain.

A possible example is given in the case of a production line. The idea is that the production line is working with many conveyors at the same time. The reason is that every conveyor is automatically something which is increasing the productivity. A robot is used to sort the incoming boxes on these conveyors. The figure shows the model realized in the petri net notation. The robot can grasp 2 objects at the same time in a box. His task is to take an object from one of the left conveyors and place it to the correct colored conveyor on the right.

Production line

On the left side, the objects are incoming with a random order and the goal of the game is to sort them on the right side. On the first look this task looks not very complicated because the workspace is structured very well. Nevertheless there is a need to model the situation. Modeling means to draw the petrinet to the screen and make sure that the model can be simulated. That means it is a computer game in which the robot can take decision and see what happens if he puts the item to the wrong target conveyor.

In the model, such a situation is handled with a negative reward. That means there is a counter who recognizes if the robot is doing what he should. So the goal of the game is to maximize the own score. It is important to know that the reward function is made explicit, that means it is written in the game description. So the question is not what is going inside the brain of the robot or if he is intelligent enough to take the correct actions, but the question is how does the game works. That means, are conveyor is moving, does the system is recognizing if an item was place into a position?

The reason why the color sorting problem needs to be modeled in an abstract way is because it allows to solve it. Suppose the model was programmed, than it is possible to use this model for optimization purposes. That means, it can be determined what the optimal actions for the robot are. In such a case the reward function is known, the model was defined so it is a simple numerical optimization problem. The solver has to make sure that the robot is able to win the game. And this will produce a behavior pattern which makes sense in the game. That means the color sorting problem can be solved the same way like a chess engine is working.

It should be mentioned that the order of steps in modeling are important. The first step is always to create the model. Only if the model is available and if the model can be played by a human player then it is possible to automate the system with a robot. It is not possible to determine first the optimal action for the robot and then investigate what the problem is about. That means, bottom up approaches which are starting at the robot aren't working. The only sense making order is to solve such problems top down. That means at first the game is defined, then the game is simulated, and then it can be solved on the low level by a robot.

With such a strict definition of top down modeling the definition of Artificial Intelligence is very strict. AI doesn't mean that the robot is able to understand something, but roughly spoken there is no robot but only a game model. That means, it is not important what is inside the brain of the robot but the only thing which counts is which color has a certain item, what is the conveyor about and so on. That means the problem description is very important while the robot itself is only an add on in the system.

Let me describe the situation from a more abstract perspective. In bottom up robotics the focus is on the robot. The idea is that the robot is analyzing the environment and the robot is executing certain action. In top down model based robotics, the focus is on the game. The game contains of a map, shown in the figure, it is a logical structure and there is a goal. So the idea is to design either a board game or a video game and make sure that this game is an internal logic. The interesting situation is that the game's logic is not located in a human player nor in the robot. But the game's logic is defined top down. It is written in the game description and it includes many things like a metal conveyor, items on the conveyor, a movable cart, possible actions, a reward function and so on.

Creating reward functions

The screenshot with the production displays on the one hand a complex computer science problem. Because the optimal actions for the robot are needed so that the color sorting pick&place problem can be solved. In the concrete example the robot has loaded the red and the blue item. So he has to place first the blue item and then go upwards to place the red item. Then it can be investigated what to do next.

But even if the problems sounds very complicated a computer software is able to calculate the actions easily. The reason is that the problem's specification is made clear. That means it is not a challenge for a computer to figure out the action sequence but it is some sort of well defined situation. The only true challenge in robotics are ill defined problems. That means if no model is available but the robot should figure out by himself what the game is about. That means if no problem was given it is hard problem and if a problem was defined it is an easy challenge. This paradox situation has to do with the inner working of computer and how to program them. A computer is a great tool for adding two numbers or search in a graph for a value. Such a task can be realized in a programming language with a few lines of code. The only challenge for a computer is a situation which can't be solved with this principle. That means if the problem has nothing to do with adding two numbers then it is a hard problem.

Unfortunately, most robotics problems are located in this region. And this makes it hard to analyze these problems. A possible attempt to describe the gap between the capabilities of machines and the real world is the symbol grounding problem. It means that the robot doesn't understand natural language instructions. And this produces a semantic misconception. Another more detailed attempt to describe the symbol grounding problem is working with textual goal functions. For example the human operator is saying to the robot “pick up the red item from the conveyor”. The robot can execute the action only if a certain world understanding is there. That means the robot needs a model in which the terms red object, conveyor and pickup are defined.

Let us take a closer look into the example with the production line to understand how to solve the symbol grounding problem. In the example domain a simplified model of a production line was given. This model was programmed in a computer program. The model provides to the outside world basic commands like “moveto(), pickup()”. These commands can be parametrized, for example pickup(2) is translated to “pick up the red item”.


References

  1. Arenas, Matías, et al.: "A robot referee for robot soccer", Robot Soccer World Cup. Springer, Berlin, Heidelberg, 2008.↩︎
  2. Zhu, Danny, and Manuela Veloso. "Event-based automated refereeing for robot soccer", Autonomous Robots 41.7 (2017): 1463-1485.↩︎

January 27, 2022

Teaching stackmachines with a slow Forth language simulator

Forth is not known in mainstream computing, because the reverse polish notation is difficult to learn. To overcome the problem, a Forth language simulator is described which makes it easier for newbies to play around with a multistack-machine. The idea is to reduce the performance down to 1 instruction per second so that the user can observe in a singlestep mode what his program is doing.(video sourcecode) The question which remains open is how to program in Forth itself. The development of stack based algorithm is outside the scope of this paper, here is only a simulator given, which is able to execute existing Forth code.

The following longer article was published first on a different website as a PDF document. Here is the HTML version of the same text.

Title: Teaching stackmachines with a slow Forth language simulator

Author: Manuel Rodriguez

Year: March 30, 2018

Table of contents

1 History

1.1 Short history of stackmachines

Charles L. Hamblin published in 1957 a paper about stackmachines which was later realized in the KDF.9 computer. It was the first stackmachine who was ever build. In the late 1960s Chuck Moore invented the Forth programming language. Forth was first used as a programming language for existing computers and in the 1980s dedicated Forth chips in hardware were build, for example the RTX2010. The 1990s and later was dominated by private research. :The first Forth user groups were founded and the current development is, that amateurs are using FPGA hardware for implementing a stackmachine.

From a non-Forth perspective the question is, what is so special about a stackmachine? Can this technology be used to make computers more efficient? Stackmachines are usually engineered not from a user perspective but from a hardware point of view. The architecture allows to minimize the transistor count and make the design very clean. The negative aspect of Forth is, that it is not possible to run normal highlevel languages like Pascal, C or C++ on this. So stackmachines are useless for mainstream computing.

If we are observing the Forth literature, many attempts were undertaken in the past to port C compilers to stackmachines. In most cases not very successful. The problem is described in the literature as compiler-friendly CPU. Or better, only a CISC CPU is compiler friendly. That means, mainstream computing hardware is designed for the needs of the programmers and endusers, not for the needs of the machine itself. Another disadvantage of Forth CPUs is, that even somebody is able to convert existing c application to Forth, the algorithm is not very efficient. The reason is, that algorithm like sorting has to be implemented for a stackmachine differently. Usually algorithm are in contrast written for CISC CPUs, they are using variables and not a stack.

Early compilers

Let us go back in the year 1960 in which stackmachines were first researched by pioneers. The idea was, that on one hand there is a lowlevel machine language and on the other hand the need for programming in a high-level language. Today we call the gap between them a compiler, in the 1960s the process was called “automatic programming”. The idea is not to program in Assembly language but in a high-level-language.

Everybody who is familiar with the reverse polish notation on the HP calculator knows, that it is difficult to use. So the reverse polish notation was replaced by something which works easier. Todays calculator are user friendly, that means, the user is not forced to use the RPN notation. In the computer language this innovation was called “Fortran”, for formula translator. It improves the programming of machines. The programmer were not longer forced to type in assembly codes, but can type in their program in normal notation. The result of the development of the first compilers was, that the idea of stackmachines were no longer needed. A Forth like language is similar to Assembly, and with the first compilers there is no need to program in Forth. The result was, that Stackmacines were a niche topic since that time and instead CISC based cpu dominated the business.

That was no mistake, because the reverse polish notation is indeed hard to master and is not helpful if someone is programming huge number of codelines. I would suggest that non-RPN calculators and non-stackmachines fits better to modern needs in the mathematics and programming. But so called register machines haven't replace stackmachines completely. Today, we have both: the obscure RPN-notation and the modern c compiler plus CISC cpu. I do not think that there is need to decide between them, both domains have their advantages.

The literature from that time is not online-available. Only the bibliographic references are stored in the worldcat catalogue. The fulltext paper of early stackmachines are not scanned in. But that is no big problem, because later authors are citing from the papers and a modern paper written in 1980s and later gives the same amount of information like the original papers in the 1960s.

2 Misc

2.1 What is Forth?

It is easy to define Forth. It is a virtual machine like the JVM, which is programmed in assembly. The VM has the structure of a dualstack pushdown-automata and provides commands like “drop, do .” To run a Forth program on a computer a concrete realization of the Forth VM is needed. Most Forth implementations are written in assembly language. But it is also possible to use C and compile this to assembly to get a Forth system.

At least one word to the performance of Forth. A normal Forth likes Jonesforth is not very fast. If the user want's to run a Forth program, the VM maps every Forth word to a machine language command. It is comparable to a BASIC interpreter. The faster approach would be to use a Forth compiler, such compiler are existing, but they are only commercial tools like VFX Forth.

The average Forth VM is slower than what a GCC compiler can provide. The question is: why are the people fascinated from Forth if the language is difficult to program and slower in the execution? I believe, that 99% are using Forth as an education tool. They are not interested in writing real applications with it, instead they are using Forth like the PL/0 language invented by Nikolas Wirth to understand assembly language and compiler development. For the educational purpose Forth is well suited. The reason is, that the user is forced to become familiar with his computer. For implementing a Forth VM from scratch he must understand normal assembly language for the 6502 CPU, the PIC microcontroller or the AMD64 architecture.

That means, that a Forth VM can be compared with a JVM or the GCC compiler. For programming such software from scratch, the user must understand the computer on hardware level. The question is not, what feature Forth has, the question is what type of knowledge the user must bring in to call himself an Forth guru. For example, the famous Chuck Moore can be called an Forth guru not only he knows who to write a Hello world program in Forth, but he is familiar with assembly language on mainstream computers and even has designed his own CPU from scratch, so he is also an expert in chip-manufacturing.

The funny thing about Forth is, that it stands not in contrast with mainstream computing. I would call Forth not a programming language, but a didactic concept. Let us first define what the problem is. The problem is, that there is a computing class with students in the first semester. And they have absolutely no idea what a compiler is, how to design a CPU from scratch or how existing CPUs like the AMD64 system works. How do we get in a short period of time the students to an expert level? They need a hands-on project in which they learn in a short amount of time a maximum of knowledge. And here comes Forth into the game. It is a combination of a teaching-programming language, a teaching operating-system and a teaching CPU. A possible alternative to Forth is, to use existing “educational only” projects, for example we can combine the Z80 CPU (which is used on most universities in computer courses) with the Minix operating system with the above mentioned PL/0 programming language.

Sometimes, Forth is called an easy to learn language. That is wrong. Because the problem is not to learn Forth itself, the problem is everything what is outside of Forth. That means, understanding what a stackmachine is, on a pen&paper computer is easy, but implementing this on a real cpu is hard. Let us imagine an example project in a computer class: “Implement a forth dual-stackmachine on the 6502 CPU with assembly commands according to the jonesforth specification”. It is a clear task-description and it is possible to do so. But it is not easy going, it is an advanced topic of computer science which has to do with many aspects.

The misconception of Forth is, that the language is comparable to C which means, that after some experience the beginner is able to write programs with it. No, he is not. It is not possible to learn Forth with the aim to use it in reality. Instead Forth is some kind of never-ending-curriculum. The idea is not that the learning progress ends someday, the idea is to go deeper into the subject. If the student is able to write a “Hello world” program in Forth, the next task is to write his own Forth VM. And if he has done so, the next task is to make his own Forth CPU.

Let us investigate how Forth is used in reality. Most companies who are developing Microcontrollers are using Forth in daily life. They have programmed their own Forth IDEs or they buy the license of VFX forth. Why are they doing so is simple. Because companies have employees and employees must be teached. The first thing what the new employee in a company has to do is to attend a course about programming and hardware design. And the lingua franca in this courses is what? No, it is not C, because C can be used without understanding the language. Most c programmers now nothing about compiler programming itself. They know the language itself, and thats it. But, a microcontroller company needs employees who are familiar with lowlevel programming, and knowledge in C is not enough.

The problem for what Forth is used, has nothing to do with programming or CPU design itself. It is more a social problem. The question is how to train lots of employees in a short amount of time to a deep understanding of technology. Forth is the answer. Let us take an example. Intel has produced a new chip generation with around 1000 opcodes. The chip is working great. So Intel has no problems, right? No, Intel has a lots of problems, because somebody in the company must understand how the system works. The knowledge about Intel chips is not public available, because it is a commercial product. Only the employees of Intel are aware of it. And the employees are fluctuating. The question is not, how to program the intel cpu, the question is how to teach new employees of Intel in a short amount of time.

Forth itself has nothing to do with Intel cpus, it is a fictional virtual machine invented for no concrete purpose. But Forth is the right choice if around 1000 new employees have to be trained to become experts for Intel assembly language. Implementing a Forth VM on an Intel CPU is the first project, what the students get. They learn all the commands of the hardware. They reverse engineering a system which is already there. And after the project is over, they have acquired the knowledge.

I do not believe, that we see in future any program which was written in Forth, or any CPU which was designed as a stackmachine. What we see is, that Forth is used to increase the knowledge of people. And this people will write normal C compilers and create normal CISC CPUs.

Forth as universal assembler

Sometimes, Forth is described as an universal assembly language, which runs on all computers. A program, written in Forth for the C-64 CPU runs without modification on an x86 CPUs and runs also on a Microcontroller. But this recognition is wrong. Instead, the real portable Assembler is the C language. C is a programming language used in real projects, for example to write the UNIX operating system which is portable between all computers. Instead, Forth can be seen as an educational language. It is not a portable assembler, but a subject in a computer-science class. That means, Forth fits very well in existing university like structures which are teaching knowledge and evaluate if the students have understand it.

2.2 Forth is the future

According to the TIOBE Index, nobody is interested in Forth. Nevertheless is Forth the language of the future. How can fit this together? As an introduction I want to cite the famous Computer Chronicles episode of 1984 https://www.youtube.com/watch?v=D5osk9lrGNg which presented Forth in television. Since this time, more than 30 years are gone, and the episode looks very fresh. I would suggest, that from now in 30 years in the year 2048 the Forth programming language will also be fresh and new. It is a timeless programming language and the need for teaching Forth has something to do with people. That means, in every time there are people who want to know what a stackmachine is, and how a minimal operating system looks like. Forth is timeless like the MMIX CPU. It was constructed for eternity.

Let us imagine how a potential future of Forth will look like. What will not happen is, that Forth can replace C or C++. The reason is, that Forth is difficult to program and slow in execution. Especially, if the Forth Virtual machine was programmed by beginners it is very slow on x86 hardware. But Forth has a famous standing in computer education. That means in a setup in which a female or male professor is explaining to the students what Forth is, and let them work on Forth related project. Forth is some kind of artificial language which is not used in reality, but in computer courses at the university. And i would suggest, that Forth is for that task superior to other artificial languages like Java, PL/0 or Haskell. If the students should learn a language which is not spoken anywhere else, apart from their computercourse, Forth is the perfect choice. It is minimalistic and it is possible to explain with that language every aspect of computer science like compilers, cpu-design, transputer, robotics and algorithm.

The assumption is, that these subjects are relevant in the future, and there is also a need for teaching the domains to students. So Forth will have a wonderful outlook. I would suggest, that good computer courses in the next 100 years all using Forth. On the other hand, Forth is dead for practical applications. It is not possible to use Forth as a replacement for Intel x86 CPUs or as a replacement for C++. Perhaps we can describe the problem on the Minix vs. Linux debate. Minix is a teaching operating system. It is minimalistic and can be used in computer courses. While Linux is a commercial grade operating system which can be used on real hardware in mission critical systems. Writing an academic paper about Linux is a bit difficult, because it is changing all the time. Instead a paper about Minix makes sense, because it is so static and timeless.

2.3 What is a good Forth implementation?

Forth implementation which runs on real hardware are many out there. I personally prefer Gforth, because it is in the Fedora package manager available but any other Forth system may also be working. The main problem with today's Forth implementations is, that they are programmed for real usage. The idea is to start gforth, type in some Words and get a turnkey application. This workflow doesn't fit to the need of the audience. If somebody wants to implement an algorithm or playing around with a microcontroller a normal C-compiler gcc is much better. That means, the code runs faster, and the code is easier to program.

The needed feature set for a Forth system is different from what the user wants to do with C. A forth system must support the learning progress of his user. It should be constructed as a serious game with the aim to provide knowledge. A Forth system is equal to a Forth “computer based training”. In the optimal case it consists of a pdf book plus an interactive simulator for trying out the knowledge. The simulator has not the obligation to run Forth code exactly or with maximum speed, but with visual feedback. That means, it should be a Forth simulator, like an Assembly simulator. The most sophisticated computer based learning software is called “Softsim” and is part of the GA144 cpu. But it goes not far enough. My own Forth system has more similarities to a game.

Figure 1: Forth simulator

The idea is to program the software not against a certain type of cpu, or use some high-end-compiler optimization. Instead the Forth engine is equal to a game engine. It is primarily a visual application which prints out the graphical stack and the user can move the instruction pointer with keys. The simulator is not ready, it is only a prototype. The idea is, that the user can run the game in single step mode or in a automatic mode, which cycles through the entire Forth program. The Forth system does not exist under a operating system or on a cpu, it is written in C++ and can be compared with a minecraft CPU. That means, the op-codes are doing nothing, they are interpreted from the game, but not by a real cpu or by the Linux operating system.

Let us investigate who usually a Forth system is intended for. Most existing Forth systems has the idea, that the user is programming a real machine. For example, he runs the Forth under his x86 PC, on a microcontroller or on FPGA. The idea is perhaps, that the user is a programmer and wants to try out Forth in real life for making something useful with it, for example let a LED blinking or write an algorithm. I think, the user has no interested in using Forth for programming or trying out new hardware. I think, the user want's to understand Forth on a theoretical basis without programming in the language itself.

The reason why the user think so, has to do with complexity. Forth in real life is very complicated. To program a working a Forth system or write sourcecode in Forth the user must have a deep knowledge of operating systems and hardware. And if an error is happening, in all cases the user has a lack of knowledge. For example, he is testing out the new eForth system on a microcontroller because he want's to let the LED blink. The problem is, that something goes wrong and the reason can be anything. Even the simple task of let the LED blink is too demanding for a new student. What he want's is a clean environment which is observable. In the above screenshot of my SFML based Forth,, no hidden cpu, microcontroller or software is there. The system consists only of the GUI what the user sees on the screen. That means, the Forth has really only two stacks and no virtual stacks or something else.

From a technical perspective this GUI software may look like not sophisticated. Because it can nearly nothing and is not comparable to a full blown gforth or the eForth system. But it has a different purpose. The aim is to be an educational tool only.

The question remains, what the user can do with the Forth simulator. Answer: he can use it for educate other people. Forth is not a real programming language like C. If someone needs an operating system or a programming language he is better suited with Linux. The intention of Forth has more to do with higher education and theoretical computer science. It is a subject of academic papers, but not of working systems.

I want to tell also the problems of the Forth simulator. My program is not able to read a VHDL description. So it is not possible to implement a Forth CPU on the hardwarelevel and simulate it. Instead, the execution engine is a fixed block, realized with if-then-statements in C++. Here is enough room for improvements. The problem is, that I do not how, how logicgates can be wired together to implement a complete CPU. This is usually done in the VHDL file, but it is hard to understand.

The idea of using a simulator is not completely new. There are many “Assembly Language simulators” out there, which looks very similar to my Forth simulator. But they are simulating a register machine, not a stackmachine. The “Marie Simulator” for example, is a Java program which simulates a CPU. It is an addon for a book about computer programming.[null2003guide] Another project is the “Little man computer”, which is also a GUI interface for teaching machine language.[yehezkel2001three]

A machine simulator, dedicated to a stackmachine, is described in [khan1997design] The idea is similar to other machine simulators. The program has not the intention to support engineers in debugging real programs, but to teach students with computer-based-training.

2.4 Making something useful with a Forth machine

My own Forth machine simulator runs quite well. The datastack and the returnstack is filled with numbers, and it is visible for the user, how the game works. But one question remains open – for what purpose? Let us first examine, what the Forth machine can not do. It is not possible to run any other language than Forth on the machine. For example C. In theory it is perhaps possible to write a C to Forth compiler, but the system would not working well. Because a c-compiler works better on a register machine. The better idea would be, to program a Forth machine in the standard Forth dialect. And this results into the above question, why should somebody do so?

Currently, I have no answer to it. As far as I can see from the sourcecode, programming in Forth takes longer than programming in C. The reason is, that the programmer has to use the stack, if he want's to do some useful operating. And writing a program which is using the stack is more complicated than using normal variables or objects from a C++ program.

In theory, Forth is a great language for formulate algorithm in a machine independent format, but most algorithm in computer science can be formulated in normal Python easier. That means, there is no advantage for writing a pathplanner in Forth or implement a longer program in it.

There is another potential reason. Writing a Forth program makes sense, if the aim is to improve the simulator. For testing out, if the simulator works correctly, we need some longer sourcecode. Another reason is, that programs written in Forth are smaller, than in other programming language. I would suggest, that learning to program in Forth helps the students for program in other languages like C++.

What else can we do with Forth? Running a Forth program in a simulator is a very slow task. In my own simulator, not more than one instruction per second is executed, and it is not possible to increase the speed very much. In contrast, todays homecomputer are very fast devices with a highspeed 2 ghz clock. What programmers can learn from Forth is to write small programs, which need not much cpu power. The idea is, to reverse the “Moore's law” into a race to the bottom. The question is, how slow can a cpu be made for running all the important programs on it. The idea is to use computers in the size of the Commodore 64 with the same speed and the same low amount of ram for doing useful tasks with it. With normal c programming techniques it is not possible.

Let us stay on the Commodore 64. There was some time ago a project called Contiki, which is an operating system for the C-64 which includes a TCP/IP stack. A nice piece of work. But, programming in 6502 assembly is very complicated, and the sourcecode runs only on cpus with the same instruction set. The better idea would be to program a Contiki like operating system in Forth, and write a Forth machine for any hardware out there. Or let me explain it from the other way around. The 6502 cpu is no longer in production. And running Contiki on the x86 CPU is not possible, because the instruction set is different. Assembly language is the wrong decision, if someone wants to program a small operating system. Forth instead, is also a lowlevel language but can be executed on any machine.

2.5 Is Forth an elitist language?

Bernd Paysan wrote that Forth is a language for the hacker elite:

“So IMHO Forth is an elitary language, designed and written for the best, or gifted, not for the masses.”https://bernd-paysan.de/why-forth.html

And he is right. Tthe number of people who are using Forth is small. Stackoverflow for example has under the tag “[Forth]” not more than 189 questions online.https://stackoverflow.com/questions/tagged/forth

While C++ for example has around 500k questions. Or to explain it the other way around, if a newbie is posting one single question about Forth, for example “Hi, I've installed gforth but my hello world app doesn't work” he has increased the number of worldwide question about the subject with 0.5%!

But why is the Forth community so small? It has to do with the costs. It is difficult for the people to get information about Forth or buy Forth related hardware. Forth can be seen as an obscure programming language which is not widely known. In contrast, the educational programming language “little man computer" is teached on many universities worldwide. Many simulators and user groups are out there which are researching in detail how to program this simple computer.

I do not think, that Forth itself is the problem. Sure, programming in Forth is not easy, but the language is not more complicate than the assembly syntax of LC-3 (another educational language). I think the bottleneck is somewhere else. The number of introductory books is to low and books which are available for example [koopman1989stack] are not outdated. The book was published 27 years ago, it is a great book but can't be recommended anymore.

2.6 Forth ressources

The major problem in the Forth community is to find information. At first, the number of papers is small, and many of them are behind a paywall. Second, the information are often not logical, that means, author 1 contradicts author 2. So the overall knowledge about FORTH is decentralized, and not easy accessible. Overcoming the problem is simple: emulate Forth in a sandbox.

I mean, Forth itself is very complicated. If the user is trying to program a Forth virtual machine for a certain type of processor for example the x86 CPU he increases the chaos. The better way is not to focus on Forth itself, but to program a game, in which a Forth computer is simulated. The idea is to see Forth as an educational programming language, like the Little man computer. The consequence is, that the problems are different. Because there is no longer a Forth system which runs somewhere, but instead there is only the Forth simulator which is written in C++ and the aim is to improve the simulator.

Inside the simulator it is possible to run a forth program, for example “1 2 +”, but this is not the important part. The idea is not to program in Forth, but to program the environment for Forth. The advantage is, that from day 1 the overall system is working, even if the Forth interpreter is not able to do sophisticated things. For example, my own Forth simulator is very slow (1 instruction per second), and can only execute 3 commands. That means, the system is not able to interpret a standard Forth program. But that it is not a problem, because the simulator itself runs stable. That means, the C++ code compiles, something is shown on the gui, and the user can move the instruction pointer.

The nice feature is, that the overall problem is under control, because the C++ language and the aim of writing with it a game is well researched. That means, the user has no longer questions about Forth direct, but he is trying to improve a normal C++ program. That means, it is ok, if he doesn't understand Forth, because he is not interested in programming the language. Instead, Forth is implemented like a Assembly game. It is something which runs inside another program.

The first critics was, that it is a waste of time for writing a graphical simulator, if a normal Forth interpreter runs much better. And indeed, current Forth systems like Jonesforth, Javascript Forth and Gforth are very fast and can execute all Forth statements. The problem with all these programs is, that they are forcing the user in a certain situation. If someone starts the Jonesforth interpreter, all what he can do is enter Forth code. And because most users doesn't know how to program in Forth, they don't like it. From a teaching perspective, it is better to force Forth into a certain position inside a clean environment. That means, Forth is only a textfile which is executed in a much broader system. And the broader system (the simulator) is written not in Forth but in C++ and has a nice GUI.

According to the given literature this concept is very new. In the Forth community itself, no other project was done into that direction. The idea of using simulation games for teaching a language is only used in the mainstream computing, for example the Little man computer project. Let us investigate this project in detail. The assembly community knows two types of users. Hardcare programmers, which are using NASM like assembler for programming real programs. For example, they are writing a new intro, and they are doing so since many years. On the other hand we have Assembler simulations. That are programs which are compared to the NASM in a weaker position. The compiled sourcecode runs slow, and often it is not possible to make any useful things with it. So called “Assembly language simulators” are used by newbies to learn the language. The common feature are, that the system is per default in the singlestep mode, has a graphical interface and additional lots of documentation which explains what Assembly is.

The Forth community has currently only one type of user: the hardcore programmer which is using Forth since 20 years, has programmed his own FPGA and is fluent in programming Forth. That makes it very hard for newbies to entering the community, because the tools which are there use are not very beginner friendly. Even the lowlevel entry Forth “Jonesforth” is a tool for experts. It has no graphical interface, runs Forth code very fast and is used for creating Forth apps from scratch. That means, Jonesforth is not focusing on beginners.

Let us take a look into figure “Forth simulator” which shows the GUI screen. It is not a console app, but visualize the current Forth system. It has a datastack, a returnstack, RAM and an execution unit. Describing the system is simple: it was created with the SFML library in C++, that means it is a game comparable to pacman. The user can send some keystrokes to the simulator and the game answer with a certain behavior. The rules of the game are called Forth, that means it works like a stackmachine with two stacks. From a technical point of view, the system is not so powerful like Gforth or Jonesforth. It is comparable to an Excel chart and is not a real Forth system. The advantage is, that the user is supported in his learning experience. Even without understanding Forth the user can press some buttons and see how the system works.

Figure 2: Forth simulator

Currently, the simulator has only a few actions. The user can press up, down, left and right. This moves the instruction pointer and executes the command. The simulator works in a way, that it shows what is inside the stack. That means, the debugging mode is activated as default. According to the specs, the Gforth system has also a singlestep debugger. But it is not very pleasant. The same is true for the SwiftForth. Both debugger are created with low priority, in the advance that the user knows the language already from the textbook. But this is not true. The user needs a app because he want's to learn Forth.

2.7 Writing software in Forth

According to most papers about Forth, the programming is done with a divide and conquer technique in which a problem is described with subwords. Most newbies are not understanding the idea and ask for an easier way of implementing Forth code. My impression is, that the language which is nearest to Forth is the assembly language. Coding for old 8bit Atari computers and Forth has much in common. Sure, Forth is something which goes way beyond assembly, because it can run on any machine and it is possible to design special cpu only for Forth, but the programming itself is the same.

Let us investigate a small example how a program is created from scratch. Suppose the user wants to program a bubblesort sorting algorithm. The first step what he is doing is to ask google if someone else has implemented it. Maybe he finds a out-of-the-box solution on http://www.rosettacode.org or in one of the Forth forums. Because the community is very small in most cases he will not find any code. That means he must write the Forth code from scratch. The easiest way for doing so is not to separate the problems into words, but asking the assembly language community for help. The reason is, that programming on a lowlevel for a certain cpu type is done both: in assembly and in Forth.

Let us explain who a typical assembly program works. Usually it has no variables but reserve space in memory. Usually, the programmer is using subroutines, and usually the programming is done different from programming in C. I would suggest, that programming in Assembly for the Atari 8bit cpu takes around the same time than programming for the abstract Forth virtual machine. That means, the user must build up the complete program from scratch. Either Forth nor assembly has any predefined libraries or highlevel constructs like objects. Instead it is up to the programmer to program bare metal.

The good news is, that the assembly community has researched over many years how to program large amounts of sourcecode. Usually the learning curve is high, and it takes many weeks to program a game in assembly. And if an error occurs the debugging is time consuming. The same is true for Forth programming. It is like assembly, only that the cpu has fewer registers.

disadvantages

Let us describe the problems in Assembly and Forth together. The main disadvantage is, that it is a lowlevel language. That means the cpu is executing only the given commands and in a certain order. In contrast, C++ sourcecode is structured more like a textfile. That means the exact order of the statements is not very important instead the programmers describes general structures. That means, larger piece of C++ code can be written without feedback from the compiler. Instead Assembly and Forth programming is programmed against the bare metal. That means that every punctuation is important, because otherwise the system won't work. And i would suggest that most of the time the programmer is working in the debug mode, that means he is using a tracer to follow command after command for verifying that the assembler is doing the right thing.

Let us describe a typical bug. A subroutine is not doing what was expected. But where is the error? The assembler programmer has to go step by step through the commands and observe the content of the registers. For example he is recognizing that the accumulator has the wrong value, he traces the problem back to the command in the listing and modifies the sourcecode. The same error repairing mechanism is used by Forth programmers. The only difference is, that they are not documenting what they are doing. But in reality, Forth programmers are only special kinds of assembly programmers.

Now a typical example how equal Assembly and Forth are. In a stackoverflow thread somebody asks for help because he wants to initialize a struct in assembly language.https://stackoverflow.com/questions/23547328/following-struct-pointers

He describes the problem very well and has also a screenshot of his assembler. For c programmers the question seems very complicated, but it is the normal how assembly programming is done. The interesting point is, that the question is valid for Forth too, that means in a Forth programs the problems for initialize memory for a struct are equal.

I mean, either Forth nor assembly have any kind of datastructure, predefined way for accessing memory or standard libraries. What they are doing is working bare metal. That is the reason why programming in assembly takes so long, while programming in C is faster. The difference is, that assembly programmers work on hardware which can be used by a c compiler as an alternative, while Forth programmers have not the option to fallback to C, because on Forth chips modern C compiler aren't working anymore.

Here another problem. .The task is to reverse a string. According to Roessate code the LC-3 assembly language can used for the task.http://www.rosettacode.org/wiki/Reverse_a_string#LC3_Assembly

The listing contains words like SCAN, COPY, COPIED which have subactions for manipulating the registers. The Forth solution which is also given on the website is using also words (pushstr, popstr, reverse). It is nearly the same solution, except that Forth is bit more academic than the primitive LC-3 command set. In both cases, the sourcecode is not very easy to read. It takes for a beginner at least 10 minute to understand it. In contrast the reverse string task can be solved with normal C++ way more easier.

As a conclusion it makes no sense to compare Forth with C/C++. The languages are to different. But it makes sense to compare Forth with X86 assembly and LC-3 assembly. And I would go a step further and suggest that Assembly experts are also experts for Forth.

Let us answering the question how a complete operating system in Forth has to be programmed. Taking existing Linux sourcecode which is written in C and convert this to Forth makes no sense. The languages are very different, it is another culture. But, it makes sense for orientation on existing assembly OS, for example menuet OS. The sourcecode there can easily be transferred to Forth. It is the same programming style, and the same bare-metal mentality. This can be seen very clear, if we are comparing the debugging screen in an Assembly environment with the debugging screen of Forth. In both cases the programmer is going in singlestep through his application and traces back the register / stack values for identifying a mistake.

The example MenuetOS consists of 1.5MB sourcecode. This shows, that it is possible to write larger projects on bare metal. I would suggest, that also a Forth operating system in the same size is possible.

2.8 Forth emulator

Before I want to describe the situation for Forth, a short look into the 8-bit retroscene which is using Assembly as their main language. What happens since the Commodore 64? A lot, If we are searching github for the term 6502 emulator, we will see, that at least 100 projects are out there. Some emulators are written in C, some in C++, Java, Javascript and C#. The motivation behind such projects is the idea to learn something. Often the programmer have read the 6502 manual and implements the function in a high-level programming language like C++. That means, they are building a computergame which has registers and an Accumulator. The Line of Code is remarkable low what a 6502 emulator needs. Some projects have not more than 2000 LoC and they will run assembly code, which was written for the C-64.

The other way around is not very often done. I mean, to write in 6502 assembly a C++ compiler. In theory, this is also possible, but I found no project in that direction. The reason is perhaps, that writing in 6502 assembly language is more complicated then writing in C++.

Now we will come to the situation in Forth. My own project was equal to the above cited 6502 emulators. The idea was to use C++ and simulate Forth. This task was surprisingly easy. The reason is, that C++ is a very powerful language which can simulate everything: Forth, 6502 assembly, Finite state machines, neural networks or whatever. The other way around was not successful. That means, mean knowledge about Forth was zero before the project, and is zero after the project. That means, it is very easy to write a game in which Forth can be executed, but it is hard to write Forth itself. This is the same situation like in the 6502 emulator case.

Let us increase the difficult a bit. How complicated is the task to implement the ANS Forth standard with a C++ program? Not very sophisticated. The specification is available online and writing a C++ program which takes Forth code and give the result back is easy to create. It is a one man project. If somebody has experience in compiler writing he will mastering it in under a year. The result should be a C++ program which is 3650 lines of code and emulates a Forth system.

But how demanding is the other direction? I mean, to use Forth for writing a C++ compiler, or to use Forth for writing a game? It is way more complicated. The reason is, because Forth is complicated like Assembly, and not very good documented. I would guess, that even a computer expert will not have a prototype after a year of programming and it is unclear if his Forth program will ever run.

Let us increase the abstraction level a bit. .The first fact is, that writing cpu-emulators and compilers for languages is surprisingly easy. The number of github projects from that area is high and the number of needed lines of code is small. Easy means, that such projects can be done in a short period of time as a one-man-project and will result in a success. The second remarkable fact is, that the task is usually done with modern programming languages like C#, C++ or sometimes C.

Another interesting fact is, that after writing such an emulator, in most cases the programmer has not learned the underlying system for example the MOS 6502 and can it program fluently, but what they have learned is the tool-language C++ which was used to write the emulator. That was also true for my own project. The knowledge about C++ has increased a little bit, while the knowledge about Forth remains on zero.

Let us take a look at a typical “6502 emulator project”. According to the list http://www.6502.org/tools/emu/ one MOS6502 emulator https://github.com/gianlucag/mos6502 was written in C++ and has passed the “Klaus Dormann's test suite”. I didn't compile the emulator by myself, but i trust the survey. Let's take a deeper look into the github repository. It contains of 2 files. The .cpp file has 1315 Lines of code and the header file has 189 lines of code. So we see a C++ project with 1500 lines of code. According to the average productivity, the programmer has spend 150 days on the project (10 Loc/day), so it can be called a small nice amateur project.

I'm explaining this in detail, because in the theoretical literature the idea of programming a complete CPU from scratch is described as very complex, and the beginner thinks perhaps, that this will takes years of development and will end with a failure. But in reality, emulator writing is something which can be done as a hobby, and it seems that the programmer had a lot of fun with it. The question which remains open is, why did he used C++ for writing an emulator, why did he not used a Commodore 64 assembler for writing a program in Assembly?

With a bit searching, I've found some Emulator projects which were done in Assembly. That means, the programmer is using x86 assembly for writing a 6502 emulator. But, the number of projects is small, and the sourcecode is difficult to read. For example this one https://github.com/scantysnax/bender_x64/blob/master/bender.asm has 2800 lines of code in x86 assembly.

Why?

The reason why modern 6502 emulators are written in C++ and not in 6502 assembly itself has to do with the term legacy. The Commodore 64 assembly language is recognized as assembly. That means, the people are not longer using it for real projects. That means not, that the C64 or the 6502 is obsolete. The opposite is true: many retroevents are taking place and it seems, that today more people have an C-64 since in the 1980s as the machine was sold in reality. The problem is, that the programmers today are only talking about the machine and are not longer part of the community. That means, there are no real C-64 programmers out there. What they are doing is to place the CPU in a basket, simulate it with perfect accuracy and writing books about it. But what they are using in reality is a Intel CPU and the C++ programming language.

Can we do the other way around? I mean, using old Commodore hardware and 6502 assembly language to emulate modern computers? No, it is not possible. Because fixing C++ sourcecode is easier than fixing Assembly sourcecode. And a PC is faster than the 6502 chipset. I wouldn't call the C-64 death or obsolete, instead it is a legacy system. That means it is a closed system which is used for learning and teaching. And I would suggest that the Commodore 64 is the better computer for community building, than an ordinary PC. I do not know any PC or C++ programming club for example. Apple fan clubs are a lot of them, but I never heard of an Intel fanclub.

2.9 List of Forth CPUs, IDEs and books

year name author note
1962 Burroughs 5000
1962 English Electric KDF.9 Charles L. Hamblin
1985 Novix NC4000 Charles H. Moore Novix Inc
1988 RTX2010 Charles H. Moore 16 bit https://en.wikipedia.org/wiki/RTX2010
1989 RTX32p Philip Koopman 32bit, 2-chip
1989 RTX4000 Philip Koopman planned commercial version of RTX32p
1993 F21 Charles H. Moore 500 MIPS , 15000 transistors
1995 MuP21 Chen-Hanson Ting and Charles H. Moore 2 chip system (second VGA)
1996 i21 Charles H. Moore 500 MIPS, NTSC video output
1998 MSL16 Philip H W Leong similar to MuP21
1999 C18 Charles H. Moore asynchronous, SEAforth 40C18
2003 b16 Bernd Paysan
2003 JOP Martin Schoeberl Java optimized processor
2004 MicroCore Klaus Schleisiek
2004 FC16 Forthcore Richard E. Haskell
2006 SEAforth-24 Charles H. Moore 24 core cpu
2007 eP32 Chen-Hanson Ting together with eForth
2009 GreenArrays F18 Charles H. Moore
2010 J1 James Bowman Willow Garage
2010 GA144 Charles H. Moore commercial product
2012 N.I.G.E. Machine Andrew Read softcore cpu
2015 Lund CPU Girish Aramanekoppa Subbarao

Table 1: List of Forth CPUs

google “site:http://www.forth.org/svfig/ cpu”

http://www.ultratechnology.com/chips.htm

Name price
VFX Forth 1200 US$
SwiftForth 399 US$
8th 130 US$ per year
Gforth GPL
F-PC Forth system free

Table 2: List of Forth IDEs

List of FORTH books

3 Performance

3.1 Power consumption

  • 5 femtojoule per transistor [beiu2005sub]
  • Apple A8X, 3000 million transistors, 4.5 Watt ->1,5*10^-9 Watt/transistor
  • MuP21 Forth cpu, 7000 CMOS transistors, 100 MIPS

3.2 Speed of Forth

So what's the deal, why is SP-Forth so slow? At first, it is not the fault of SP-Forth. Other Forth systems like pforth or gforth are not much faster. The problem is, that the C++ compiler was designed for practical usage and is optimized for speed. The idea is to generate the fastest binary file which is possible. While Forth was designed as a teaching language. The aim is not to write code in Forth and execute this on hardware, the hardware is to use Forth code in theoretical papers without ever run the code on real machines. Forth is used mainly as a proof.

The idea is not to focus on real hardware or real software, but invent everything from scratch. A forth program is not created for an existing computer like the x86 CPU, but the assumption is, that no hardware is there and we must invent a computer from scratch. And the minimal computer has only two stacks: datastack and returnstack. In contrast, the C++ compiler is progrmmed for real environment. The assumption is, that working hardware is out there, and a working operating system is available. For C++ it makes no sense to build around a virtual CPU, but the idea is to use existing hardware for producing software.

Forth programming can be called a permanent reverse reengineering. That means not to use the work previously done but to invent the wheel from scratch. That means, if someone wants to invent a computer from scratch, write his own compiler and invent his own operating system from scratch, than Forth is a here to stay. If the aim is to use existing hardware/software for writing something new, than C++ is better.

Let us explain this in detail. The assumption under which Forth is operating is, that we are living in the 18th century. The first computer wasn't invented yet and all what we have are punch cards and relays. The aim is, to built a simple machine from scratch, program the first compiler, and write a hello world program which runs. This is a job perfect for Forth. It is possible, to build everything from scratch: the logicgates, the compiler, the machine model and even the programming language. The result will looks a like a computer made of wood and has an overall speed of around 1 instruction per second. The memory is limited. Forth is the slowest programming language ever, but it minimizes the amount of understanding of the system.

Homebrew CPUs projects are not created for the purpose of speed or inventing something new. A homebrew CPUs is designed for replicating existing work. The idea is for example, to build a 8 bit homebrew CPU out of TTL chips. Why would somebody do so? The real 8-bit cpu was invented 40 years ago, recreating this from scratch is done with an educational purpose. To understand how the system look like, and to gain knowledge about hardware itself. This is the main domain in which Forth is used. Forth is the natural language for homebrew CPUs. The resulting working system will be run very slowly (because it is an 8 bit cpu) and Forth will decrease the performance further. The advantage of this bonsai plant is, that everything is under glass. That means it is fully documentated and can be seen as art.

A good example for such projects is the “MARK I Forth computer”http://www.aholme.co.uk/Mk1/Architecture.htm The performance of the machine is extraordinary slow, and the Forth program run on this machines solves no real problem. Instead the project is proof of how to build such computers.

Let us investigate the MARK I computer in detail. What is his purpose? It is not possible to run a webserver on the machine, it is also not possible to run Linux on it. So why invested the inventor so much energy in the project? Because he proofed something. A proof is known from universities. The idea is to repeat something which is already known. For example: everybody is aware that 1+1=2. The problem of how to add to number is solved. But instead go further and investigating more complicated things the same problem is asked as an open question. A possible proof for 1+1=2 looks the following.

At first the teacher asks a question: What is 1+1? Somebody gives the right answer: 2. But that is not the answer the teacher want's to hear. The teacher is not interested in math itself, he is iinterested in testing his students. It is important to ask who has given the answer, and what his motivation was behind it. Making a proof means, to deal with problems, for which the answer is known to build an education system.

With the same purpose the Mark I computer was designed. How to program a computer is already known, but repetition is always good. The running Mark I computer can be used in a quiz. The teacher asks a question, for example, “What should i do to let the led blinking?” and the students must answer the question. The idea is not to discuss about computers, but build a school game, in which people are evaluated.

Formal Forth

“Forth provides us with a simple language with a programming environment and debugger. Due to the simple nature of Forth, this can be formalised much more readily than most other languages (Stoddart 1988)” [knaggs1993practical] page 37

It is a bit unfair to compare Forth directly with C. Because both languages are made for different purposes. C was made as a quick&dirty language for real systems. While Forth was created as an intellectual challenge in theoretical computerscience. Forth can be best compared with the MMIX cpu of Donald E. Knuth. The “MMIX Visual Debugger” isn't use for real programming tasks, it is a clean simulator for trying out algorithm in the university. The MMIX cpu uses lots of register. Forth is more restricted and has stacks. This makes Forth superior to the MMIX. It makes sense, to compare Forth with the MMIX, PL/0 or other educational computers. But it makes no sense to compare MMIX with C or Forth with C.

Let us take a look into “The art of computer programming”. The book-series is equal to a game. It builds a knowledgesystem in which questions can be asked which can be answered wrong or true. That means, first the author Knuth explains what sorting is, and then the students can answer a question to the problem. If he is right, he gets a point from Knuth. The book “the art of computer programming” is created as a whole big proof. That means, nothing new can be found in the volume, but the purpose is to gamify existing knowledge. I wouldn't call the book scientific or academic, it is more didactic. The idea is to use the book for teaching students in computer science. Teaching means, that the students can answer the question which are given by the instructors. Teaching means not, that the students are exploring topics outside of the course or find new answers. Donald Knuth is a wonderful teacher. He is very good in proofing something.

Let us take an example excersise from “The art of computer programming”:

“In a random sequence of a million decimal digits, what is the probability that there exactly 100,000 of each possible digit?” [maclaren1970art]

“Write a MIX program analogous to (2) that computes (aX) mod (w-1).” [maclaren1970art]

In the book are much of the above cited exercises and they have all in common. The answer to the problem is known. At least by Knuth or it is printed in the reference section as the correct answer. The question are well suited for testing students, if they have understand the book. That means, Knuth didn't invented algorithm theory, he invented the SAT test (Scholastic Assessment Test). Somebody may call the book from Knuth as mischief, but it is the opposite. It shows, what a mathematical proof is, and it can be used in a teaching environment. Proof means, to ask for something which is well known. It is some kind of riddle which is meaningless by itself, but defines the relationship between a person who has the power to educate the opponent. The difference is, that Donald E. Knuth can let a student fail a test, while the opposite is not possible. That means, Knuth creates the test for an exam, while his students must attend the test.

3.3 Performance

According to the official Forth FAQ a program written in Forth is around 4-8 times slower than the same algorithm written in C.http://www.forth.org/faq/faq1.txt section #3.5

So what's the deal, why are people using this slow language? Let us first try out the c-language in detail. If somebody is writing an algorithm in C he uses a compiler for producing machine code. Modern C compiler are really good in the task, in most cases the compiler generated machine code is as fast as handprogrammed assembly code. That means, a good c compiler let the hardware running on its maximum. Suppose we have written a primenumber algorithm in C and with -O3 compilerswitch the programs needs on a 2 Ghz AMD64 cpu around 100 seconds until it is finished. How we can improve the speed? With Forth it is not possible, and with a better C compiler also not. The problem is the underlying hardware. If we want to run the same program not in 100 seconds but in 1 seconds we need a faster cpu.

And here comes Forth into the game. The idea is to not use small optimizing techniques. For example, a better cpu architecture or a transputer which has many cpus at once is a big optimizing technique. And this is the reason why people are playing around with Forth. The idea is to not using small optimizing technique with the factor 5, but using big optimizing techniques in the factor of 1000. If we have for example a transputer with 1000 cpus at once, our algorithm will much more profit from it, than any c compiler can deliver.

3.4 Multi-core Forth machine

A possible computer consists of a large amount of main memory, for example 100 GB in DDR4-RAM. The memory consists of cells which containing data and program code. On the memory, many Forth CPUs are operating in parallel. Other modules like caches and registers do not exists. Let us go into the details:

The memory has an address space. It starts with $0000 and goes up to $FFFF. For example on address $1000 the main program starts. The first command is “4”. According to the Forth syntax the number is put on the datastack. .The next command on adress $1001 is also a number “2”, it is put on the stack. The next command at adress $1002 is the plus sign “+”. This adds the two numbers on the stack. What the Forth CPU is doing, or better the pointer of the Forth CPU which points to the Memory is simple. He is executing commands.

Now let us increase the complexity and not only use one Forth CPU but two of them in parallel. It is not possible that they are operating on the same piece of code. This would results into the same problem like in normal computing, for example CPU1 overwrites a memoryadress which is needed by CPU2. But it is possible to define separate threads which runs in parallel. In the Linux operating system this is called a background process. That means, CPU1 is executing code from $1000 to $1200 while CPU is executing code from $2000 to $2040. The threads are separated from each other.

Figure 3: Two Forth CPUs on one tape

Left is the datastack which has [5,3] while right is the return stack. The user can move the CPUs with pressing the arrow keys. He can also start the “fetch-decode-execute” cycle which is starting the normal Forth inner next loop.

If the programmer defines 100 threads, then 100 Forth CPUs can run simultaneously on the memory. Because each Forth CPU is minimal from design it has only 2 stacks: datastack and return stack plus some minor features like an instruction pointer. It is a minimal computer which consists of not more than 1000 transistors. This makes it possible to construct many thousands of Forth CPUs which runs in parallel on a large amount of memory. The only bottleneck is the bandwith of the RAM.

The problem is, that RAM can only accessed by one CPU at the same time. For example: on timeindex 0.1 seconds CPU1 is reading a cell, on timeindex 0.2 seconds CPU2 is reading a cell and so forth. The memory controller manages the access from the cpus to the memory. If the bandwith is low, the overall system will be very slow. That means we have around 1000 Forth CPUs but they can not access the memory at the same time. They have to wait.

Another problem is, that normal C sourcecode will not run. It is not possible to convert existing Linux programs into Forth syntax. And even this would be possible, it makes no sense because a sorting algorithm like Bubblesort has to be implemented different on a stackmachine. That means, the programmers have to rewrite all the software from scratch. This results into new operating system, new video codecs and new applications.

3.5 Why performance benchmarks are useless

Most amateur programmers love benchmark. They compare for example the execution speed of a C++ compiler with the speed of Java. The idea is to evaluate a certain technology with an objective judge and to be on the side of the winner. In the case of Forth a benchmark makes no sense. In many papers about new invented Forth software some kind of performance benchmark is given, but it says nothing. The reason is, that the intention of Forth is not to replace existing CISC CPUs or existing programming languages. If somebody want's to create a Forth benchmark then only with the aim to build a very slow Forth system.

My own Forth language simulator has right now a performance of around 1 instruction per second. This performance is reached with an artificial pause, which is executed in the thread. The normal SFML command was used to wait for exact 1000 ms. The result is, that the overall gameloop runs with 30 fps to get the movement smoothly and for reacting to user input, while the execution of the simulated Forth CPU is slowed down to 1 step per second. Otherwise, the user won't see very much, he would start the program and get a millisecond later an error command, because his Forthprogram is wrong. With an artificial slowdown the user can see in stopmotion what the simulated Forth CPU is doing and how the stack is filled with values.

My plan is to reduce the execution speed further, down to only 0.5 steps per second. The reason is, that in a dualcore CPUs two Forth VM are executed in parallel, and if both are running with maximum speed, it is hard for the user to get both in observation.

The intention of a Forth benchmark is not to run a given program at maximum speed. That is the privilege of production ready programming languages like C++.: They are used for writing real software. Instead, Forth is an education programming language, that means a slow execution speed is better. A similar project is the GNUSim8085 software, which is a tool for learning assembly language. The performance of the software is very slow, but this principle is right for learning Forth too.

GnuSim8085 is not the most sophisticated assembler. And most feature of GAS and NASM he didn't have. It is more a teaching environment, that means it is not possible to write real assembly programs with it, but it is a good tool for playing around with Assembly for people who have no knowledge.

Let us investigate how a dedicated Forth CPUs is presented:

“MISC processors will be much cheaper than RISC and CISC processors, and can compete effectively against them on the basis of favorable price/performance ratio.”http://www.ultratechnology.com/mup21.html

According to this statement the MuP21 is superior to normal CPUs because he is faster and needs less energy. This argument is not very relevant, it misleads the user into the direction, that he can use Forth to replace existing computers. If we want to promote Forth, than we should argue that Forth is slower and needs more energy. That means, a standard x86 PC in CISC architecture is able to run with around 10 Gflops, while a Forth CPU is for the factor 1 0 10 slower. The question is not how to shrink the size of the gate to compete with existing technology, the question is how to build a Forth CPU in hardware which is really slow, and how to program a Forth simulator which is even slower.

According to the specs, the MuP21 CPU was produced with 1.2 micron CMOS process. That is too tiny. What's about a Forth CPU build with a transistor size of around 1 cm? If the chip has 1000 transistors, the cpu would have a length of 10 Meter, which needs a special room for it. The advantage is, that the museum visitor can observe the device, walk around and touch it.

3.6 Forth has a big performance problem

Some papers about Forth are discussing the performance issue. The intention is to make clear, that Forth is a fast cpu architecture which is better than other systems. In reality, the performance problem is not about speed but it's the opposite. Perhaps I should give an example.. My new Forth language simulator has two CPUs who are working simultaneously on the same tape. The user types in the command “Run” and both CPUs are start working. It feels a bit like in a realtime strategy game, in which the user controls two units at the same time. The problem is, that it is hard to observe the scene. Even a artificial timelag of 1 seconds is used between two actions, the scene runs too fast. The problem is that it is easy to follow one cpu, but two cpus at the same time with a separate datastack is too much for the normal user.

How to overcome the problem is unclear, but increasing the performance is the wrong way. The better answer has to do with more timelag. Perhaps I should increase the waiting time from 1 second to 10 second, so that the user has time to see what exactly the stackmachine is doing?

What helps a bit is to not run the cpu in parallel but after each other. Not because of technical requirements, but this makes clear, that the user can follow the process. Perhaps I should describe what the worst case scenario is. A bad idea is to remove the artificial waiting loop, because this results into a not controllable behavior: the user is entering “run”, and without any delay, both cpu are at the end of the program. From the performance perspective that is the fastest mode. But the user sees nothing. It is useless to run the simulator in that mode.

3.7 Prefetching for the UNIVAC I tapedrive

Introduction

The rewind time of the tape drive was around 60 seconds, sometimes this is called “access time”, because it takes one minute until a certain adress is reached. If the CPU has no core memory and no cache, he writes direct on the tape. A command sequence like “read #00, read #A0, read #10”, takes each 60 seconds until the tape is adjusted to the adress. So the Univac needs 3*60=180 seconds for the operation.

Main

The UNIVAC I computer was made in 1951. His tapedrive had a rewind time of 60 seconds. That is also called the access time. 6 tapedrive units can be run in parallel. Let's assume we have a batch task. From tape1 we copy something to tape2. From tape2 to tape 5 and so forth. The exact batch is described in the figure 4. Because we need 6 read and 6 write access the overall time is 12 minutes until the job is done. The term prefetching means, to not tolerate that a tape has nothing to do, instead it is started in advance. From the beginning we give every tape a task and this results into a smaller overall time. It can be reduced to 2 minutes. The average access time is no longer 60 seconds, but it is:

120secondsoverall 12tasks =10secondspertask

Figure 4: prefetching for tapedrive

Can this be true that the virtual rewind time is only 10 seconds, while the physical rewind time is 60 seconds? Yes, the term is called pipelining or RAID and means, that every device is under load, and they are working together. More inexpensive tapedrives will decrease the virtual rewind time further. 60 seconds down to 10 seconds is equal to 1 tapedrive up to 6 tapedrive. That means, if our goal is an accesstime of 1000 msec, we need 60 tapedrives in parallel.

4 Hardware

4.1 Parallelcomputing

Sometimes the Inmos Transputer is called a parallel computer, because it combines many cpus on one chip. But in reality, the so called Harvard architecture is also a parallel machine. The signal pathway is used for connecting subcomputers together. Let us describe a small example. Suppose the assembly command “mov ax,bx” should be executed. For doing so, many substeps are needed. If we are running the steps after each other it takes some time. The alternative is to construct many small units which are handling one piece of work and then we are combining these unit with an assembly line. The assembly line on a cpu is called signal pathway. On youtube an animation of a working Intel 8085 is available which shows the principlehttps://www.youtube.com/watch?v=nABqe5SctHY It looks similar to a Simcity game in which cars are driving around. The most dominant feature is, that the signal are independent from each other. They are running at the same time.

Let us describe the situation with a city. At 09:00 AM person1 is driving with his car to school. At the same time at 09:00 AM person2 is driving with his car to the hospital. At 10:00 AM everybody is on his place. The global clock has only recognized that the process has taken 1 hour. But in reality, the time for driving was 2x1=2 hours. Todays cpu's are described by their speed in Gigahertz. 1 Ghz is equal to 10^9 operations per second. A very good cpu reaches 10 instructions per clockcycle.

Queue machines

[husemannstudy2016] introduces queue machines on an example. We have the formula

5*( 2+8 ) ) − ( 4/2 ) and the idea is to calculate the result in parallel. At first, the abstract syntax tree is generated, and this tree is executed in parallel. That means, 2+8 is calculated at the same like 4/2 be different execution units which are normal dualstack pushdown-automata.

Threading

In mainstream computing the idea of threading is well known. For example, the C++ programming language is supporting this feature as default. The reason why threading is used not very often has to to do with the fact, that according to the hardware only a few cpu cores are available. That means, if the programmer is creating 2 threads, this double not only the performance but also the power consumption of his CPU. But threading itself is the answer to every performance bottleneck. It must be used with many cores at the same time.

Let us first investigate how we can decide if a program can be run parallel or not. There are two options, the first one is called automatic pipelining and it is used on the hardware side. That means, a chunk of assembly code is loaded into the cache of the cpu, and the processor is investigating if a task can be run in parallel. This decision is mostly wrong and the possible performance improvement is only minimal. The other option is to let the programmer decide if his program can be run parallel or not. A typical example is a brute-force solver for the traveling salesman problem. The programmer can define 10 threads, and each one is calculating a piece of the problem.

Let us describe a situation in which threading can speed up the performance. At first we need many cpus, for example 1000. And then we let the programmer decided how to use these cpu in parallel. That means, it is not necessary for doing the pipeline step automatically on the abstract syntax tree of the program instead the programmer decides how to map the work to the hardware.

In theory, the mapping can be done automatically, for example with analyzing the bytecode and decide according to rules which chunk of the code can be executed parallel. But, this advanced computing and for parallel computing it is not necessary to do so.

Suppose, we have 1000 different dual-stack pushdown automaton arranged in parallel. A normal sourcecode written in Forth can't be executed by the stackmachine cluster, because it is unclear if a subroutine needs as input the output of another subroutine. But, if the programmer has decided from the beginning, that all routines can be done in parallel, there is no need for waiting of the results of a previous step. That means, the programmer has scheduled the tasks of his “Traveling salesman problem” manual to the computing cores, and they are starting to calculate.

It is correct, that it is difficult or sometimes impossible to run sourcecode in parallel. The problem is, that in the sourcecode is defined:

input (takes 10 seconds)
calc1 (take 20 seconds)
calc2 (takes 60 seconds)
output (takes 10 seconds)

and the “output” routine can only be started, if the previous subroutines are done. And is indeed an interesting question, if we can analyze the sourcecode (or his abstract syntax tree) in way that we can automatically decide which commands can be executed in parallel and which not. But, this question is only a minor problem in the subject of threading. It solves the question what the cpu can do, if the programmer didn't defined manually what codeparts can be threaded.

The more easier way is, if the programmer has already written explicit, that the calc task can be run in parallel:

input
parallel: calc1, calc2
output

Here is the situation clear. We can assign the calc task to different cpus. But, the programmer will only define task as thread ready, if the hardware is able to run the task with higher performance. Todays computers only have 2 cores, and it makes no sense to specify a complex threading hierarchy.

4.2 Why Forth is better than Intel Xeon Phi

The latest iteration in Intel products is called Intel Xeon Phi, “Knight Mill” According to the specs, the processor can deliver 3.5 Teraflops and consumes 300 Watt energy. At first, we must calculate the Gigaflops per Watt value, which is 11.7. Then we can calculate the energy consumption per flops, which is 85 picojoule per flop (1/a*1000). Now we compare this value with the GA144 forth CPU by Greenarray, which needs only 7 picojoule per instruction.http://www.greenarraychips.com/home/documents/greg/PB001-100503-GA144-1-10.pdf So we have as the result:

  • Intel Xeon Phi, 85 picojoule per flops
  • GA144 Forth CPU, 7 picojoule per Instruction

The Greenarray chipset is around 12x more energy efficient than the Intel hardware, which is a lot. So we have proven that Forth is superior to Intel engineering.

4.3 Pipelining

Understanding Multiprocessing and Pipelining isn't easy, but it seems, that the parallel executation of many tasks the key is for increasing performance of a CPU. In the figure “pipeline” an example is given, which is well known from other introductions into the topics. The overall tasks consists of steps like “prefetch, fetch, decode and so on”, and the tasks are scheduled to threads. The first surprising fact is, that a thread is not responsible for a timestep, but a thread is used for run the entire task. What does that mean? In C++ we would create a cycle with the command from figure “Thread in C++”. That means, not every subtask is run by a different thread, but we have only one thread, which has the complete game-loop. The idea is, to run many game-loops at the same time with a small offset.

void cycle() {
  prefetch();
  fetch();
  decode();
  execute();
  free();
}
std::thread t1;
t1=std::thread(cycle, this);
t1.detach();

Let us investigate the table a bit in detail. The main problem of the fictional cpu is, that the subtasks have an order. That means, we can execute something only after the value was decoded. It is not possible to change to order, it is a fixed workflow. Let us watch, how the table has solved the problem. The “execute” subtask is highlighted in grey. What we can see is, that the CPU isn't run the subtask at the same time, but with a slow speed after each other. From the perspective of “execute”, there is no pipeline, because in every timestep one subtask is executed.

Figure 6: pipeline

But, the overall figure shows that all the tasks are running in parallel. We see a virtual CPU which is vertical in the table. The virtual CPU goes from bottom to top and contains all the subtasks: prefetch, fetch, decode and so on. And they are executed at the same time. Let us go further into the details. The constraint is, that the workflow has a certain order. That means, the subtasks “prefetch, fetch, decode and so on” can't be run at the same time, but must executed after each other. “Fetch” is only possible if the previous step was done. Every subtask needs a certain amount of time, for example 1 second. If we have 5 subtasks, with each 1 seconds, then it takes 5 seconds for the overall task, right?

Our aim as the manager of the game is to reduce the duration. Let us compare two options. The first one is the sum of the seconds. 6 threads with each 5 seconds needs 6x5=30 seconds. A look in the figure “pipeline” shows us, that it is possible to run the threads only in 10 seconds, because the last action is ready in the timestep 10. What is the trick?

At first we can calculate the raw numbers. 10 seconds overall time consumption is equal to 1.66 seconds per thread (10/6). That means we have increased the speed from 5 seconds for running a task downto 1.66 seconds. The same reduction is done for each subtask. Prefetch needs no longer 1 seconds, but only 0.33 seconds. Can this be true?

According to the table we have 30 subtasks overall. Each of them needs 1 seconds. On the same time, the measured time for executing the subtasks is only 10 seconds. And that is equal to 0.33 seconds x 30 = 10 seconds. The measurement is correct, the duration of a single task has the normal 1 seconds, but it has also 0.33 seconds. The second reduced duration is the result of the above mentioned virtual CPU. It is something which we have constructed by ordering the task in a pipeline. The virtual CPU needs for the prefetch subtask only 0.33 seconds, and the same amount for all the other subtasks. It is a super-processor which is the result of the coworking threads.

workpackage

In the timestep 5 a batch-job is executed by the virtual cpu. The request contains of the following tasks:

  • prefetch instruction #5
  • fetch instruction #4
  • decode instruction #3
  • execute instruction #2
  • free instruction #1

Because the subtasks are dealing with different instructions, they can be run simultaneously. The physical cpu gets this batch-menu. The reason why the subtasks are executed is unknown. Prefetching instruction #5 has nothing to do with “fetch instruction #4”.

4.4 A cache for Forth

Classical mainstream computing contains of RAM, second level cache and first level cache. The idea is, that an algorithm – written in C – can access fast to every cell of the memory. For example, an image with 200 kb is stored in RAM and the CPU is deciding to take parts of the data into his first level cache to run a compression algorithm on it. What is the Forth way of doing so?

At first, a Forth CPU has no first level cache. It has only the dualstack and the memory. Access to the RAM is expensive and storing large amount in the datastack is not possible. How does a Forth CPU stores large amount of data? This is unclear, perhaps it is not possible. Let us watch a real Forth CPU like the GA144. The total amount of RAM on the chip is small. Instead the system is optimized for small amount of data. For a normal algorithm this could be possible, but what is in the case of large image files, which are need 200 kb and more of storage. Where do we are storing them?

At first, let us investigate an option to overcome the problem with traditional methods. The need is to store 200 kb near the CPU, so that the CPU can access the data fast. Doing so is possible with a first level cache plus a register based machine. The registers of a CISC machine can be seen as the zero-level-cache. It is the fastest sort of memory direct on the chip. So the answer to the storage problem is not using Forth and switch to a modern register cpu including a cache.

This is what we see in reality, but perhaps it is possible with Forth to give another answer. A stackmachine is per definition not a registermachine, that means, we can not add registers or a cache to it to improve the memory. And using a stackmachine together with a big RAM makes no sense, because the access to the RAM is slow. The Forth way for solving the problem is to use an array of stackmachines. That means in reality. A standard Forth CPU consists of a datastack which is not deeper than 5 bytes. That is the total amount of RAM the system has. If we need more RAM for storing a 200 kb image, we need more Forth CPUs. 40000 single Forth CPU with each 5 bytes = 200 kb RAM. Such a system is called in literature a stackbased “Cellular Automaton Machine”.

This sounds a bit complicated so a short example may clarify the point. Suppose the image we want to store is 200 kb in size. We are divide the image in 1000 parts, each is 200 byte long. The 200 byte are the memory of a Forth machine. Additionally the CPU has 5 bytes for the datastack and 5 bytes for the return stack. So we have 1000 mini computers which have access to a different part of the memory. Each Forth CPU can read/write only 200 bytes. This is some kind of intelligent RAM. The only problem is, how to program the tiny little devices?

4.5 Forth GPU

The topic of a graphic card is ignored in the Forth community. The reason is, that a standard microcontroller doesn't have such a device. But from the perspective of understanding Forth it makes sense to focus in detail on the topic. The application of visualizing pictures has a lot of interesting challenges. At first, the usually graphics card RAM is big in size. A standard display has at least 1 million pixel plus color information (= 1 MB). If the graphics card should render 3d graphics the number of needed pixels is higher. Also the painting of shapes likes lines is very computational intensive. That means, if someone believes to understand computing and want to minimize the hardware, than building up a graphics card from scratch is a good place to start.

[morreale1984general] is one of the few paper who are describing Forth together with graphics processors. Finding such information is difficult, because most papers of the 1980s are on microfiche and not online available and searching for the term “Forth” brings in most cases no useful information. Another example, which uses the GA144 cpu for graphical output on a watch is given on github.https://github.com/mflamer/GA144-watch

5 Examples

5.1 Implementation

Visualizing a dual-stack pushdownautomaton is relativly easy with the SFML game engine. In the following figure the first prototype can be seen. Instead of programming another “Forth like system” on a x86 cpu this project has the purpose to program against a graphic-display. That means, the Forth VM is only visible onscreen as some kind of interactive game. The user can so far control the instruction pointer with left/right button, and he can even push some values to the datastack. It is done with a textcommand. The idea is, that the game can interpret a normal Forth program and run it in singlestep mode.

Figure 7: Stackmachine in SFML

5.2 Testing out SP-Forth

In the following section, I want to describe a concrete Forth system. I've choosen the SP-Forth systemhttp://spf.sourceforge.net/, because it has the feature to compile Forth code into native x86 machine code, which improves the performance. The first step is to download the 744 zip file for the Linux operating system. After unpacking lots of documentation, examples and the SP-Forth executable file is visible. The file “./spf4orig” is 64kb in size and after executing it, the screenshot “SP-Forth” is shown.

Figure 8: SP-Forth

Instead of the Gforth system, SP-Forth needs the command in big letters. A demo program can be:

: MAIN 10 0 DO I . LOOP ;
 Ok
MAIN
0 1 2 3 4 5 6 7 8 9  Ok

A problem with SP-Forth is, that the documentation is only available in Russian. One advantage is, that it supports multitasking out of the box. But let us try out something different. We want to run a program from a file. At first we need the same Hello World program in a file “hi.f”, with a small modification and then we can execute the program together with SP-Forth. The result is the same, the numbers from 0 to zero are printed on the screen.

# hi.f
: MAIN 10 0 DO I . LOOP ;
MAIN
BYE

# commandline
./spf4orig hi.f

A speed comparison between SP-Forth and C++ is always possible, and the SP-Forth system is way more slower.

sp-forth:
: SUM 
0 1000000 0 DO I + LOOP
.
;
: MAIN 1000 0 DO SUM LOOP ;
MAIN
BYE

time ./spf4orig sum.f
real	0m2,435s
user	0m2,425s
sys	0m0,005s


C++:
#include <iostream>
void sum() {
  auto result=0;
  for (auto i=0;i<1000000;i++)
    result+=i;
  std::cout<<resulti<i<" ";
}
int main()
{ 
  for (auto i=0;i<1000;i++)
    sum();
  return 0;
}

g++ -O3 sum.cpp
time ./a.out
real	0m0,137s
user	0m0,132s
sys	0m0,004s

6 Other languages

6.1 BCPL vs Forth

In the late 1960s a major decision was made in history of computing. It was a showdown between the Forth system and the BCPL language. The mainstream in computing decided, that BCPL is a here to stay, while the Forth idea was ignored.

Let us go back in time and investigate what the difference is. BCPL was the predecessor to C, and C evolved into C++. Also BCPL is a programming language dedicated to register-based cpus. In contrast, the Forth VM was oriented on a stackmachine and evolved into factor and joy and modern Forth2012 standard. The contrast between both developments is massive. While C like language are on top of the TIOBE index, is Forth usually on the last place. Both developments can be seen as contradiction.

At first, i want to describe the idea behind Forth. The Forth VM is using an abstract computer, called two stack pushdown automaton. It has a tape, a datastack, a return stack and an execution unit. Programming in Forth is equal to use dataflow programming for the Forth virtual machine. In contrast, the idea of BCPL was to define also a virtual machine, but the BCPL virtual machine is using 5 registers, according to the orginal paper: Accumultor, some index register, programcounter and so forth. The idea of the BCPL virtual machine evolved into todays computer hardware which is called a CISC architecture. Such systems were programmed in C like languages like Python, C, C++ and Java.

Which architecture is more powerful? For answering this question let us describe what the disadvantages are. Suppose we want to program in a C like language with lots of variables and compile the sourcecode to a stackbased processor. It is very difficult to do so. Mapping a high level programming language onto a stackmachine is not possible. Instead what the hardware manufactorers are doing is the other way around. They are modifying the cpu to support highlevel programming languages. That means in reality they are increasing the number of registers to at least 32, and they are increasing the number of machine opcodes for supporting multimedia and other operations.

Now we want describe in which the above cited CISC architecture is not very good. If the aim is to research computer programming and hardware design on a formel level, than are languages like C not the best idea. The reason is, that it is not clear how many registers the cpu has, or what the standard of a language is. A c like programming language can be anything, and a BCPL like virtual maschine too. It is not possible to describe algorithm on a formal level.

What we seen today is both. Computer scientists are loving Forth based stackmachines, because they are so simple and can be seen as a runtime version of a turing-machine. A virtual machine in stackmachine architecture is a general computer and is well suited for analyzing the runtime behavior of an algorithm. It is also possible to create new hardware with it.

On the other hand, C like programming languages and the underlying CISC based hardware is very common in realiy. They can be used for implementing real operating systems like Linux and write software in business environment.

The major difference between BCPL and Forth is, that the BCPL architecture has evolved over time. Today, the C language looks different and the CPU which is executing the commands too. That means, the AMD64 architecture is very different from the 386'er systems from Intel. And todays C++ is another language than the earlier BCPL language. In contrast, Forth like systems looks the same over decades. In the 1970s the Forth virtual machines consists of two stacks and today nothing has changed.

In bad literature, not a stackmachine but the turing machine is described as a universal model for a computer. But a turing machine is less important. The problem with turing machines is, that the program code is separated from the tape. Usually, the tape represents the input and the runable code is located in the head of the turing-machine. A second problem with turing-machines is, that the code in head is not changable, it is a Finite state machine. In contrast, the Forth like VM is the better model. It can also run every algorithm but can programmed with subroutines and variables.

Often there is a misconception out there. The idea is, that it is possible to program in Forth like in C++. The idea is, that Forth is simply another programming language. But it is more. Forth has more in common with a dataflow language for a abstract machine. That means it is not possible to program Forth without interactive. Perhaps an example:

A C++ programmer is oriented on the sourcecode. He writes down 2 pages of sourcecode like he write a text. And after 30 minutes, he presses the run button to see what his program his doing. It is batch workflow, and if the programmer has experiance, the time between every run of the program is longer. That means, the programmers sees only from the sourcecode what his program will do. He uses the compiler only for error checking.

In contrast a Forth programmer works interactivly. What his machine will do is unclear, because new defined Forth words are not predictable. The idea is not to program something, but to prove if a program is correct. Let us describe the difference in detail.

C++ programs are written because the user needs software. He writes for example a game, and if the sourcecode is complete the computer can do something interesting. That means, C++ programming is done only for creating sourcecode. The idea is to increase the number of software.

Forth programming is done with a different goal. Forth is mainly a teaching language. The idea is not to program with Forth a database application or a game, the idea is to use Forth as a tool for educating students about computer science. That means, the typical Forth projects has not produced anly lines of code after 1 year, but around 100 students with a diploma in computer science who are familiar with hardware and algorithms. Instead, C++ is not very well suited to teach students in computerscience. C++ is a language for the practice. It has no clean design, and is driven by learning from doing.

Anti pattern

At the end of the small comparison between BCPL and Forth I want to describe a behavior which results into failure. It is an antipattern which is ignoring the advantages of a language. What the user is not recommended to do is using Forth as a programming language for creating an operating system or a GUI application. Also not recommended is the idea, to use BCPL for teaching computer science at the university. It will fail also.

BCPL can be described as a language for writing computer sourcecode, while Forth is an education tool for teaching students. That means, if a professor at the university, uses Forth to explain what a compiler is, then the professor is really good in his subject. And if a programmer in daily life is using a BCPL like language for creating software, then is also an expert in his domain.

6.2 C++ vs Forth

The difference between Forth and C++ can not be greater. Both languages are very old. The development of C++ started in the 1970s with the BCPL compiler which were later extended with Simula-like features. The stackbased Forth language was developed in the same time. Both languages can be seen as unique, that means they are very powerful. But it is not possible to switch between them, because their philosophy is different.

At first I want to describe the mindset between C++. C++ is a very fast language, which compiles on standard-processor into productive code. The OOP features of C++ makes it easier to program large applications, maintain them and use many programmers in the same project. That makes C++ the number 1 language for game development. A typical game consists of 10000 different sprites, which different behavior which are interacting with the game-engine. C++ handles this complexity and allows the programmer to create sophisticated programs.

The Forth language was designed with a different purpose. Their main application is in science-classes at the university. Forth is a minimalist language. It is possible to explain the working of a complete CPU, an operating system and a compiler with Forth. The idea behind Forth is not to grow program complexität, but to reduce it. If a Forth program has 2 kb in sourcecode, a much better program occupies only the half. The advantage of Forth is, that researcher can try out on a theoretical level many kind of architectures. For example, if they are parsing Forth sourcecode into an abstract syntax tree (AST) they can investige which parts of it can be run in parallel. Creating the abstract syntax tree for a C++ is way more complicated.

The best practice method is to use Forth and C++ together. C++ as a compiler for creating real applications. And Forth as a research plattform to play around with new algorithm, computer architectures and compiler designs. Let me give one example. :Suppose the idea is, to make a performance test of a new parallel computer. If we are ignoring Forth and focus entirely on C++ and Intel CPU we would take a standard-C++ program which has 100 MB sourcecode, a standard Operating System like Linux and a standard X86 Processor which has 5 billion transistors. We are wiring all these compentents together and investige who to run it in parallel. Will this setup works? Never! There are to many components in the loop, it is a mess. The better approach is, to write in Forth a small CPU, program a operating system from scratch and run some applications on it. The overall project has around 2000 lines of Code in Forth and it easy to inspecting every single piece of it. If something goes wrong it is possible to go down to the transistor to search for the failure.

Let us investigate some antipattern. If someone is trying to use Forth for build a production application with 1 million lines of code, he has made a mistake. The project will fail horrible. The reason is, that Forth do not support OOP. Another problem is, that the performance of Forth is poor. Another Antipattern is, to use C++ in a computer course for explaining what a compiler is. C++ is to complicated and the knowledge right now is in 6 months obsolete.

Forth advantage

Let us watch a homebrew CPU project which isn't aware of Forth.http://www.megaprocessor.com/architecture.html is a CPU entirely made of transistors. The system was build for educational reasons, and perhaps the inventor had a lot of fun with it. According to the schematic diagram it is a register based machine, which is equal to modern computers, which also have registers. The result is, that the above cited homebrew project is more complicated than it should be. The instruction set is high, and the wiring to complicated. Forth can solve the issue. Let us investigate the idea in detail.

A minimal computer uses fixed logicgates and it is not programmable. The term in the literature for such device is Finite state machine. That means, the transistors are ordered to a logic circuit and a input signal results into an output signal. Such device can not be programmed, so it is not a real computer. The next step after a wired Finite state machine is not a register based computer, but between them comes the Stackmachine. That is a minimal computer which is described by the Forth community in detail. It is like a FInite state machine, turing capable, but can execute a program. The program can be changed so it is possible to program the device.

The open question until now is: how to build out of transistors a stackmachine which runs Forth? The answer is not given. Most homebrew projects are devoted to register based cpu architecture. But Forth can provide a realy minimalistic device. In that sense, that it is not able to build a computer more minimalistic as Forth. The Megaprocessor consists of 15300 transistors for the CPU and additonally 27000 transistor for the RAM. The overall transistorcount is higher than it should be. The J1 Forth CPU for example, but i do not need to explain the details here.

Let us first describe, what in the Megaprocessor project was done right. The general idea to build a cpu from tranistors in an educational computer is quite good. So the students can understand what a computer is on a very basic level. What was done wrong in the project, is the architecture of the CPU itself. A switch from a register machine to a stackmachine would increase the educational value for the public.

I think the overall question can be answered with the transistor count. The intel 4004 for example had around 2300 transistors. I would suggest, that it is possible to reduce the amount.

Forth disadvantages

After some applause for Forth it is time to describe the disadvantages. Forth was inventend around the year 1970, until now it wasn't used on real softwareprojects like Operating systems, games or network programming. Instead a different architecture was successful in the mainstream, which is called register-based CPU plus C programming language. The reason why has to do with the need for bigger software projects. The major problem in computing is not to improve the hardware, or make it more faster. The major problem is the software crisis. That means, around the 1970s modern computers from this time had enough RAM and enough harddrive capacity. The problem was, that no software were available. And until today, this problem is not solved. That means, a cheap PC brings around 4 GB RAM with it, and 200 GB harddrive, but how to fill the space?

The computing history has answered the problem with three ideas: compiler languages like C, objectoriented programming like C++ and Service-oriented architecture. The idea is to abstract from a concrete machine and separate between hardware and software. Let us investigate how compiler languages like C are implemented on a CPU. Programming such compilers is done on register machines. They have special registers and context-switching possibilities. The concept is called “compiler friendly” and means, that the CPU has to adapt to the compiler with the purpose to support high-level-language programming. A Forth system can't adapt to anything. It is not possible to run a C program on a stackmachine. That is the reason, why Forth is not used in real software projects, especially not in big softwareprojects which are working with object oriented programming.

Is it possible to create large applications will million lines of code in Forth? No. Is it possible to replace large projects with smaller programs? No. The reason why the Linux operating system consists of million lines of code and not only of 10000 is, because Linux is so powerful. What we have seen in computer history is a double strategy. On the one hand, we had register-machines, C, C++ for real life big projects, and on the other hand the Forth language was developed into a teaching language for simplified lesson in cpu architecture.

Let us describe the above cited service oriented architecture in detail. On a technical level it is equal to a REST services on the internet and to a abstract class on a local machine. The basic idea is, to create a layer which hides complexity and provide certain interfaces to the outside world. For example, somebody has written a 10000 lines long program which is doing something. Service oriented architecture means, that he programs on top of the program in addition 400 more lines of code, which have the only reason to provide a clean number of functionnames. The additional 400 lines of code have no direct task, they are from the program itself useless. But there are important if the program is used as a library which works together with other libraries. Such feature is enabled in the C++ framework as default, and it is the reason why the language is used so widespread. And yes, such a feature results into bloatware. That is software which is large, and is doing nothing.

6.3 C Compiler for Forth?

[shannon2006ac] describes a C compiler which targets a stackmachine. The idea is, that writing programs in Forth is difficult and it would make sense to use existing C sourcecode and run a Forth machine. Doing so is not easy, and the paper has failed to realize such a compiler. But let us go into the details.

The compiler which was used is LCC. The reason is, that LCC makes it easy for writing a special backend for a stackmachine. The idea is, to make “stack schedulinig”, that means to use the stack like register and put C variables on the stack. The problem is, that in reality a stack and a register are different things. For example, if a variable is put deep in the stack on position 12, it is not possible to access the variable anymore because according to the definition only the top element of a stack can be retrieved.

The problem is on the side of the c programmer. He writes perhaps a subroutine which uses 5 local variables, for storing temporary information, the result or whatever. The compiler is trying to convert the sourcecode into machine code. On a register machine it is not a problem, because the local variables are stored in the registers which makes it fast to access them. On a stackmachine such registers are not available. What the compiler can do, is to to store the variables in the memory. But, if the CPU wants to access it, it takes to much time.

To make the point a bit clear: It is not possible to write a c-compiler for a stackmachine. It will be always slower. That is the reason why mainstream computing is devoted to register machines, because such machine fits better to c programming and C++ programming. Apropos C++. How difficult it is to map objects to a stackmachine? Really difficult.

What is the alternative? The alternative is not use existing C programcode but writing new sourcecode in Forth. Forth subroutines are using usually no local variables and they run very fast on a stackmachine. There is only a minor problem: the number of sourcecode written in Forth is small.

We can see the lack of C compiler for stackmachines as an advantage, because it results into an environment which is different from mainstream computing. In productive softwaresystem everybody is using C like languages which runs best on x86 CPUs. And for educational purposes the other way around is the gold standard. That means to use a stackmachine together with Forth. The absence of C compiler marks the border. There is a difference between Forth and the rest in computing. [shannon2006ac] has proven, that it is not possible to sit in between.. The user must decide between Forth and C.

6.4 Beginners are chosen Assembly language, experts C++

It is a surprising fact, that people who have no programming experience at at all are preferring Assembly language for their first trial. The expectation might be the difference, because programming in Assembly is called demanding and writing an operating system from scratch is outside the scope of amateurs. But in reality, many of them are doing exactly this. The reason is, that from a certain point of view, Assembly is a beginner friendly language. The ordinary program is very short and is optimized for a certain type of architecture. If someone has invested one week into reading some books about Assembly, he can start writing his first graphics demo and even program his first game.

There is only one problem: Assembly is not used by professionals. The reason is, that they are not interested in writing short programs for a certain CPU type. Instead the profi programmers is writing longer programs with million of lines of code, and he is doing so under existing operating system. This results into a different programming culture. That means, apart from the stereotype C/C++ is the hardest programming language ever invented. Not because it is so difficult to store a number in a variable, but because of the problems, C++ programmers are trying to solve. They are not interested in getting 10% more performance out of a certain loop, instead C++ programmers try to fix a database application or the write a game against a given game engine.

In contrast Assembly programming has very much to do with the language itself. Which instruction set is used, how to iterate through a loop or how to make certain tricks on AMD64 PC. Both types of programmers are interacting with the computer, but Assembly programmers are doing so in an beginner friendly version. That means, they are not solving any real issues, instead they are putting graphics on the screen or they playing around with the interrupt.

References

  • [beiu2005sub] V Beiu, J Nyathi, and S Aunet. Sub-pico joule switching: High-speed reliable cmos circuits are feasible. In Intl. Conf. Innovations Info. Tech.(IIT05), 2005.
  • [husemannstudy2016] Jörg Husemann. A study of queue machines. 2016. https://es.cs.uni-kl.de/publications/ datarsg/Huse16.pdf.
  • [khan1997design] Fitratullah Khan and Sohail Anwar. Design and construction of a pc-based stack machine simulator for undergraduate computer science and engineering courses. In Frontiers in Education Conference, 1997. 27th Annual Conference. Teaching and Learning in an Era of Change. Proceedings., volume 3, pages 1234–1238. IEEE, 1997.
  • [knaggs1993practical] Peter J Knaggs. Practical and Theoretical Aspects of Forth Software Development. PhD thesis, University of Teesside, 1993.
  • [koopman1989stack] Philip Koopman. Stack computers: the new wave. 1989.
  • [maclaren1970art] M Donald MacLaren. The art of computer programming. volume 2: Seminumerical algorithms (donald e. knuth). SIAM Review, 12(2):306–308, 1970.
  • [morreale1984general] Jay Philip Morreale. A general purpose graphics processor. 1984.
  • [null2003guide] Linda Null and Julia Lobur. A guide to the marie machine simulator environment, 2003.
  • [shannon2006ac] Mark Shannon. Ac compiler for stack machines. Masters thesis, University of York, United Kingdom, 2006.
  • [yehezkel2001three] Cecile Yehezkel, William Yurcik, Murray Pearson, and Dean Armstrong. Three simulator tools for teaching computer architecture: Little man computer, and rtlsim. Journal on Educational Resources in Computing (JERIC), 1(4):60–80, 2001.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

EOF