July 08, 2019

A search engine for Forth related literature


A naive introduction to the Forth universe would explain what a data stack is, how Forth handles floating-point operations, and what the difference is between swap and drop. This tutorial won't explain such topics, because plenty of tutorials already explain the Forth language itself. A more elaborate way of approaching this most fascinating programming language is to analyze the Forth universe from a librarian's perspective. Forth is primarily a subject for writing books, journal articles and weblog posts. And if a newbie wants to learn more about it, he must read the existing literature.
The problem with Forth-related information is that the corpus is hard to search. The amount of literature is small, most documents were published 20 years ago, and the servers that hosted them are often no longer available. To handle these problems, the first step is to make a keyword list that helps to identify existing online documents about Forth. The following list contains the number of hits for each keyword in the Google Scholar search engine:
"Forth programming" 734
"Forth-83" 547
"Rochester Forth Conference" 421
"Forth-79" 418
"Journal of Forth Application and Research" 368
"euroForth" 326
"Gforth" 218
"fig-Forth" 212
"SIGForth" 190
"Forth-2012" 187
"ANS Forth" 162
"SwiftForth" 29
"VFX Forth" 24
"Win32Forth" 40
"RetroForth" 11
"MacForth" 42
"BigForth" 24
"figForth" 22
"SVFIG" 27
"ColorForth" 41
sum 4043
Now we can use these keywords to create a search string that returns only documents about Forth.
["Forth programming" OR "Forth-83" OR "Rochester Forth Conference" OR "Forth-79" OR "Journal of Forth Application and Research" OR "euroForth" OR "Gforth" OR "fig-Forth" OR "SIGForth" OR "Forth-2012" OR "ANS Forth"]
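Building such a search string by hand is error-prone, so here is a minimal Python sketch of the idea. The function name build_or_query and the max_len parameter are my own assumptions for illustration; the sketch simply quotes each keyword, joins them with OR, and stops adding keywords once an optional length cap would be exceeded. The exact cap a search engine applies is not known to me, so it is left as a parameter.

```python
def build_or_query(keywords, max_len=None):
    """Join keywords into a quoted OR query.

    max_len is a hypothetical character limit: keywords are added in order
    and the loop stops before the query would grow past that limit.
    """
    parts = []
    for kw in keywords:
        quoted = f'"{kw}"'
        candidate = " OR ".join(parts + [quoted])
        if max_len is not None and len(candidate) > max_len:
            break
        parts.append(quoted)
    return " OR ".join(parts)


# Keywords ordered by number of hits, so a length cap drops the rare ones first.
keywords = [
    "Forth programming", "Forth-83", "Rochester Forth Conference",
    "Forth-79", "Journal of Forth Application and Research", "euroForth",
    "Gforth", "fig-Forth", "SIGForth", "Forth-2012", "ANS Forth",
]

print(build_or_query(keywords))
```

Because the list is sorted by hit count, cutting it off at the end loses the fewest documents.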
Unfortunately, Google limits the length of the search string, so the keyword list was reduced. Nevertheless, it's now possible to search for subtopics within the Forth universe. Suppose we want to know what was written about "machine learning": then we add that keyword at the end of the search string. The interesting point is that a full-text search lets the reader steer the corpus in a certain direction. That is, the reader decides which sort of documents he will see. This makes browsing the content more efficient.
Let me explain the advantage. In a conventional tutorial I would write down some reading recommendations. For example, I would recommend first reading the paper by Chuck Moore, then working through the eForth handbook, and then visiting the UK Forth users group. Does this recommendation make sense for everybody? No, it's my personal list, and a different user will have a different information need. The better idea is for the user to search for exactly the information snippet he is interested in. Mostly, it has to do with his own project or his own open question. And a full-text search engine like Google may help.
Let us take a deeper look at the list of Forth keywords. At the beginning, some major Forth conferences are listed, which have generated lots of proceedings. Then come the correct names of the Forth standards plus some mainstream Forth implementations. If a document contains one of these keywords, it is a paper about the Forth programming language.
The total number of documents is low. Summed over all keywords, Google Scholar returns only roughly 4,000 hits. And since many documents are likely to contain several of the keywords, the real number of papers is much smaller. I would guess that around 1000 papers can be searched with this technique. If we put the search string into the normal Google web search, the number of hits is higher.
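Adding a subtopic to the Forth search string can be sketched as follows. The function name scholar_url is hypothetical, the shortened Forth query is only an example, and the URL pattern with the q parameter is an assumption about how Google Scholar accepts a query; it may change at any time.

```python
from urllib.parse import urlencode


def scholar_url(forth_query, topic):
    """Append a quoted topic to the Forth OR query and build a search URL.

    The base URL and the "q" parameter are assumptions about Google Scholar.
    """
    full_query = f'{forth_query} "{topic}"'
    return "https://scholar.google.com/scholar?" + urlencode({"q": full_query})


# Shortened example query; in practice the full keyword string would be used.
forth_query = '"Gforth" OR "Forth-83"'
print(scholar_url(forth_query, "machine learning"))
```

Swapping "machine learning" for any other phrase narrows the same Forth corpus to a different subtopic.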