May 23, 2026

Lessons learned from Douglas Lenat's Cyc

During the late 1980s, Cyc was a large-scale AI project. The promise was to create a database of handcrafted Lisp rules that is able to reason about the world. The attempt failed, but that is no problem, because it is possible to analyze the reasons why.

From today's perspective, Cyc was an early attempt to create a dataset. A dataset is typically stored as a plain file such as a .csv file and does not contain computer code. Datasets store numbers and text. During the 1980s it was unknown how to create large-scale datasets, and Cyc had some built-in mistakes:

a) there was no word2vec algorithm to convert textual information into a numerical representation
b) Cyc was encoded with rules, not with question-answer pairs

A modern dataset that is superior to Cyc would fix these mistakes. A common dataset used for training neural networks consists of a simple Q&A structure like "What is the capital of France? -- Paris", and it uses a word embedding algorithm to project the information into a numerical space that can be processed by neural networks.
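To make the idea more concrete, here is a minimal Python sketch of a Q&A dataset whose questions are projected into a numerical space. Note the assumptions: the three Q&A pairs are made up for illustration, and the `embed` function is only a bag-of-words count vector, a much simpler stand-in for word2vec and not the word2vec algorithm itself. The names `QA_PAIRS`, `embed`, `cosine` and `answer` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy dataset: question-answer pairs only, no rules and no code.
QA_PAIRS = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Germany?", "Berlin"),
    ("What color is the sky?", "blue"),
]


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return [w for w in words if w]


# The vocabulary defines the axes of the numerical space.
VOCAB = sorted({tok for question, _ in QA_PAIRS for tok in tokenize(question)})


def embed(text):
    """Bag-of-words count vector (a stand-in for a learned word embedding)."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in VOCAB]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question):
    """Return the answer of the stored question closest to the query."""
    query_vec = embed(question)
    best = max(QA_PAIRS, key=lambda pair: cosine(query_vec, embed(pair[0])))
    return best[1]


if __name__ == "__main__":
    print(answer("capital of France?"))
```

The point of the sketch is that the knowledge sits entirely in `QA_PAIRS`, while the matching is done by generic vector arithmetic that knows nothing about capitals or colors. A real system would replace the count vectors with learned embeddings and the nearest-neighbor lookup with a trained neural network.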

The Cyc knowledge base was a combination of Lisp software and textual information. It was a hybrid of computer code and a dataset. This kind of knowledge base has been replaced by data-only datasets, which have become popular since the deep learning boom. A data-only dataset contains no computer code, only the data itself, which can be text or images. The computer code that searches the data is externalized into a deep learning library.
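The separation between data and code can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content is invented, and the plain string lookup in `lookup` stands in for the search that a deep learning library would perform. The names `CSV_TEXT`, `load_dataset` and `lookup` are hypothetical.

```python
import csv
import io

# Hypothetical data-only dataset. In practice this would be a .csv file on
# disk; it contains text only and no executable code.
CSV_TEXT = """question,answer
What is the capital of France?,Paris
What is the capital of Italy?,Rome
"""


def load_dataset(text):
    """Parse CSV text into a list of {'question': ..., 'answer': ...} rows."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup(rows, question):
    """External search code: case-insensitive exact match on the question.

    A stand-in for the search done by a deep learning library.
    """
    wanted = question.strip().lower()
    for row in rows:
        if row["question"].strip().lower() == wanted:
            return row["answer"]
    return None


if __name__ == "__main__":
    rows = load_dataset(CSV_TEXT)
    print(lookup(rows, "what is the capital of italy?"))
```

Because the dataset holds no logic of its own, the search code can be swapped for something more capable without touching the data, which is not possible when rules and facts are woven together as in Cyc.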
