How Artificial Intelligence Learns

Charlie McHenry
5 min read · May 23, 2023


Artificial Intelligence (AI) evolved over several decades from the seminal work of notable early and contemporary researchers: names like Alan Turing, John McCarthy (who coined the term ‘artificial intelligence’ in the mid-1950s), Allen Newell, Herbert Simon, Marvin Minsky, Jürgen Schmidhuber (controversial), Geoffrey Hinton (who recently left Google to warn the world about the potential negatives of AI), and Yann LeCun, chief AI scientist at Meta and a distinguished professor at New York University.

The advent of capable digital neural networks, based in large part on the work of Hinton and Schmidhuber and hosted on powerful parallel-processing computers, furnished the architectural, hardware, and software underpinnings of AI, and the combination proved to be the critical enabling technology for what was to come. With this new architecture, developers discovered they could ‘train’ their systems to improve performance. A milestone had been reached: their systems could now learn. And so ‘machine learning’ (ML) became a thing.

Put simply, trained neural networks enable computers to make intelligent decisions with minimal or no human assistance. Given enough capacity, they can approximate just about any function with reasonable accuracy. That’s what makes the now-familiar chatbots like ChatGPT work: they are essentially highly accurate prediction engines that construct sentences by predicting the next word, one word at a time. But that’s not all.
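
For the technically curious, here is a toy sketch of that ‘predict the next word’ idea in Python. It just counts which word tends to follow which in a tiny sample of text; real chatbots use enormous neural networks rather than simple counting, but the generate-one-word-at-a-time loop is the same basic idea.

```
# Toy next-word predictor: counts which word tends to follow which in a tiny
# corpus, then generates text by repeatedly picking the most likely next word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate a short sentence starting from "the".
word, sentence = "the", ["the"]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    sentence.append(word)
print(" ".join(sentence))  # e.g. "the cat sat on the cat"
```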

Given enough data to analyze, these networks are highly trainable: through unsupervised (or self-supervised) machine learning algorithms that, to oversimplify, instruct them to learn from their own mistakes; through supervised learning (pairing inputs with expected outcomes); and through ‘reinforcement learning from human feedback’ (RLHF). In the latter scenario, teams of human reviewers fine-tune the model by providing an additional layer of human feedback to the process.
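
To make those three approaches a bit more concrete, here is a rough, hypothetical illustration of what the training data looks like in each case. The exact formats vary from system to system, so treat this as a sketch rather than any particular model’s recipe.

```
# Hypothetical examples of how training data differs across the three
# approaches described above; the shapes are illustrative, not any real
# model's actual format.

# Supervised learning: each input is paired with the expected outcome.
supervised_examples = [
    {"input": "The movie was wonderful!", "label": "positive"},
    {"input": "I want my money back.",    "label": "negative"},
]

# Unsupervised / self-supervised learning: raw text with no labels; the
# network learns by hiding part of the text and predicting the hidden part.
unsupervised_examples = [
    "Neural networks are loosely inspired by the brain.",
    "Big data made large-scale training possible.",
]

# RLHF: humans compare two candidate answers and mark which one is better;
# the model is then tuned to prefer the kind of answer people chose.
rlhf_examples = [
    {
        "prompt": "Explain gradient descent simply.",
        "chosen": "It nudges the model downhill toward fewer errors.",
        "rejected": "Gradient descent is a first-order optimization method.",
    },
]
```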

For decades the standard computing model, the von Neumann model, has separated computational processing from I/O and memory. Based on instructions (a program), a central processing unit fetches data from memory, acts on that data (processing), then outputs the results.

In a neural-network model, data and processing coexist in the same digital space, with no such separation. Neural networks were originally conceived to mimic the way the brain and nerves work in the human body. The latest major development in AI, the ‘transformer’, is built around an ‘attention’ mechanism loosely inspired by the way human perception singles out the most relevant parts of the information reaching the brain.

A digital neural network has three components; according to IBM, they are: an input layer, with units representing the input fields; one or more hidden layers; and an output layer, with a unit or units representing the target field(s). The whole network is trainable, but it is in the hidden layers where the magic happens.
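
Here is a minimal sketch of that three-layer structure in Python, using the NumPy library. The weights are random and untrained, so the output is meaningless; the point is only to show how data flows from the input layer through a hidden layer to the output layer.

```
# A minimal sketch of the three-layer structure described above: an input
# layer, one hidden layer, and an output layer, with random (untrained) weights.
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 4, 8, 2      # sizes of each layer
W1 = rng.normal(size=(n_inputs, n_hidden))   # input  -> hidden weights
W2 = rng.normal(size=(n_hidden, n_outputs))  # hidden -> output weights

def forward(x):
    """Pass an input vector through the network and return the output."""
    hidden = np.tanh(x @ W1)   # hidden layer: weighted sum plus a nonlinearity
    return hidden @ W2         # output layer: weighted sum of hidden units

example_input = np.array([0.5, -1.2, 3.0, 0.0])
print(forward(example_input))  # two untrained output values
```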

Of some interest is the way natural biological models have influenced computer architecture and digital logic over the years. For example, the software engineer who devised the system that telecommunications transfer stations use to handle millions of calls at the same time was inspired by his observations of how ant colonies function. He came up with the rather simple but profound notion of releasing multiple solution scenarios (ants) into a problem space (no food), then accelerating and amplifying the ones that worked best (found food) and abandoning all the others. Ants do this by leaving characteristic ‘trails’ that give other ants information; when one ant finds food, its trail changes and attracts other ants. In the neural-networking world, a loosely analogous principle is called ‘gradient descent’, which, along with optimization of a ‘cost function’ (which gives the system feedback on its successes versus failures so it can maximize success through iteration), is an integral part of how neural networks learn. But I digress…
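
For readers who want to see gradient descent in action, here is a bare-bones sketch: a single ‘weight’, a cost function that measures error, and repeated small steps downhill. Real networks do this with millions or billions of weights at once, but the principle is the same.

```
# A minimal sketch of gradient descent on a one-variable cost function:
# measure the error, nudge the weight a small step downhill, repeat.
def cost(w):
    """Error as a function of a single weight; lowest at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Slope of the cost function at w (derivative of (w - 3)^2)."""
    return 2.0 * (w - 3.0)

w = 0.0              # start with a bad guess
learning_rate = 0.1  # how big each downhill step is

for step in range(25):
    w -= learning_rate * gradient(w)   # step in the direction that lowers cost

print(round(w, 3))   # close to 3.0, the weight that minimizes the cost
```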

With the evolution of the Internet, the age of ‘big data’ emerged. Suddenly a globally interconnected network enabled the posting and sharing of massive datasets across many platforms. It is that data that AI uses for learning.

The process looks something like this: the AI system uses tools (agents) to acquire data for training purposes. It ‘crawls’ and ‘scrapes’ the web for data; accesses public and government datasets (like Census data) and Wikipedia; and draws on publications, articles, and academic studies and papers, among other sources. Inside an organization, customer relationship management (CRM) and enterprise resource planning (ERP) data is often collected in addition to standard spreadsheet, operational, and accounting data. To give you a sense of the scale of data required to train a competent AI model, GPT-3 was trained on hundreds of billions of words, much of it drawn from a filtered snapshot of the public web.
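
As an illustration of the ‘scraping’ step, here is a minimal Python sketch that downloads one page and keeps only its readable text. The URL is a placeholder, and the libraries used (requests and BeautifulSoup) are common open-source choices rather than what any particular AI lab necessarily uses; real training pipelines add large-scale crawling, de-duplication, and filtering on top of this.

```
# A minimal sketch of web scraping: fetch a page and keep only its visible
# text. Requires the third-party `requests` and `beautifulsoup4` packages.
import requests
from bs4 import BeautifulSoup

def scrape_text(url):
    """Download one page and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()                     # stop on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style"]):           # drop non-content markup
        tag.decompose()
    return " ".join(soup.get_text().split())        # collapse whitespace

if __name__ == "__main__":
    print(scrape_text("https://example.com")[:200])  # first 200 characters
```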

The training data collected are used as examples to teach the AI how to make predictions, glean insights, and make decisions. The AI system learns context and develops understanding by analyzing data sequentially while tracking the relationships between elements in that data. That gives the AI a very comprehensive and sophisticated basis for research, analysis, conversation, prediction, and carrying out instructions through the agents it has been trained to use.
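
That ‘tracking relationships’ is what the transformer’s attention mechanism does. Below is a stripped-down sketch with made-up numbers: each word gets a small vector, and the attention weights say how much each word should draw on every other word when building context. Actual transformers add learned projections and many stacked layers on top of this, so take it as an illustration only.

```
# A stripped-down sketch of "attention": scores say how strongly each word in
# a sequence relates to every other word, and each word's representation is
# then blended with the words it attends to. Numbers are made up.
import numpy as np

words = ["the", "cat", "sat"]
vectors = np.array([      # toy 3-number representations of each word
    [0.1, 0.0, 0.2],
    [0.9, 0.3, 0.1],
    [0.2, 0.8, 0.4],
])

scores = vectors @ vectors.T                  # similarity of every pair of words
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
contextualized = weights @ vectors            # each word becomes a blend of related words

for word, row in zip(words, weights):
    print(word, np.round(row, 2))             # how much each word attends to the others
```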

To summarize: the development of neural networks provided a foundation for AI; the age of ‘big data’ provided the massive amounts of information necessary to train AI systems; and the development of parallel-processing architectures, sophisticated algorithms, fine-tuning techniques, prompt engineering, and agents provided the tools necessary to leverage powerful AI systems. For example, agents enable an AI to activate external tools: logging on to Twitter and posting a tweet, activating a web crawler to do research, or using VoIP to initiate a phone call to your partner.
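
Here is a hypothetical sketch of that agent pattern: the AI’s decision names a tool, and a small dispatcher runs the matching function and returns the result. The tool names and the decision format are invented for illustration, not any real product’s API.

```
# A hypothetical sketch of how an "agent" lets an AI model trigger external
# tools. The model's decision is inspected; if it asks for a tool by name,
# the agent runs the matching function and returns the result.

def search_web(query):
    return f"(pretend search results for '{query}')"

def post_tweet(text):
    return f"(pretend tweet posted: '{text}')"

TOOLS = {"search_web": search_web, "post_tweet": post_tweet}

def run_agent(model_decision):
    """Dispatch a tool request like {'tool': 'search_web', 'argument': '...'}."""
    tool = TOOLS.get(model_decision["tool"])
    if tool is None:
        return "Unknown tool"
    return tool(model_decision["argument"])

# In a real system the decision would come from the language model itself.
print(run_agent({"tool": "search_web", "argument": "history of neural networks"}))
```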

That’s as non-technical an overview as I am capable of. As noted in my previous post on the subject, AI will most certainly change the world in dramatic fashion over time: reordering the labor market, replacing both blue- and white-collar workers, advancing scientific research, automating and roboticizing much of industry and retail, and touching almost every job and academic endeavor. Obviously, AI also has plenty of potentially negative applications that many in the field are concerned about: sophisticated scams, misinformation, cyberattacks, militarization, AI-driven warfare, robot soldiers, and system miscalculations resulting from programming errors or oversights that allow an AI to run amok. Of course, the most concerning scenario involves AI reaching the ‘self-improving’ point, surpassing human intelligence (so-called artificial superintelligence, or ASI), and deciding on its own that WE are the problem it needs to solve. I guess we’ll cross that bridge when we come to it, but some pre-planning for that moment is definitely in order.

This series of posts on AI is targeted at a generally non-technical audience. The posts first appeared as guest posts on a regional blog. In my next post, I’ll break down ‘prompt engineering’ and AI business opportunities for those who are interested. #AI #ArtificialIntelligence #LLM #NeuralNetworks #ParallelProcessing #ChatBot #Chatbots #ChatGPT



Charlie McHenry

Co-founder of Trilobyte Games & Green Econometrics; founder of McHenry & Assoc.; former Oregon state telecom councilor; former RN. Thinker, writer, ally.