AI: The pattern is not in the data, it is in the machine


A neural network transforms input, the circles on the left, to output, on the right. How that happens is a transformation of weights, center, which we often confuse with patterns in the data itself.

Tiernan Ray for ZDNET

It is a commonplace of artificial intelligence to say that machine learning, which relies on huge amounts of data, functions by finding patterns in data.

In fact, the phrase “finding patterns in data” has been a staple of fields such as data mining and knowledge discovery for years, and machine learning, especially its deep learning variant, is widely taken to continue that tradition.

AI programs do indeed result in patterns, but just as the fault, dear Brutus, lies not in our stars but in ourselves, the fact of those patterns is not something in the data, it is what the AI program makes of the data.

Almost all machine learning models operate through a learning rule that changes the program’s so-called weights, also known as parameters, as the program is fed samples of data and, possibly, labels associated with that data. It is the value of the weights that counts as ‘knowing’ or ‘understanding’.

The pattern that is found is really a pattern of how weights change. The weights simulate how real neurons are believed to “fire,” a principle formulated by psychologist Donald O. Hebb, which became known as Hebbian learning, the idea that “neurons that fire together wire together.”
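To make that concrete, here is a minimal, illustrative sketch in Python (using NumPy; the toy data, sizes, and variable names are invented for illustration, not taken from any real system) of a Hebbian-style update. Whatever regularity the samples share ends up stored as a configuration of connection strengths, not in any individual sample.

```python
import numpy as np

# A toy, illustrative Hebbian-style learner (not any production library):
# whatever regularity is "learned" ends up stored in the weight matrix W.
rng = np.random.default_rng(0)

n_in, n_out = 8, 4
W = np.zeros((n_out, n_in))   # connection strengths ("weights"), all zero to start
lr = 0.1                      # learning rate

# A handful of (input activity, desired output activity) patterns, standing in for data.
pairs = [(rng.choice([0.0, 1.0], n_in), rng.choice([0.0, 1.0], n_out))
         for _ in range(5)]

for x, y in pairs:
    # Hebb's idea: strengthen a connection when its input and output are active together.
    W += lr * np.outer(y, x)

# The "pattern" that has been learned is just this configuration of numbers:
print(W.round(2))

# Recall: presenting a stored input yields activity shaped by the weights, not by "seeing."
x0, y0 = pairs[0]
print((W @ x0).round(2), "desired:", y0)
```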

Also: AI in 60 seconds

It is the pattern of weight changes that stands in for learning and understanding in machine learning, something the founders of deep learning emphasized. Writing nearly forty years ago in one of the foundational texts of deep learning, Parallel Distributed Processing, Volume I, James McClelland, David Rumelhart, and Geoffrey Hinton put it this way:

What is stored is the connection strengths between units that allow these patterns to be created […] If the knowledge is in the strengths of the connections, learning must be a matter of finding the right connection strengths so that the right patterns of activation will be produced under the right circumstances.

McClelland, Rumelhart, and Hinton were writing for a select audience, cognitive psychologists and computer scientists, and in a very different era, one in which people didn’t readily assume that everything a computer did amounted to “knowledge.” They were working at a time when AI programs couldn’t do much at all, and were mostly concerned with getting a calculation, any calculation, out of a fairly limited arrangement of transistors.

Then, about 16 years ago, with the emergence of powerful GPU chips, computers began to exhibit some genuinely interesting behavior, capped by the groundbreaking ImageNet performance of Hinton and his graduate students in 2012, which marked the coming of age of deep learning.

As a result of the new computing achievements, the popular mind started to build all kinds of mythology around AI and deep learning. There was a flood of really bad headlines crediting the technology with superhuman performance.

Also: Why is AI reporting so bad?

The current understanding of AI has obscured what McClelland, Rumelhart, and Hinton focused on, namely the machine, and how it “creates” patterns, as they put it. They were intimately familiar with the mechanics of weights constructing a pattern in response to what, in the input, was merely data.

Why does all this matter? If the machine is the maker of patterns, then the conclusions people draw about AI are probably mostly wrong. Most people assume that a computer program perceives a pattern in the world, which can lead people to defer judgment to the machine. If it produces results, it is thought, the computer must be seeing something that people don’t.

Except that a machine that constructs patterns doesn’t explicitly see anything. It constructs a pattern. That means that what is ‘seen’ or ‘known’ is not the same as the mundane, everyday sense in which people speak of themselves as knowing things.

Instead of starting from the anthropocentric question, “What does the machine know?”, it is better to start with a more precise one: what does this program represent in the connections of its weights?

Depending on the task, the answer to that question can take many forms.

Think computer vision. The convolutional neural network that underlies machine learning programs for image recognition and other visual perception consists of a collection of weights that measure pixel values in a digital image.

The pixel grid is already an imposition of a 2D coordinate system on the real world. Provided with the machine-friendly abstraction of the coordinate grid, the representational task of a neural net comes down to matching the strength of sets of pixels to an imposed label, such as ‘bird’ or ‘blue jay’.
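As a rough illustration (a toy sketch in Python, not a real image classifier; the tiny “image,” the filter values, and the threshold are all made up), a convolutional filter is just a small grid of weights slid across pixel values, and the “label” is whatever those weighted responses get matched to:

```python
import numpy as np

# Toy sketch only: a convolutional "filter" is a small grid of weights slid
# across pixel values; the label comes from how strongly the image activates it.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)                      # a tiny "photo": a vertical edge

kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)                      # weights that respond to vertical edges

def convolve2d_valid(img, k):
    """Slide the kernel over the image and record each weighted sum."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

activation = convolve2d_valid(image, kernel)
score = activation.sum()             # one number standing in for "blue jay"-ness
label = "edge-like thing" if score > 0 else "background"
print(activation, score, label)
```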

In a scene with a bird, or specifically a blue jay, many other things may be present, including clouds, sunshine, and passersby. But the scene as a whole is not the thing. What matters to the program is the set of pixels most likely to yield a suitable label. In other words, the pattern is a reductive act of focus and selection inherent in the activation of neural network connections.

You could say that such a program does not so much “see” or “perceive” as it filters.

Also: A New Experiment: Does AI Really Know Cats or Dogs — or Something Like That?

The same goes for games, where AI has mastered chess and poker. In chess, a game of complete information, the machine learning task for DeepMind’s AlphaZero program boils down to producing, at any given moment, a probability score for how likely a potential next move is to ultimately lead to a win, a loss, or a draw.

Since the number of possible future game board configurations cannot be calculated by even the fastest computers, the computer’s weights shorten the move search by doing what you might call summarizing. The program summarizes the probability of success if one were to pursue multiple moves in one direction, and then compares that summary with the summary of possible moves that could be taken in another direction.
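A crude way to picture that summarizing (this is purely illustrative Python, not AlphaZero; a real system uses a learned value network rather than random rollouts, and the move names and weights here are invented) is to average the simulated outcomes downstream of each candidate move and compare the averages:

```python
import random

# Illustrative only: "summarize" each branch of possible moves by averaging
# the win/draw/loss outcomes of a handful of simulated continuations.
random.seed(0)

def simulate_continuation(move):
    """Stand-in for a rollout/value estimate: +1 win, 0 draw, -1 loss."""
    bias = {"push_pawn": 0.1, "develop_knight": 0.3, "early_queen": -0.2}[move]
    return random.choices([1, 0, -1], weights=[0.34 + bias, 0.33, 0.33 - bias])[0]

def summarize(move, n=1000):
    """One number summarizing the prospects of everything downstream of this move."""
    return sum(simulate_continuation(move) for _ in range(n)) / n

candidate_moves = ["push_pawn", "develop_knight", "early_queen"]
scores = {m: summarize(m) for m in candidate_moves}
print(scores)
print("chosen:", max(scores, key=scores.get))
```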

While the state of the board at any given time, the position of the pieces and which pieces remain, may “mean” something to a human chess grandmaster, it is not clear that the word “mean” means anything to DeepMind’s AlphaZero as it carries out such a summarizing task.

A similar summarizing task is carried out by the Pluribus program, which in 2019 conquered the most difficult form of poker, No-Limit Texas Hold’em. That game is even more complicated because it involves hidden information, the players’ face-down cards, and the additional “stochastic” element of bluffing. But the representation is, again, a summary of probabilities at each turn.

Even in human language, what’s in the weights is different from what the casual observer would suspect. OpenAI’s flagship language program GPT-3 can produce strikingly human-like output in sentences and paragraphs.

Does the program know language? Its weights represent the probability that individual words, and even entire strings of text, occur in sequence with other words and strings.
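A drastically simplified stand-in for what such weights encode (a toy bigram counter in Python, written for illustration; GPT-3 learns sequence probabilities with billions of neural-network weights rather than by counting a few words):

```python
from collections import Counter, defaultdict

# Toy illustration: which word tends to follow which, estimated from counts.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def next_word_probabilities(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))   # {'cat': 0.5, 'mat': 0.5}
print(next_word_probabilities("cat"))   # {'sat': 0.5, 'slept': 0.5}
```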

You could call that function of a neural net a summary similar to that of AlphaZero or Pluribus, since the problem is very similar to chess or poker. But the possible states that can be represented as connections in the neural net are not merely enormous, they are infinite, given the infinite composability of language.

On the other hand, since the output of a language program such as GPT-3, a sentence, is a fuzzy answer rather than a discrete score, the “right answer” is somewhat less demanding than the win, loss, or draw of chess or poker. You could also call this feature of GPT-3 and similar programs an “indexing” or an “inventorying” of things via their weights.

Also: What is GPT-3? Everything your business needs to know about OpenAI’s groundbreaking AI language program

Do people have a similar kind of inventory or index of language? There doesn’t seem to be any evidence for it in neuroscience so far. Likewise, in the expression “to know the dancer from the dance,” does GPT-3 recognize the multiple levels of significance in the sentence, or the associations it carries? It is not clear that such a question even has meaning in the context of a computer program.

In each of these cases, chessboard, cards, word sequences, the data are what they are: a familiar substrate divided up in various ways, rectangles of plasticized paper, clusters of sounds or shapes. Whether such human inventions collectively “mean” anything to the computer is just a way of saying that the computer’s response is tuned to them, for a purpose.

The things that such data calls forth in the machine (filters, summaries, indices, inventories, or however you want to characterize those representations) are never the thing in itself. They are inventions.

Also: DeepMind: Why is AI so good at language? It’s something in the language itself

But, you might say, people look at snowflakes and see their differences, and catalog those differences, too, if they feel like it. It is true that human activity has always sought patterns, in various ways. Direct observation is one of the simplest means, and in a sense what happens in a neural network is a kind of extension of it.

You could say that the neural network amplifies what has been true of human activity for millennia, namely that to speak of patterns is to impose something on the world rather than to find something in the world. In the world, snowflakes have shape, but that shape is a pattern only for someone who collects, indexes, and categorizes them. In other words, it is a construction.

The activity of creating patterns will increase dramatically as more and more programs are turned loose on the world’s data and their weights are tuned to form connections that, we hope, will create useful representations. Such representations can be incredibly helpful. One day they may help cure cancer. It is helpful to remember, however, that the patterns they reveal are not in the world, but in the eye of the beholder.

Also: DeepMind’s ‘Gato’ is mediocre, so why did they build it?
