Research shows AI models don’t match human visual processing

A new study from York University shows that deep convolutional neural networks (DCNNs) do not match human visual processing when it comes to shape perception. According to Professor James Elder, co-author of the study, this could have serious and dangerous implications for real-world AI applications.

The new study, titled “Deep learning models fail to capture the configural nature of human shape perception,” was published in the Cell Press journal iScience.

It was a joint study by Elder, who holds the York Research Chair in Human and Computer Vision and serves as co-director of York’s Centre for AI & Society, and Nicholas Baker, an assistant professor of psychology and former VISTA postdoctoral researcher at York.

New visual stimuli: “Frankensteins”

The team relied on novel visual stimuli called “Frankensteins,” which allowed them to examine how both the human brain and DCNNs process holistic, configural object properties.

“Frankensteins are just objects that have been taken apart and put back together the wrong way,” Elder says. “As a result, they have all the right local characteristics, but in the wrong places.”
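To make the idea concrete, here is a minimal, hypothetical sketch in Python of this kind of manipulation: an object silhouette is split into parts and one part is displaced, so that every local feature is preserved while the global configuration is broken. This is only an illustration of the general principle, not the stimulus-generation procedure used in the study.

```python
import numpy as np

def frankenstein(silhouette: np.ndarray) -> np.ndarray:
    """Illustrative only: split a binary silhouette at its horizontal
    midline and mirror the top half left-to-right. Local contour
    fragments are all still present, but each one now sits in the
    wrong place relative to the rest of the object."""
    h = silhouette.shape[0] // 2
    top, bottom = silhouette[:h], silhouette[h:]
    return np.vstack([top[:, ::-1], bottom])

# Toy 4x4 "object" (1 = object, 0 = background) to show the effect.
obj = np.array([[0, 1, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 1, 0],
                [1, 1, 0, 0]])
print(frankenstein(obj))
```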

The study found that, unlike the human visual system, DCNNs are not confused by Frankensteins, revealing an insensitivity to configural object properties.
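As a rough illustration of how such insensitivity can be probed (not the authors’ experimental protocol), one can compare a pretrained classification network’s output for an intact image and for its part-scrambled counterpart: a network that relies mainly on local features will often return nearly identical predictions for both. The model choice (an ImageNet-pretrained ResNet-50), preprocessing, and file names below are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Assumed model for illustration: ImageNet-pretrained ResNet-50.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def top1(path: str) -> tuple[int, float]:
    """Return the top-1 class index and its probability for an image file."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)[0]
    conf, idx = probs.max(dim=0)
    return int(idx), float(conf)

# Hypothetical file names: an intact object and its "Frankenstein" version.
for name in ["intact.png", "frankenstein.png"]:
    idx, conf = top1(name)
    print(f"{name}: class {idx} (p={conf:.2f})")
```

If the two images yield the same class with similar confidence, the network is effectively ignoring the configural change that humans notice immediately.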

“Our results explain why deep AI models fail under certain circumstances and point to the need to consider tasks beyond object recognition to understand visual processing in the brain,” Elder continues. “These deep models tend to take ‘shortcuts’ when solving complex recognition tasks. While these shortcuts may work in many cases, they could be dangerous in some of the real AI applications we are currently working on with our industry and government partners.”


Real-world implications

Elder says one of these applications is traffic video surveillance.

“The objects in a busy traffic scene – the vehicles, bicycles and pedestrians – occlude each other and arrive at a driver’s eye as a jumble of disconnected fragments,” he says. “The brain has to group those fragments correctly to identify the correct categories and locations of the objects. An AI road safety monitoring system that can only perceive the fragments individually will fail at this task and potentially misjudge the risks to vulnerable road users.”

The researchers also say that adjustments to training and architecture aimed at making networks more brain-like have failed to produce configural processing, and that none of the networks could accurately predict trial-by-trial human object judgements.

“We speculate that to match human configuration sensitivity, networks need to be trained to solve a wider range of object tasks beyond category recognition,” Elder concludes.
