A new AI camera recognizes objects faster and more efficiently

By combining two types of computers, this system could speed up an autonomous car’s ability to recognize the world around it.

In real-world experiments, the system successfully identified airplanes, automobiles, cats and dogs. | Getty Images/Andrey Suslov

The image recognition technologies used in today’s autonomous cars and aerial drones, as well as in tomorrow’s cancer-seeking robotic medical devices, all depend on artificial intelligence.

These “computers that see” teach themselves to recognize objects: a dog, a pedestrian crossing the street, a stopped car or a cancerous tumor.

Now, researchers at Stanford University have devised a new type of camera system that can classify images faster and with less energy, and that could one day be built small enough to be embedded in the devices themselves, something that is not possible today.

“That autonomous car you just passed has a relatively huge, relatively slow, energy intensive computer in its trunk,” says Gordon Wetzstein, an assistant professor of electrical engineering and (by courtesy) computer science at Stanford, who directed the research.

Wetzstein and Julie Chang, a doctoral candidate in his lab and first author on the paper, have married two types of computers into one — creating a hybrid optical-electrical computer designed specifically for image analysis. The paper was published in the journal Nature Scientific Reports.

Consumed by computation

The first layer of the researchers’ prototype camera can be thought of as an optical computer. Optical computers do not require the power-intensive mathematics of digital computing. The second layer is a traditional digital electronic computer.

This optical computer operates by physically preprocessing image data, filtering it in multiple ways that an electronic computer would otherwise have to do mathematically. Since the filtering happens naturally as light passes through the custom optics, this layer operates with zero input power. This saves the hybrid system a lot of time and energy that would otherwise be consumed by computation.

“We’ve outsourced the math into the optics,” Chang says.

Think of it as a camera that takes multiple images of the same scene, as if each variation were taken through a specially designed filter. The images are captured optically, just like a photograph on film. Without the custom optics, each of those filtered images would have to be extracted mathematically with electronic computing. The result is dramatically fewer calculations, fewer calls to memory and much less time to complete the task. Having leapfrogged these preprocessing steps, the remaining analysis proceeds electronically with a considerable head start.
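To give a rough sense of the arithmetic the optics stand in for, here is a minimal Python sketch of that kind of filtering done the electronic way. It is an illustration only, not the researchers’ method: the image size, the three filters and the function names are all hypothetical, and it simply counts the multiply-and-add operations a digital processor would perform to produce several filtered copies of one small image.

```python
# Illustrative sketch only. The image, the filters and every name below are
# hypothetical; none of them come from the published paper.

def filter_image(image, kernel):
    """Slide `kernel` across `image` without padding.

    Returns (filtered_image, mac_count), where mac_count is the number of
    multiply-accumulate operations performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
                    macs += 1
            out[y][x] = acc
    return out, macs


def apply_filter_bank(image, kernels):
    """Apply every kernel to the same image; return (feature_maps, total_macs)."""
    feature_maps, total_macs = [], 0
    for kernel in kernels:
        fmap, macs = filter_image(image, kernel)
        feature_maps.append(fmap)
        total_macs += macs
    return feature_maps, total_macs


# A 32x32 toy image whose right half is bright, and three 3x3 filters.
image = [[1.0 if x >= 16 else 0.0 for x in range(32)] for _ in range(32)]
kernels = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # responds to horizontal edges
    [[1 / 9] * 3 for _ in range(3)],         # local average (blur)
]

feature_maps, total_macs = apply_filter_bank(image, kernels)
print(len(feature_maps), "filtered images,", total_macs, "multiply-accumulates")
```

Even this toy case, with a tiny image and only three small filters, takes 24,300 multiply-accumulate operations. In the hybrid design described above, the corresponding filtered images would arrive at the sensor already formed by the optics, so the electronic layer would begin from that point.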

“Millions of calculations are circumvented and it all happens at the speed of light,” Wetzstein says.

Rapid decision-making

In speed and accuracy, the prototype rivals existing electronic-only computing processors that are programmed to perform the same calculations, but it provides substantial computational cost savings.

While the current prototype, arranged on a lab bench, would hardly be classified as small, the researchers are confident their system can be dramatically miniaturized, to the point that it could one day fit inside a handheld video camera or an aerial drone.

In both simulations and real-world experiments, the team used their system to successfully identify airplanes, automobiles, cats, dogs and more within natural image settings.

“Some future version of our system would be especially useful in rapid decision-making applications, like autonomous vehicles,” Wetzstein says.

To that end, Wetzstein, Chang and their colleagues at the Stanford Computational Imaging Lab are busy developing the next generation of their design. They are looking at ways to make the optical component do even more of the preprocessing, something Wetzstein describes as making it “more expressive.” And, of course, there is work to be done shrinking the scale.

Other contributing authors include doctoral candidate Vincent Sitzmann, as well as Xiong Dun and Wolfgang Heidrich at King Abdullah University of Science and Technology, Saudi Arabia.

This work was supported by a National Science Foundation Graduate Research Fellowship, a Stanford Graduate Fellowship, a Sloan Research Fellowship and an NSF CAREER award, with generous support from the KAUST Office of Sponsored Research.