Smartphones impress with their AI algorithms that can understand spoken words.
Though they mostly get things right, users must often retype words that have been transcribed incorrectly. Unbeknownst to most users, their corrections to the AI’s mistakes get whisked away to cloud-based machine learning systems that train themselves to recognize those mistakes and send updates back to the phone. Over time, the phone grows smarter, getting fewer words wrong by the day.
In such scenarios, chip designers traditionally must grapple with two overarching tasks. The first is inference, in which the algorithm contends with accents, mispronunciations, background noise and other impediments to infer what the user’s spoken words mean. The second is training, in which the computer in essence teaches itself to make fewer mistakes. In today’s smartphones, largely for energy efficiency, inference happens locally on the phone itself, while training is handled remotely in the cloud.
Not surprisingly, this give-and-take can cause long delays, explained Priyanka Raina, assistant professor of electrical engineering and senior author of a recent paper that proposes to close the gap. Raina was joined by first author Massimo Giordano, a graduate student, and 16 other collaborators.
“We eliminate that time lag by doing both inference and training on the phone itself using what we call an ‘AI-at-the-edge’ chip. The data never leaves the device where it is gathered,” Raina said.
AI chips that learn at the edge promise many benefits to smartphone users, including speedier learning and longer battery life due to lower energy demand, said Subhasish Mitra, professor of electrical engineering and of computer science, a senior member of the research team. But AI at the edge also offers at least one other important upside that may go underappreciated by most users: “Privacy is improved, because hackers aren’t able to intercept sensitive data as it is being uploaded and downloaded,” Mitra said.
The Stanford team developed its AI-at-the-edge chip using RRAM, a memory technology that labs have struggled to make useful. Importantly, RRAM memory is “non-volatile” – it retains stored information even when power to the chip is turned off, similar to the way flash memory does in smartphones and computers today. But RRAM enjoys a few key advantages over flash memory: It can be engineered to consume less power, and the writing and reading of data are considerably faster within the local environment of the RRAM chip.
To make RRAM useful, however, the team had to solve a technical conundrum: RRAM’s energy-efficiency advantage lies almost entirely on the data-reading side of the power equation, where its energy cost is negligible. Writing data, by contrast, consumes far more energy. The engineers had to balance these concerns to create a chip that, overall, uses less energy than existing technologies.
To explain, Boris Murmann, professor of electrical engineering and a senior author of the paper, draws an analogy with the ancient form of writing, cuneiform, in which wedge-shaped marks are pressed into clay.
“Making the indents is difficult; reading them is a cinch,” Murmann said of the energy equation.
To compensate, the researchers developed algorithms and hardware fixes that minimize how often the on-chip machine learning system writes data when learning new patterns. The result is a chip that is both a fast learner and a prolific storehouse for data, all while consuming minimal battery power. In the end, it may be enough to make on-chip AI practical in a place that is off limits today – at the edge of the internet.
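The core idea of write-sparse learning can be pictured with a toy sketch (this is a generic illustration of the principle, not the team’s actual algorithm): gradient updates accumulate in a cheap volatile buffer, and the expensive non-volatile write is committed only when the pending change grows large enough to matter. The names and threshold below are hypothetical.

```python
import random

WRITE_THRESHOLD = 0.05  # hypothetical: commit a write only when the pending change is this large


class ThresholdedWeights:
    """Toy model of write-sparse training on non-volatile memory.

    Updates accumulate in a volatile buffer; the expensive non-volatile
    write happens only when the buffered change crosses a threshold.
    """

    def __init__(self, n):
        self.weights = [0.0] * n   # values "stored" in non-volatile memory
        self.pending = [0.0] * n   # buffered updates in cheap volatile memory
        self.writes = 0            # count of expensive non-volatile writes
        self.naive_writes = 0      # writes a write-every-update scheme would make

    def update(self, grads, lr=0.1):
        for i, g in enumerate(grads):
            self.pending[i] += -lr * g
            self.naive_writes += 1
            if abs(self.pending[i]) >= WRITE_THRESHOLD:
                self.weights[i] += self.pending[i]  # one consolidated write
                self.pending[i] = 0.0
                self.writes += 1


random.seed(0)
w = ThresholdedWeights(4)
for _ in range(100):
    w.update([random.uniform(-0.1, 0.1) for _ in range(4)])
print(f"non-volatile writes: {w.writes} vs naive: {w.naive_writes}")
```

Because many small updates cancel or merge before a write is committed, the count of expensive writes stays well below the one-write-per-update baseline, trading a small amount of update latency for a large energy saving.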
Perhaps best of all, Mitra added, the team’s new AI-at-the-edge chip isn’t a one-off prototype that may face challenges in scaling up to commercial production. It can already be made in volume in one of the world’s leading semiconductor foundries, which provided the Stanford team access to its technology approach as a proof-of-concept effort to demonstrate what RRAM can really do.
“We flipped the design paradigm on its head,” Mitra said. “Usually, you have an algorithm and you design a new chip specifically for that task. In this case, we revised the algorithm to work within the constraints of the existing hardware.”
Because AI-at-the-edge chips are already real, Mitra said, companies could start integrating them in their devices essentially immediately.
Additional co-authors include undergraduate students John W. Kustin and Victor Turbiner; graduate students Kartik Prabhu*, Kalhan Koul*, Robert M. Radway*, Albert Gural*, Rohan Doshi*, Zainab F. Khan, Gregorio B. Lopes, and Timothy Liu; and postdoctoral researcher Guénolé Lallement*. Researchers from TSMC also contributed to the paper (* equal contribution).