Laying the foundation for today’s generative AI
Christopher Manning, professor of linguistics and of computer science, co-founder of Stanford's Institute for Human-Centered Artificial Intelligence (HAI), and recipient of the 2024 IEEE John von Neumann Medal, remembers the moment he knew he wanted to study language.
“One day in high school English class, I came across one of my teacher’s personal books that dealt with linguistics and the structure of human languages,” he says. “I began reading it, and found out about the International Phonetic Alphabet, which provides a common set of symbols to represent the pronunciation of sounds in any language. At the time, I’d spent many hours learning the spelling of English words – many of which were arbitrary and strange – for spelling tests, and I’d also studied some French and Latin. This was the first thing I saw that captured a guiding idea of linguistics, that there is something useful to be achieved by studying human languages in general and trying to produce a common science across all human languages. It was the reason I first began studying linguistics as an undergrad.”
Four decades later, Manning’s ongoing fascination with human language – and his pioneering efforts to help computers learn, understand, and generate that language – have made him a renowned and ground-breaking figure in the fields of natural language processing (NLP) and machine learning.
“I would call Chris an enormously influential figure – possibly the single most influential figure – in natural language processing,” says Dan Jurafsky, Stanford professor of linguistics and of computer science. “He’s by far the most cited person in the field, and his decades of research have influenced everything, including our most recent models. Every academic in natural language processing knows his work.”
Envisioning a machine learning shift
Manning was born in Bundaberg, Queensland, Australia, where his father worked maintaining, designing, and building machinery at the Fairymead Sugar Plantation. By the time Manning was in high school, the family had relocated to the national capital of Canberra, where he got his first computers – first the loan of a TRS-80 and, eventually, a Commodore Amiga. In the mid-1980s, as an undergraduate studying linguistics, computer science, and math at the Australian National University (ANU), Manning was already excited about the intersection of those fields, and becoming convinced that the early NLP era of handwritten lexicons and grammar rules was coming to a close.
“I was beginning to believe, as I have ever since, that what we needed to be doing was to find a way to get computers to learn things, so that rather than handwrite out grammars and rules and lexicons for them, we get them to learn from language data,” he says. “Eventually it seemed like I should try to learn more about this computational linguistics/natural language processing stuff, and at that time, the U.S. was the place to go.”
Learning by doing
After a short stint teaching English in Japan, Manning took the advice of ANU linguistics mentor Avery Andrews, who suggested applying to Stanford, even though the university didn’t offer a program in natural language processing at that time. To get around that, Manning enrolled as a PhD student in linguistics – studying human language syntax – and began simultaneously working at nearby Xerox PARC, where he learned about computational linguistics and worked alongside a group of researchers who were beginning to do statistical NLP using digital text, which was just starting to become available.
“This was before the World Wide Web, but you started to be able to get things like newspaper articles, parliamentary proceedings, and legal materials, where you could find a couple million words of text,” Manning says. “Computer centers would write this sort of data onto 10.5-inch tapes, which would then be physically shipped to their customers. Companies working on computational linguistics, like Xerox, IBM, and AT&T, could purchase these tapes from news organizations, for example, or get access to them from their business clients who let them use the data. It was really exciting, because it meant that for the first time, we could start to do linguistics by actually having large amounts of text data we could search for patterns to try and learn automatically what the structure of human language was.”
During this time, Manning was also intrigued by – and saw the potential of – new work that had begun in the late 1980s on probabilistic machine learning models. Essential components of today’s machine learning, these statistical models account for the inherent uncertainty in real-world data and incorporate it into their predictions, allowing for a more accurate understanding of complex systems.
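As a rough illustration of the general idea – not a reconstruction of any particular model from that era – the short Python sketch below learns a tiny probabilistic bigram language model from raw text: instead of hand-written rules, word-to-word transition probabilities are estimated by counting, and the resulting distributions express the model’s uncertainty about which word comes next. The toy corpus and function name are invented for illustration.

from collections import Counter, defaultdict

def train_bigram_model(text):
    """Learn bigram probabilities P(next word | word) by counting co-occurrences."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    # Normalize raw counts into conditional probability distributions.
    return {
        w1: {w2: c / sum(nexts.values()) for w2, c in nexts.items()}
        for w1, nexts in counts.items()
    }

# A toy "corpus"; work at the time used millions of words of newswire and legal text.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)
print(model["the"])  # e.g. {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}

Real models of the period were far richer, but the principle is the same: the probabilities are learned from data rather than written by hand.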
“I think the key to my success overall has been a willingness to quickly get into major new approaches that I believed were going to be successful,” he says. “I wasn’t the first person to see the potential of learning from lots of text data and building these probabilistic models of language, but I was involved early in my career, and I think that helped get me to where I am today.”
Critical early work
Following his PhD, Manning became the first faculty member to teach statistical NLP at Carnegie Mellon University, before opting after two years to return to Australia with his wife, Jane, to teach linguistics at the University of Sydney. By 1999, however, he was back at Stanford as an assistant professor with a joint appointment in linguistics and computer science. By 2010, artificial neural networks – which had been actively explored in the mid-1980s – were again rising to prominence, and Manning again embraced the promise of a new technology.
“I advocated strongly for the idea that we could use these neural networks in natural language processing to understand sentences, their structure, and their meaning,” he says. “My students and I really pushed for that, and that ended up being key in the development and use of these neural networks for natural language understanding.”
“We started working seriously with these networks to model language and began building systems that could solve language understanding problems, such as determining whether what someone was saying was positive or negative,” Manning says. “I ended up doing quite a lot of the early work on using neural network approaches for learning human languages, which involved getting these models to understand, produce, and translate language.”
Manning’s 2010s work on representing words as vectors of real numbers and modeling relationships between words with a simple attention function led to the type of large language models that are in use today, like ChatGPT. His contributions are immense, says Percy Liang, Stanford professor of computer science.
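As a purely illustrative sketch of those two ideas – not of Manning’s actual models – the Python snippet below represents words as small vectors of real numbers and applies a simple dot-product attention function that weights the words in a sentence by their relevance to a query vector. The toy vectors, their dimensions, and the query are invented; in real systems both the word vectors and the attention weights are learned from data.

import numpy as np

# Toy word vectors: each word is a short vector of real numbers.
# (Learned embeddings typically have hundreds of dimensions.)
vectors = {
    "movie": np.array([0.9, 0.1, 0.0]),
    "was":   np.array([0.1, 0.1, 0.1]),
    "great": np.array([0.2, 0.9, 0.1]),
}

def simple_attention(query, keys):
    """Dot-product attention: score each word vector against the query,
    turn the scores into softmax weights, and return a weighted average."""
    scores = np.array([query @ k for k in keys])
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ np.stack(keys)

sentence = ["movie", "was", "great"]
query = np.array([0.0, 1.0, 0.0])  # a hypothetical "how positive is this?" direction
weights, summary = simple_attention(query, [vectors[w] for w in sentence])
print(dict(zip(sentence, weights.round(2))))  # "great" receives the largest weight

The weighted summary vector can then feed a classifier or the next layer of a network; stacking and refining this kind of attention is what modern large language models build on.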
“Today it’s obvious that we should be using deep learning in NLP, but there was fierce resistance to the idea in the early 2010s,” says Liang. “Chris did important early work showing that deep learning could work better than previous machine learning models which required a lot of feature engineering. This eventually led to the development of the modern NLP systems that we take for granted today. Chris had the foresight to think about how it would eventually be transformative.”
Creating accessible NLP software
Manning’s other significant contributions to date include a series of textbooks that helped define the field of computational linguistics; the online CS224N video course on YouTube; a framework to provide consistent annotation of grammar across different languages called Universal Dependencies; ongoing and essential research to understand the role of linguistic structure in language processing; and an early commitment to make NLP software accessible to all.
“Now it’s common for someone to simply go to the web, download a piece of software, and build a neural network,” Jurafsky says. “That wasn’t the norm 20 or 30 years ago. Chris and his lab were building publicly accessible libraries of NLP software and putting that online decades before everybody else, and consistently pushing for that to be the way of the world. Today the idea of open-source NLP software is the norm.”
For now, says Manning, he will continue working to create deep learning models that have a richer understanding of both the world and its many languages.
“For me, human language is an amazing thing that we still don’t really understand,” he says. “It’s astounding that babies somehow figure it out, and that little kids eventually learn to be good language users from maybe 50 million words of human language, while we show the best large language models trillions of words. Somehow, humans are still smarter. It’s a fascinating question and building computer models seems like a productive window into starting to think about that.”