The future of computational linguistics
Our guest, Christopher Manning, is a computational linguist. He builds computer models that understand and generate language using math.
Words are the key component of human intelligence, he says, and they are why generative AI like ChatGPT has caused such a stir. At one time a language model could hardly produce one coherent sentence; suddenly ChatGPT is composing five-paragraph stories and doing mathematical proofs in rhyming verse, Manning tells host Russ Altman in this episode of Stanford Engineering’s The Future of Everything podcast.
Transcripts
Russ Altman (00:03): This is Stanford Engineering's The Future of Everything, and I'm your host, Russ Altman. If you enjoy The Future of Everything, please follow or subscribe on your favorite listening app, you'll hear about new episodes and it'll help us grow. Today, Professor Christopher Manning, Chris Manning, will tell us how the intersection of linguistics and computer science has led to the remarkable progress in intelligent agents such as ChatGPT. It's the future of computational linguistics. Before we jump into this episode, a reminder and a plea, please rate and review the podcast. It'll help us improve and it'll spread the word, and then you'll know that the future of everything will never surprise you.
(00:48): When I think about linguistics, I think about the study of old languages and how they are related and how they teach us about the culture of humanity. I don't always think about computer science and its role in these old languages. However, recently we have seen the rise of these intelligent chatbots like ChatGPT-4, which show remarkable capabilities in understanding human language and generating responses to a variety of questions across all areas of human endeavor. Are these chatbots a surprise, or were the computational linguistics experts who've been studying this field for decades totally expecting us to achieve this capability?
(01:33): Well, Professor Chris Manning is a professor of linguistics and computer science at Stanford University. He creates computational methods for studying linguistics and computational methods for having computers and humans interact. He will tell us that these capabilities were shocking even to the experts. Yes, there was some early work that anticipated progress, but what we've seen in the last year is something that nobody expected.
(02:00): So Chris, you're a professor of both linguistics and computer science. Now, some people might be surprised to know that that's even a thing because we think of linguistics as the study of languages, old languages, new languages, emerging languages, and we think of computer science as a very different study, but obviously they're not that different. So could you start out just telling us what is the intersection of linguistics and computer science and why do we care about it?
Christopher Manning (02:27): Well, yeah, so linguistics is a very diverse field. I mean, a lot of the time people think first of philology and reconstructing ancient tongues, but lots of other things go on. So there are sociolinguists who look at how different communities speak and use language in quite different ways, for example. But for me, what I do is deal with how we can get computers to understand, generate, and learn languages, which connects into cognitive science questions, because that's also what psychologists study from a more human-centered cognitive perspective: how does this come about? I'm on a slightly more technological level of wanting to get our computers to the point where they can understand us in the same way that other human beings do. So a lot of that has a more machine learning engineering flavor of how to build models, but it also centrally depends on the subject domain. So just as for your own work, Russ, if you're working in bioinformatics, it's useful to know something about biology as well as something about computation. Similarly, there's value in understanding what the structure of human languages is.
Russ Altman (03:45): Yes, and I'm sure we're going to talk about artificial intelligence and some of the amazing things that we're starting to see in a moment. But before that, I'd like to explore a little bit the relationship of language to intelligence. Do you think about it as there's an intelligent being and that intelligent being creates language to communicate, or is it a much richer connection between how we think about things and the words we have even to think about? So talk a little bit about the relationship between language and intelligence.
Christopher Manning (04:14): Sure. I should preface it by saying this is certainly an area that's been debated a lot by philosophers and cognitive scientists, and not everyone has exactly the same view, but my personal view is that language is extremely important to human intelligence. If you compare humans with some of our nearest neighbors, chimpanzees, bonobos and things like that, it's kind of hard to differentiate us on some of the basics of intelligence, it seems. You can look at things like planning, tool use, memory, a lot of the things that people talk about for intelligence, and to a first approximation it doesn't seem like there's much difference. I mean, in some areas chimpanzees have better short-term memory than human beings do, in fact. But nevertheless, it just seems like there's this night and day difference between chimpanzees' intelligence and human beings' intelligence and what we've been able to do with that, right?
(05:23): There's a difference between having a stick to dig out some ants versus having a cell phone in your pocket, seems kind of different. And my belief is that the development of human language has been essential to leveling up human intelligence. One side of human language is communication, and we can get back to that, but the other side is that human language gave humans a transformative tool to think with. So it's not that you can't think without language. You can think with images, as people sometimes do in their dreams, and obviously you have immediate gut responses when you see something that aren't anything to do with language, you just look at it and you feel excited or repulsed; that's thinking without language. But we humans do a huge amount of structured thought, planning, consideration of alternatives in our head in a linguistic sense. I mean, all of us play out scenarios using words and thoughts in our head, and I think that has been essential in structuring and advancing human thought to allow the kind of higher level intelligence and the results of that that we see everywhere around us.
Russ Altman (06:44): Great, so you've set up my next question so perfectly. It's almost like you knew what was coming. So the elephant in the room, I think it's fair to say, is the recent release of and discussions about ChatGPT and other so-called large language models or foundation models, and you've been working in this field for decades. And so, one of the things I wanted to ask you is, for many people ChatGPT came as a shocking surprise, but I'm wondering, from somebody who's been in the field, is this really a surprise, or have we been making slow and steady progress toward this over the last two decades, and it's just the entirely predictable result of the research that you and others were doing?
Christopher Manning (07:26): It was a shocking surprise.
Russ Altman (07:28): Oh my goodness.
Christopher Manning (07:30): Yeah. So I mean, there's no doubt that you can paint a history of the progress of research and that there were different steps along the way. And you can look back and say, well, people started using language models, these are models that predict next words in a sequence, around the mid-seventies, and they started to show that they were useful for speech recognition and spelling correction, machine translation, and it was about 2000 that people started to use neural language models and showed that they had some advantages. And then for the architecture of these neural networks, you can pick out different components. So I mean, these current large language models all use a neural model called the transformer, which has components inside it of residual connections and fully connected layers and attention layers, and you can point to all of the places that they came from in prior work.
(08:32): But despite all that, until, let's say, 2017, people realized that language models had an important role for the fluency of text, predicting what's likely in speech recognition, but no one thought this was going to be the way to achieve language understanding or to achieve the ability to generate whole passages of text, tell an entire story. It was sort of seen as low-level stuff of predicting a few words around each other. And so it was just very unexpected when this direction emerged in 2018: if you just make these neural language models very big, they just start to show amazing capabilities.
(09:26): And at least so far, for the trajectory that we've been on in the last five years, as you make these models bigger and bigger, you just start to see more and more amazing capabilities emerging seemingly from nowhere. So people often talk about emergent capabilities, meaning that we're just building this bigger and bigger word prediction machine, and yet suddenly these models start having a lot of knowledge about the world, knowledge about human languages, the ability to do things like translate, summarize, et cetera. And I think nobody in the field had expected that, and it was just surprising how this started to happen. And so then it's sort of been a goldmine, because once you've found where the vein is, you keep digging as fast as you can in that direction.
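To make the "bigger and bigger word prediction machine" concrete, here is a minimal sketch of next-word prediction with a small, openly available causal language model. The gpt2 checkpoint and the Hugging Face transformers library are assumptions chosen purely for illustration; they are not the specific systems Manning and Altman are discussing.

```python
# Minimal sketch of next-word prediction with a causal language model.
# The `gpt2` checkpoint and the Hugging Face `transformers` library are
# illustrative assumptions only, not the systems discussed in the episode.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The future of computational linguistics is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # shape: [batch, sequence, vocabulary]
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next word

# Print the five most probable continuations of the prompt.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
```

Everything the conversation goes on to describe, from story writing to made-up biographies, is this same step applied over and over.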
Russ Altman (10:17): Yes, great. So from your perspective, and so I've seen reports with literally hundreds of pages of examples of remarkable things that ChatGPT-4 in particular can do. Somebody asked it to do a mathematical proof but express the proof as a poem that rhymes, and it did that, it created a rhyming poem that made a mathematical proof. And a lot of these, you could say that they're parlor tricks, but I think that's probably doing an injustice to the technology. From your perspective as a computational linguist, and I know that this is a hard question, what are the one or two capabilities that you're most impressed by in these large language models that you've seen in the last few months?
Christopher Manning (11:00): Yeah. So I do think the sort of more fun examples you see in the newspaper really do show the ability of these models to put stuff together in creative and clearly original ways. So a lot of them are fun, and I don't think of them as parlor tricks. But in terms of what seems to me special, quite a bit of what I'm seeing as special in the most recent models is that these models do seem to be starting to develop the beginnings of a model of the world, where they're maintaining the scenario in their head and can reason with it. So early on people used to say, "Oh, well, yeah, these models are very good at completing sentences, and they can tell a bit of a story." And by the time they could write a five-paragraph story and it actually made sense, it kind of continued along coherently in a plausible way that was interesting to read, with creative details.
(12:13): That, to me, was already amazing because not that long ago we thought we were doing well on natural language generation if we generated one reasonable sounding sentence, the idea that we could continue through 20 sentences in a five paragraph story and it would all follow from each other and make sense, that seemed a completely out of sight ability for natural language generation.
(12:41): But we are now getting more than that, right? Beyond what the earlier, smaller models could do, if you set up a scenario where the model has to maintain a good understanding of the world and be able to reason about it, say, you start with some facts: John knows some facts about their mortgage, and John's afraid to tell his wife because she will be concerned about X, and then John tells this other person. As you put together this complex scenario, the language model is also putting together some kind of model of the world. So you can then ask for inferences based on that world model: when the friend meets John's wife, what should they do and what concerns will they have? And the model can answer with the same kind of ability to reason about situations as a human being could.
Russ Altman (13:54): Of course, we can't go into the details of how these models are built and I don't want to, but they have seen incredible volumes of human generated text. And as you're saying, they seem to have learned a lot more than we would've expected about even human relationships and how they work and how unstated motives might play out in a scenario.
(14:16): Let's go to the topic, and this is, I'm sure, where you're getting your research agenda for the future: what are the things that it's not doing well, or where do we really need to focus attention as these things continue to be rolled out and really made available? I think I read that it's been the most quickly adopted technology in terms of the number of people who have signed up to use it either in their personal life or their professional life. So what should we be worried about? What is it not doing well that might not be in all the advertising material?
Christopher Manning (14:48): Fair enough. So perhaps the first thing that's been most prominent in what people say to be worried about with these models is, I mean, what's commonly called hallucination. I think that's sort of a bad term, but these models will just make stuff up. Now, there are some humans that just make stuff up, we've probably all-
Russ Altman (15:15): They're often in the news.
Christopher Manning (15:17): ... seen a few of them, yes, 2017 to '20 or something. But in general, human beings have a pretty good sense of what they know, and in certain circumstances they'll tell a yarn, but most of the time they know what they know and will be reporting truthful things in that space. It's actually a kind of deep architectural fact about how these models are designed that their whole goal is: given this context, predict what's most likely to follow. And that means if they know facts, they will basically give those facts. But if they don't know facts, or you give them some kind of counterfactual scenario, they will, with equal confidence in the way they talk, just put in whatever. So maybe the model doesn't know much about your educational background, Russ, so it'll say, okay, Stanford professor, well, maybe that means you got your PhD from MIT, say, and, well, maybe you had your first job at Columbia and then you moved to Stanford. It'll just write this biography of you-
Russ Altman (16:28): Just for the record, none of that is true.
Christopher Manning (16:30): ... as if it were all true, completely straight-faced, but it's just making stuff up. So that's one huge problem to solve. I think we don't currently have a very good idea as to how to solve it. Engineers are clever at refining things and improving metrics, so I think we'll certainly see the amount of stuff made up start to decrease as people iterate on these models. But it's very central to the current architecture that these models, in any circumstance, just put next whatever is most plausible. So I think we do actually need some more profound architectural advances. I mean, one of the problems with these models is that they are essentially just feed-forward models. So at any point they're running stuff through a neural network, generating the next word, and repeating over. And so something people are starting to experiment with is reflective models, where stuff feeds back into the model again.
(17:35): And that might give it much more ability to deal with some of this invention. I mean, the kind of interesting fact is, if you take one of these great models like ChatGPT or GPT-4, and you show it something it just generated and you then just ask it, how sure are you that this is true? It's actually pretty good at answering that, right? It can often differentiate what it just invented versus stuff that is true. And that's because, internal to the probabilities from its training data, it really does have more of an idea as to what things are well-supported versus what things are made up. So we kind of need to have an architecture where it can make more use of that knowledge, which it already has at the time it's generating, to differentiate things. And so that leads on to this concept mentioned before of world models.
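The feed-forward loop Manning describes (run the network, pick a plausible next word, append it, and repeat) can be sketched in a few lines. The gpt2 checkpoint, greedy decoding, and the Hugging Face transformers library are again illustrative assumptions; real chat systems use far larger models and more elaborate sampling.

```python
# Sketch of the autoregressive loop Manning describes: run the network,
# pick a plausible next token, append it, and repeat. Greedy decoding with
# `gpt2` via Hugging Face `transformers` is an illustrative assumption only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("John is afraid to tell his wife because",
                return_tensors="pt").input_ids
for _ in range(30):                        # generate 30 tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()       # most plausible next token, true or not
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(ids[0]))
# A "reflective" second pass, as Manning suggests, would feed this output back
# in with a question such as "How confident are you that this is true?"
```

Nothing in the loop checks whether a continuation is factual, which is why the most plausible next word and the truthful next word can diverge.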
(18:34): So human beings, internal to their heads, have a model of the world. That's exactly what we play out when I say, "Okay, I have to go and talk to my department chair saying I want to go on leave again. Gee, they're probably going to react to the fact that I was on leave two years ago, and so I'd better come up with a good explanation of why it'd be okay for me to disappear for another six months." We have a model of the world and of people in the world, and we can sort of use that to plan out and put things together in all sorts of ways.
(19:09): Now, these large language models, on the one hand, are starting to develop models of the world, we mentioned that before. And in fact, basically they're now the best computational models of the world that we have. So people in robotics are starting to use large language models as world models, because they actually help them predict what will happen in the world with different actions, different objects, and different people. But our current models still have very poor world models. And so we somehow need to work out how to have better world models, and better maintenance of world models, in these systems. So there are still lots of research directions; we shouldn't give up quite yet.
Russ Altman (19:52): Great. Well, this is The Future of Everything with Russ Altman. We'll have more with Chris Manning next.
(19:57): Welcome back to The Future of Everything. I'm Russ Altman and I'm speaking with Professor Chris Manning of Stanford University. In the last segment, Chris told us about the complex interaction between intelligence, language, and computer science. He told us that ChatGPT-4 was kind of a huge surprise even for the experts. In this segment he will tell us about some of the risks of this technology, and we'll also go back to translation and talk about what is our current capability for translating between languages, and how are we doing for the old languages that are in many cases disappearing because their speakers are learning other languages instead.
(20:36): I want to start out in this segment just asking you about what you conceive of as the major risks of these technologies. You talked about what the technical challenges to making them better are, separate from that, what could go wrong in our use of these technologies?
Christopher Manning (20:52): Yeah. So there are lots of risks and lots of things that could go wrong, some of them very immediate and direct. So these models provide a very cheap way to produce large amounts of text, and they can be fine-tuned to produce text, text that works the best to influence people. So the industry of advertising has been involved for close to a century now, I guess, with writing text that influences people to buy certain products or to vote for a certain person. But that's been, relatively speaking, expensive work that you have to pay people to do. And we are facing the chance now that we'll be able to have these models do that kind of work, not only about two orders of magnitude cheaper, a hundred times cheaper, but actually much better, because using our machine learning technology, we can continue to tune these models and have individualized models. So there's a model that's especially good at persuading Russ Altman whom to vote for in the next election.
(22:09): And already there are problems with people being too influenced by both advertising and populist opinion. And if that is made much worse, that's potentially quite bad for society.
Russ Altman (22:26): Wow.
Christopher Manning (22:26): Yeah, there are other risks as well. I mean, some of the other ... so there's influencing people, and there are other forms of that. There's disinformation coming, whether from state actors or large companies wanting to influence public opinion, but there are then also concerns that come from biases that might be built into these models. So the fact is these models are dominated by the people who tend to have power, because they're by and large the people who write the most and whose writing gets disseminated the most. So the models are not equally representing all the voices of humanity.
(23:08): And so that is then a bad source of bias, which can further drive things away from the direction of trying to give us equality in society. And that's especially worrying because, at least on the trajectory we are on at the moment, there is a small number of large language models that are dominating the scene, such as the ones from OpenAI. And so if every ... well, to a first approximation, everybody is using the same models, and if they've got particular biases in favor of certain kinds of people and against other kinds of people, well, it's sort of bad if you aren't on the right side of that equation.
Russ Altman (23:51): Yes. And it sounds to me like this is not a purely technical challenge for folks like you and your research colleagues. It sounds like society is going to have to make decisions about what is and isn't allowed, and about any kinds of constraints that they want to set up for the behavior of these models, and I'm sure that's a difficult discussion. Is it happening?
Christopher Manning (24:10): It's starting to happen, but I think it's just not happening with the speed and the level of focus on what are quite difficult technical issues that's really needed. Yeah, I think there's just no doubt that we need to be doing more to think about the consequences and regulate what kinds of things are and aren't okay. But certainly in the United States, we're in this situation where most of Congress barely understands how current generation social media works, let alone having the kind of background for sensibly thinking about how to regulate and control these models.
Russ Altman (24:54): Great. Well, not great, but thank you for those comments. Let me move to an entirely different area, and actually go back to the roots of our discussion in linguistics and computation. I know that one of the big challenges for computational linguistics for many years was translation, translating from English to Spanish, from French to Russian. So let me ask, it looks like these models are quite good at translation. As a linguist, what are you seeing there? Is this a solved problem, and at the edges, are there still issues that we have to pay attention to?
Christopher Manning (25:27): It's definitely not a solved problem, but enormous progress is being made. So I mean, these models are just trained on a lot of text in various languages. So the fact of the matter is, just out of the box, you can ask ChatGPT to translate between languages and it does a pretty passable job. It's still the case that people build dedicated neural machine translation models explicitly trained on text to translate, and they're even better. And so we've reached the point for major languages that it's not that everything is always perfect, but translation is just good now, at least for those of us who've been around for a fair while and remember the kind of garbage you used to get out of machine translation. I mean, now you can take a paragraph in German, Italian, French, or Spanish, stick it into Google Translate, and read the English translation, and to a first approximation it'll just be perfect. Some other languages, like Chinese, are still a bit harder.
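For the dedicated neural machine translation models Manning mentions, a few lines are enough to try one out. The Helsinki-NLP/opus-mt-de-en checkpoint and the Hugging Face transformers pipeline API are assumptions chosen for illustration; they are not the particular systems being compared in the conversation.

```python
# Sketch of using a dedicated neural machine translation model of the kind
# Manning mentions. The Helsinki-NLP/opus-mt-de-en checkpoint and the
# `transformers` pipeline API are illustrative assumptions only.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
german = "Die Zukunft der Computerlinguistik ist spannender denn je."
print(translator(german)[0]["translation_text"])  # English output
```

Models like this one exist mainly for language pairs with large parallel corpora, which is exactly the divide the conversation turns to next.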
(26:32): So there's a very good news story. The question is how that extends out to the whole of humanity. So people normally count about 7,000 languages in the world. I mean, a lot of those are unfortunately languages that aren't going to be with us much longer. So, for example, lots of languages in different areas of the planet, including Native American languages and Australian languages, are down to a handful of speakers. And although some people are working hard to preserve and reclaim those languages, it seems like there's just no doubt that within a century's time, the number of languages will be down to maybe 2,000, perhaps only 1,000.
Russ Altman (27:21): Wow, so this is a huge contraction.
Christopher Manning (27:24): So there's going to be a huge contraction in the number of languages spoken in the world, but even if we stick to one or two thousand, the fact of the matter is that the fantastic amounts of data that allow building really good foundation models like we've been talking about exist for maybe only the top 20 languages. I mean, even within the top 20, there's an enormous difference between the amount of data you can collect in English or Chinese versus the amount of data that you can collect in Bangla or Portuguese. And things fall off very rapidly from there. And so there's a real divide between the haves and have-nots of this new technology. So it's a mixed story; there's a good news story and a bad news story.
(28:13): The good news story is these models are much better at transferring capabilities across languages than anything we had before. So we're actually sort of making progress and we're also better able to handle smaller languages than we used to be. But that's a relative claim, it's still the case that there are lots of languages with millions of speakers for which we just don't have good language technology, and there's no easy way to make it even in the current world because we just don't have the kind of data resources to be able to do so.
(28:53): And that reflects things like the legacy of colonialism, so you see in much of Africa that there are major languages with ten million plus speakers, lots of people, so those are larger languages than European languages like Danish or Norwegian. But the fact of the matter is that in the educational systems in those countries, people are still being schooled in English or French. So these are languages of the community, and because of that, there just aren't the available resources in terms of written materials, et cetera, to be able to build the same kind of language technology.
Russ Altman (29:37): So in the last 30, 40 seconds, I did want to ask, is there any way that these language models might help this problem? I understand all the ways in which social forces have led to a loss of languages, but is it possible that they will have such a deep understanding of how human language works that, with relatively small bits of language from these dying linguistic traditions, we might be able to regenerate the information? Because we understand so much about language that, just given one book or one piece of text, they'll be able to be ... I'm thinking of Jurassic Park. In Jurassic Park they get a little bit of DNA from the T-Rex, and they're able to regenerate the T-Rex. Is there going to be anything like that that you see on the horizon for forensic linguistics?
Christopher Manning (30:31): Yeah, I think so. And it is already the case that neural models have been applied to deciphering and decoding languages. I mean, I think you can only go so far, right? So you're more expert on DNA than me, but if you have the DNA, that actually is giving you the whole blueprint, so you only need a few strands of DNA, and if you've got good enough science to decode it all, you've got everything in some sense. Whereas if you've only got one book or some short stories, you just don't have nearly enough of the language; you just don't know what the other words are.
(31:09): So anything that you do in those circumstances can try to be faithful to what you do know and exploit that, but it also has to invent stuff. And that's what's happened in human cases as well. When Hebrew was revived as Modern Hebrew, certainly it followed what was in Ancient Hebrew, but people needed a lot of words for stuff that they didn't have words for, and so a lot of new stuff had to be invented. More extremely, the same has been done in some cases of reviving things like Australian Aboriginal languages, for which there's only quite partial historical documentation. So you're trying to follow the genius of the language, but you have to be filling in, coloring in stuff that is plausible but can't actually be claimed as fact.
Russ Altman (32:05): Thanks to Chris Manning, that was the future of computational linguistics. You have been listening to the Future of Everything with Russ Altman. If you enjoy the podcast, please consider subscribing or following it on your favorite app, you'll never be surprised by the future of anything. Maybe tell your friends about it too. Also, of course, rate and review it. It will help us grow and it will help us improve. We have more than 200 episodes in our archives from interviews with people who are inventing the future. Consider checking those out as well. You can connect with me on Twitter @rbaltman, and you can follow Stanford Engineering @stanfordeng.