Computer programs that purport to help humans learn have been around almost as long as there have been computer programs, but their track record for success has been less than impressive.
Emma Brunskill, an expert on artificial intelligence and machine learning, thinks that less-than-stellar record is about to change and has dedicated her career to finding new and better ways to teach computers to teach humans. Her research creates innovative “reinforcement learning” algorithms in which computers learn through experience to get better at teaching humans. In the process, the algorithms lead people to make better, more-informed decisions that produce better outcomes in the long run.
To Brunskill this is no schoolroom affair, but an endeavor where the stakes are high. She says that better education is key to big societal challenges, like alleviating poverty. She believes that better training of new workers — or retraining of older ones — can yield better paying jobs for more people. What’s more, she’s turning her attention to other fields, namely healthcare, where better decisions can have life-or-death implications.
Join host Russ Altman and Stanford computer scientist Emma Brunskill for a deep exploration of the new age of computer-assisted learning and decision-making. You can listen to The Future of Everything on Sirius XM Insight Channel 121, iTunes, Google Play, SoundCloud, Spotify, Stitcher or via Stanford Engineering Magazine.
Russ Altman: Today on The Future of Everything, the future of reinforcement learning. So, we humans are pretty interested in learning and we’re pretty good at learning. If you ask somebody why they enjoy their job or their hobby, it is very common for them to say something like, “I love this activity because I’m constantly learning new and interesting things.” Learning itself is the reward.
Now, with the rise of artificial intelligence, AI, our computers are also getting better at learning. But it doesn’t come nearly as naturally to computers as it does to us. So computer scientists are working pretty hard to get computers to learn how to do tasks. Recognizing a face, for example, requires an unbelievably huge amount of data, and breakthroughs in image understanding really only occurred when tens of millions of images were available to teach computers about human faces, kittens, and fire trucks.
Now, I have a 15-month-old grandson. And in the last 15 months, he has gone from a bundle of sleeping and eating and not much else, to walking, interacting with the world, almost talking, and he’s definitely a learning machine. And he has learned all this in 15 months of being a human. He has learned that vacuum cleaners look benign when they’re turned off, but are terrifying when they’re turned on. He has learned that there are certain things that mommy doesn’t want him to touch, and daddy. And he has learned that some food can be good, but it’s not always good. Can we get computers to learn experientially like this, so they can develop capabilities that help humans live better lives?
There are many areas of learning for AI, but one of them is called reinforcement learning: learning from experience to make good decisions. And I guess it’s called reinforcement learning because there is usually some reward, reinforcement, when you make a good decision and some penalty, some negative, when you don’t. Something that we kind of all understand implicitly. And so, you try to get better and better by seeking positive reinforcement.
Dr. Emma Brunskill is a professor of Computer Science at Stanford University, and her work focuses on reinforcement learning, especially when experience is costly or risky. And so you need to learn fast or there could be bad consequences. Such situations are abundant in healthcare, robotics, education. Emma, this seems like a very intuitive way to learn. But what is easy for humans may not be easy for computers. And I’d like to know what are the challenges in reinforcement learning for AI systems and are we making progress?
Emma Brunskill: Well first of all, I’m so excited to be here, thank you very much for the invitation. I think that one of the amazing things we’ve seen in AI is that often, once we start to make progress, we realize some things are easier than maybe we would’ve expected. Recently, reinforcement learning has been used to do things like have computer agents be able to play video games. For some of you who may be familiar with the old Atari games, we now have reinforcement learning systems, these types of AI algorithms, that can play these games as well as humans. In some ways, even though it might have taken a teenager a few hours to learn how to play these games, we have very good algorithms for doing that now. I think one of the challenges we have is that in those cases these systems work by trial and error, learning to play these games over millions and millions of games. And so that’s possible for video games because that just requires compute. You just need lots and lots of computers, and then you can have all of them be playing these games and eventually you’ll learn how to win. You can sort of learn to optimize that score.
Russ Altman: Probably I would guess, much more experience even than the human experts needed to get to their level of performance.
Emma Brunskill: Orders of magnitude more.
Russ Altman: Okay.
Emma Brunskill: So there’s some lovely work that compares how fast people learn compared to computers. And people’s rate of learning on these games is way faster. And the amount of experience they need is far, far less. Of course we have huge amounts of other experience we can bring to bear. Whereas these computers are learning from scratch. But it’s still a case where the amount of experience they need is far more than any human.
When we think about using these algorithms for other types of cases, some of the cases I really care about are things like personalized adaptive education. We don’t have an infinite number of people to train these systems on. And it matters, because if my system takes 100 million people to learn how to best teach people fractions, that’s a big deal. Whereas if I could just learn on a few hundred students, that would be much better.
Russ Altman: Yes. So how, just to help us understand, my 15-month-old grandson understands what is a success and what is a failure, because of good taste, bad taste, pain, and not pain. How do we encode rewards and penalties in computational programs, so that they can get a sense of when they’re doing well and when they’re not doing well?
Emma Brunskill: I think that’s one of the core challenges right now. In things like board games, like the game “Go,” or in Atari games, like video games, there’s a score and the agent is receiving that score. So they know when Pac-Man eats a cookie, that then it gets an increase in score. And so it’s told that’s what the reward function is. I think in real systems one of the critical challenges is, what should that reward function be? So I work a lot on educational systems, and sometimes that reward function might be test performance. But really that’s a proxy for what we care about.
We care about things like high school graduation rates or people being employed. But those are just really hard to measure, and they take a long time to observe. So we often use what we call “proxy rewards,” things that are more easily measurable that we hope are correlated with the long-term outcomes we care about.
Russ Altman: Now I do wanna get to your educational work because it sounds very compelling and you’ve mentioned it, but before that just to set up some background. The other thing I know you care about, is the theory of learning. And actually proving things about what’s possible and what’s not possible. I’ve looked at some of the papers, and they’re very technically, I would say deep. And so we can’t go into the details in this discussion. But I’m wondering if you can give us a flavor of what can we learn by thinking about the theory of learning and what is possible and what is not possible? And how that informs your real world experimentation in things like education or health care?
Emma Brunskill: So a lot of the theoretical work I do is really inspired by the challenges that come up when I think about these educational systems. Systems that we want to learn fast. And so me and my group have been very interested in what it means for it to be hard to learn to make good decisions. Why might some problems be harder, why might we need a lot more data to try to figure out what is the right decision there? And one of the things that I think is most lovely is that we found that optimism is provably optimal in some cases.
Optimism in this case is, let’s imagine that you go to a restaurant, and there’s a couple of different dishes. You try one and it’s not so good. Many of us might never try that one again; we’d always stick to the other one. What optimism suggests is that, let’s say the chef just had a bad day, and that first dish is actually amazing. Optimism suggests that you should try things multiple times, because over time either the thing really is better, so if you’re optimistic it will really be good, or you’ll learn something. It turns out pessimism doesn’t have the same properties. You might never have tried that first dish again and you might never realize it’s good.
Russ Altman: That is very interesting. So you’re saying even the kind of, I’m using scare quotes again but, even the so-called attitude of the algorithm, for well, I think I know what’s gonna happen here, but maybe I should dip in one more time to kind of make sure, or a few more times.
Emma Brunskill: Yes.
Russ Altman: Versus, ah, I don’t like this, I’m gonna move on to another area. Even that kind of strategy can affect the rate at which true things and useful things are learned.
Emma Brunskill: Exactly, and not just the rate, but it might be that it never even learned the right thing eventually. So maybe that first dish was really, that chocolate chip cake was really great. The chef just had an off day. You wanna be optimistic with your uncertainty about how good things are. And if things might really be better, keep trying them. And we can show in some cases that’s actually provably the fastest way to learn things.
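The restaurant example can be sketched in a few lines of Python. This is only an illustration, not the algorithms analyzed in Brunskill's papers: it uses the standard UCB1 exploration bonus as one common way to be "optimistic under uncertainty," and the function names (`dish_reward`, `choose_dishes`) and the reward values are hypothetical. Dish 0 is excellent except on the very first visit (the chef's bad day), and dish 1 is reliably decent.

```python
import math

def dish_reward(dish, earlier_tries):
    """Hypothetical restaurant: dish 0 is excellent except on the first visit."""
    if dish == 0:
        return 0.0 if earlier_tries == 0 else 1.0
    return 0.6  # dish 1 is reliably decent

def choose_dishes(visits, optimistic):
    """Return how many times each dish was ordered over `visits` meals."""
    counts = [0, 0]
    totals = [0.0, 0.0]
    for t in range(1, visits + 1):
        if 0 in counts:
            dish = counts.index(0)  # try every dish once first
        else:
            scores = []
            for n, total in zip(counts, totals):
                # UCB1 bonus: large for rarely tried dishes, shrinking with experience.
                bonus = math.sqrt(2 * math.log(t) / n) if optimistic else 0.0
                scores.append(total / n + bonus)
            dish = scores.index(max(scores))
        totals[dish] += dish_reward(dish, counts[dish])
        counts[dish] += 1
    return counts

greedy = choose_dishes(200, optimistic=False)
hopeful = choose_dishes(200, optimistic=True)
print("no optimism:", greedy)
print("optimism:   ", hopeful)
```

Without the bonus, the diner tries dish 0 once, is disappointed, and never orders it again. With the bonus, the uncertainty about dish 0 eventually outweighs the bad first impression, so the diner retries it and ends up ordering it most of the time.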
Russ Altman: This is The Future of Everything. I’m Russ Altman. I’m speaking with Emma Brunskill. And we’re talking about proving things about learning. And I think the important thing that is implied by your comments just now is that you can actually turn these learning tasks that seem to involve words that are fuzzy into precise concepts. And then you can actually do mathematical-level proofs, which I would guess is extremely interesting, because if you can prove something is, for example, impossible, you won’t try very hard to do something that’s impossible, or you’ll say, “I know that this is impossible so I’m only gonna be able to get an approximate result.” So I would guess it puts good, kind of cones and boundaries on what you’re even willing to go for and try.
Emma Brunskill: That’s right. And I also think that right now, if we want people who aren’t my PhD students or other wonderful PhD students to be able to use these algorithms, we need to do what many of us are talking about in terms of democratization of AI. Which means we want these systems to be robust and usable by people in lots of domains. And that requires them to have good strong properties. We can see this in software: people often verify software so we know that on any plane the plane won’t crash if you use autopilot. Similarly, if we want things like reinforcement learning to be robust enough for the real world, I think we’ll need these types of guarantees.
Russ Altman: Yes because then, I think to repeat what you’re saying, when we transfer an algorithm from one domain to another, from playing “Go” to helping a doctor do surgery, we’d like to know that there are certain guarantees about its performance that we don’t have to reestablish or worry about if they’re present or not present.
Emma Brunskill: Exactly.
Russ Altman: So let’s go to the education work. What motivates… so I’m very interested in your use of and your focus on education. Because as a child of the ’60s and ’70s I was exposed to absolutely terrible computer systems that were trying to help me learn, and I’m positive that any success I’ve had in life is because I ignored those systems and didn’t use them. But I was very lucky to be in very good schools and have other options for learning. Tell me about the societal need for educational help from software and from AI systems. And then maybe a little bit about how you’re approaching it. And what you see as the big opportunities there?
Emma Brunskill: Absolutely. So I mean, one of the reasons why I’m interested in education is I think it’s one of the biggest tools we have for poverty alleviation. I think it’s one of the things we’ve seen repeatedly can lead to long-term really amazing benefits for people across their life. And I think now, it’s increasingly important actually with AI and automation that we’re gonna have ways to re-skill people and sort of up-skill over time. So we’re gonna need lifelong education.
Russ Altman: For example, with the shift in jobs because of automation is what you see.
Emma Brunskill: Exactly.
Russ Altman: Okay.
Emma Brunskill: Yeah so over time I think we’re going from sort of K through 12 to K through 75. Yes.
Emma Brunskill: But I think one of the huge challenges right now is something that was a shock to me when I first learned it. It’s that there are many parts of the world where people don’t have access to good education. And it’s only been relatively recently, over the last 10, 20, 30 years, that everyone has sort of primary school education. And I think that because of conflict and other issues, there’s a lot of times where people don’t have access to quality education.
Russ Altman: Yes, yes.
Emma Brunskill: And so those are places where I think that software can be amazing. Software can be infinitely replicated, everyone can use it. If we start to get tools that are effective, we can disseminate those and allow more people to learn.
Russ Altman: Now have you targeted, so that makes perfect sense. Given that, have you targeted areas of education that are particularly ripe with opportunity for large-scale, I don’t know if you wanna use the word automation or dissemination of learning? So what are the topics that the world needs to be able to learn where these AI systems might be able to help?
Emma Brunskill: Well, I think that there are a lot of different ones and I think it’s really interesting to see how in educational and sort of data science, educational data science community people are thinking about not just sort of typical hard skills like learning math, but also soft skills like grit, or motivation, or persistence. I can give a concrete example, one thing that we looked at was fractions learning, a lot of people struggle with fractions.
Some people might be familiar with “A&W,” it’s a fast food chain. In the 1980s, they were gonna launch the third pounder. It was supposed to be a competitor to the quarter pounder. And so they did taste tests, the beef tasted great, they thought this was gonna be awesome. It was the same price as a quarter pounder. And they launched it, and it flopped. And it was because everybody thought that a third pound was less than a quarter pound. And I think that illustrates that a lot of us don’t…
Russ Altman: Wow.
Emma Brunskill: Fractions are hard. Fractions are tricky. And it has real implications if we don’t understand these things. And so a lot of my own work has thought about, fractions learning.
Russ Altman: Okay.
Emma Brunskill: And so one of our collaborations was with Zoran Popovic, at the University of Washington. We thought about how we could create an educational game, and optimize it to help people persist for longer. Get people to learn more. And that was one of the first places where we used reinforcement learning to amplify that.
Russ Altman: So that sounds great. This is The Future of Everything. I’m Russell Altman. I’m speaking with Dr. Emma Brunskill about learning about fractions. What was the key insight about getting people to stick to their fraction lesson? It sounds like, if I’m getting at what you said, they stuck to their fraction lesson a little bit longer, and they learned a little bit more. Was it a pat on the back? And how do computers give pats on the back?
Emma Brunskill: Well, one of the things there is it introduces this other core question of how do we take information about past decisions that were made and their outcomes, and figure out what we should do in the future. We often call this sort of counterfactual, or “what if,” reasoning. What if instead of listening to this podcast, you went and got coffee? How much better would your life have been? Well, you can’t know that, right? You can’t have seen that alternative future. But there are statistical ways to try to estimate that. So we used some of those in this case to take old data, from around 10,000 students, to figure out better pathways. Better adaptive pathways for students. And the great thing was that we could use these types of machine learning statistical methods to estimate that we could improve persistence by 30% by being more adaptive. And then we ran a study with another 2,000 new students, and found that indeed, we improved persistence by 30%.
The reason that’s significant is for two things, one is that machine learning can really help, that there are cases where we can greatly optimize, compared to past sort of expert like performance. And then the other is that we could predict this before we deployed it. We’re sort of predicting the future. We’re saying, “Before you deploy this system, I can tell you how much better it’s gonna be.”
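One standard statistical tool for this kind of "what if" estimate is importance sampling, where old records are reweighted by how likely the new decision rule would have been to make the same choice. The sketch below is a minimal illustration and is not claimed to be the method used in the fractions study. The logs are made up, the old system is assumed to have picked between two hypothetical activities ("review" and "challenge") with probability 0.5 each, and the names `estimate_new_policy_value` and `adaptive_policy_prob` are invented for this example.

```python
def estimate_new_policy_value(logs, new_policy_prob):
    """Importance-sampling estimate of the average outcome under a new policy.

    logs: list of (state, activity, logged_prob, kept_playing) records, where
    logged_prob is the probability the OLD system gave to that activity.
    """
    total = 0.0
    for state, activity, logged_prob, kept_playing in logs:
        # Records the new policy would also have produced count more;
        # records it would never have produced count for nothing.
        weight = new_policy_prob(state, activity) / logged_prob
        total += weight * kept_playing
    return total / len(logs)

def adaptive_policy_prob(state, activity):
    """Hypothetical new rule: review for struggling students, challenge otherwise."""
    preferred = "review" if state == "struggling" else "challenge"
    return 1.0 if activity == preferred else 0.0

def make_logs():
    """Made-up historical data: 1 means the student kept playing, 0 means they quit."""
    outcomes = {
        ("struggling", "review"): [1, 1, 1, 0],
        ("struggling", "challenge"): [0, 0, 1, 0],
        ("confident", "review"): [0, 1, 0, 0],
        ("confident", "challenge"): [1, 1, 0, 1],
    }
    return [
        (state, activity, 0.5, kept)
        for (state, activity), results in outcomes.items()
        for kept in results
    ]

logs = make_logs()
logged_average = sum(kept for _, _, _, kept in logs) / len(logs)
estimated_value = estimate_new_policy_value(logs, adaptive_policy_prob)
print("persistence in the old logs:", logged_average)
print("estimated persistence under the adaptive rule:", estimated_value)
```

With these made-up logs the old system kept half of the students playing, and the estimate for the adaptive rule is 0.75, obtained without showing the new rule to a single student. The estimate only works because the old system sometimes made every choice the new rule would make, which is also why unrecorded factors in human decisions make this harder.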
Russ Altman: So that’s interesting. So in this case you had enough experience with previous students that you didn’t just passively kind of look at what their track through the software or through the problem was, but you actually inferred that, you could prune their path in some way. Maybe they made some false starts and it sounds like you were able to recognize those false starts and say, “If we avoid that path we might be able to get them to their goal a little bit faster.” I mean is that a fair way?
Emma Brunskill: Yeah, and let me just say, so what we were doing here is deciding what activity to give to the student after they complete each one. It’s a series of sort of these video game activities, and the question was, “What order do you give them to students, in a way that’s adaptive?” Depending on how they’ve done, to maximize persistence. And what we found there is that by kind of stitching together different people’s experiences, maybe you did activity one-two and I did activity one-three, we could figure out which of those is better for future students.
Russ Altman: Yes. And then, how did the students respond to these systems? They must be aware of making progress and so there must be a certain, again, kind of internal reward system. Is there any kind of, is the system also rewarding them in some way, other than the acquisition of the skill?
Emma Brunskill: I think it’s a really interesting question. One thing we found, not in this system but in other systems, is that exposing information to students about their own learning is often itself really productive. So sometimes there are things like skill bars or other visualizations which allow people to know that they’re progressing.
Russ Altman: Yes.
Emma Brunskill: And one thing as you alluded to earlier, many of us find it very motivating to observe progress and to feel like we’re learning ourselves. And exposing that back to the learner itself can be very powerful.
Russ Altman: So when will these systems be available? Is there a path from the research lab to deployment? And I’m sure you think about that because your work is motivated by real world problems. What does that path look like?
Emma Brunskill: I think it varies. I think one thing is that there’s still a lot of sort of foundational questions to get right. One of the things we’re starting to do in my own lab is reach out to new potential partners to think about how these types of ideas can be used in really large systems. Things like MOOCs, Massive Open Online Classes, and other things.
Russ Altman: Thank you.
Emma Brunskill: Yeah.
Russ Altman: Thank you for defining your abbreviation.
Emma Brunskill: Right now most of those still often use pretty traditional ways of teaching. Giving a lecture and then having people do activities. There’s not normally a lot of adaptivity or personalization and so I think these types of techniques could be used in conjunction.
Russ Altman: This is The Future of Everything. I’m Russ Altman, more with Dr. Emma Brunskill about reinforcement learning, education, and other AI approaches towards acquisition of knowledge and skills. Next on Sirius XM Insight 121.
Welcome back to The Future of Everything. I’m Russ Altman. I’m speaking with Dr. Emma Brunskill about learning, machine learning, and reinforcement learning, especially in the context of education in our last segment. And now I wanna move to healthcare. So that’s another area I know you’re interested in, and of course it is near and dear to my heart. What are the opportunities, and challenges, and accomplishments for these kinds of learning methods in healthcare?
Emma Brunskill: Yeah, so one of the things I was saying before is that one thing we do with this educational system is to try to leverage old data that was collected and the outcomes and try to infer what we should do in the future. This sort of “what if” reasoning. I think that healthcare records are an enormous opportunity for this. The way that we’ve seen artificial intelligence and machine learning be applied to healthcare so far is largely in terms of predictive measures. Predicting diagnosis or things like that. But to me one of the real opportunities is just to say, “We’re constantly making treatment decisions, recommendations.” Can we identify if there are some places where, sort of having AI as a co-pilot, we could make even better decisions?
And so we’re trying to use similar types of statistical methods to figure out how we use the sort of sequence of decisions that are being made by doctors or healthcare providers, and infer if we can find things that might help them make even better decisions. You could imagine particularly in some cases that there might be sort of very subtle trends that machine learning systems tend to be very good at uncovering. And that might be very beneficial.
Russ Altman: Just to go back to our discussion of fractions in that case you had a lot of examples of fraction learners and their path through fraction learning. And then you had your new learners and you said even, you had a very good idea of how they might be able to learn by stitching together. So can I take that lock, stock, and barrel and transfer it now to physicians where you have lots of patient trajectories through the healthcare system, and where we might be able to help to get to good outcomes faster, by stitching together the experiences of other patients to create a new experience. Is that…
Emma Brunskill: That’s exactly the right idea. So, for example one thing we were looking at recently is heparin dosing for people in terms of blood clotting, and the question is can we identify trends in there which would make us able to sort of, either learn the same type of decision policies as what clinicians do or potentially even better. I think one thing that’s a big, open and technical challenge in doing this is that, when we try to use these algorithms we often want to have access to all of the features that people might be using to make these decisions. So, in the case of the fractions game, the decisions were being made by an algorithm.
Russ Altman: Right.
Emma Brunskill: So we know all the features. If the decisions are being made by people, whether clinicians or human teachers, there might be all sorts of subtle things that aren’t in the data that are really important for those decisions.
Russ Altman: They might not have recorded major factors in their decision. For example, I mean as a physician I know that I sometimes prescribe different drugs to different people based on if it’s a once a day or a twice a day. It’s very difficult to take a drug four times a day and so if I have a patient with a very difficult life with two jobs and kids, then a four times a day drug, even if it’s better, might not be practically better because they won’t be able to take it. So, we’ll go for the once a day drug. And those kinds of things, I rarely would document that.
Emma Brunskill: Right, that sort of feature, if someone sort of expresses verbally that they’re super, super busy, that information might not be put in the electronic medical record system. But that sort of information is really important if we want algorithms to be able to develop new decision policies for which we can reliably predict how well they’ll do in the future.
Russ Altman: Do you find that the medical collaborators that you work with, are they open to these ideas of the systems? You could imagine that they would say, “Oh the system is gonna be second guessing me, and I’m busy, and I don’t need to have a nagging system reminding me of all the things I could’ve done.” So how do we think about the human aspect of introducing these systems into very elaborate delivery systems like in healthcare?
Emma Brunskill: I think that’s a great issue and that’s one of the reasons why I love collaborating with other people, including with my HCI colleagues like Jay Blundae which I think —
Russ Altman: HCI is Human Computer Interactions.
Emma Brunskill: Thank you, yes. So, I think that those sort of experts think really deeply about how do we make systems that are really useful for people to use.
Russ Altman: Right.
Emma Brunskill: And there are all sorts of important questions that come up, for example, “What is the set of features that we should be writing down to explain these? What are the types of practical constraints? And how do we make these so that ultimately when they’re used with people, you get better outcomes?” Because these systems aren’t just used in isolation.
Russ Altman: Yes, unfortunately people are people and I mean that in a good way that it somewhat simplifies because many people would respond in similar ways, even in different situations to new technologies and we can get best practices for that.
But another thing that people do is they worry, and I wanna get to, in the final part of our discussion, issues of fairness, accountability, and safety. So these systems now have very intimate personal data about me. They know if I’m a fast learner, or a slow learner, or even if that’s a thing. They know if I had trouble with fractions. If it’s in healthcare they know whether I had a bad disease or whether I was a compliant patient, that’s the word we use, it’s a terrible word for taking my medications as directed. I’m sure you worry about these issues. And how do you approach them?
Emma Brunskill: Yeah, I think they’re really important and critical issues. They’re also really important technical issues and I and many others are thinking a lot about these aspects. There are in fact now new conferences called Fairness, Accountability, and Transparency in Machine Learning, so I think that the whole community is really taking this issue very seriously.
I think, one of the things I think about is how do we make these systems such that they can kind of have constraints. So, a lot of these systems are trying to do some form of optimization. They’re trying to optimize students’ scores or they’re trying to help people’s treatment improve. But we often want some sort of constraints on them, something to say, “We need these to be fair, we need for different sub-groups, men versus women et cetera, that we have algorithms that are gonna do just as well for different sub-groups.” And I think one of the exciting things to me is, we can often form this mathematically, and so we can create algorithms now that not only can be fair but sometimes can actually even reduce biases that are present in the data.
Russ Altman: This is The Future of Everything. I’m Russ Altman I’m speaking with Emma Brunskill about this very interesting topic of fairness, and even removing unfairness. So, yes so let’s say that you had made in both of your examples both for education and healthcare, you had looked at historical patterns to try to predict things. But what happens if for example, in those fraction learners if it was 80% little boys and 20% little girls, then maybe when you exposed the system to little girls they were not getting the stitch together trajectories that they should be. So tell us, I’m sure it’s very technical but can you give us a sense for what a fairness algorithm might do to improve fairness when the data that it’s based on, was not very fair?
Emma Brunskill: Yeah, one of the things that we have done, and this has often been work together with my former post-doc, Phil Thomas, is to think about how do we put constraints into the system so that you have sort of this slightly more complicated optimization problem. And where we’re looking at both, we want to be able to provide these effective systems but we wanna do so in a way that we make sure that’s safe for both men and women, or different sub-groups. That the systems don’t unfairly penalize one group for the success of the other. And I think that’s the thing that we often think about, we’re like, well, we don’t wanna have a much more accurate system for males than for females. It shouldn’t get a better solution for men at the penalty of that. And we can put that in mathematically as constraints. It makes it a little bit more complex to solve and more computationally intensive but it’s possible.
Russ Altman: So you’ve used this word constraint, and it sounds like it has a very technical meaning. Can you give us some examples of what constraints might be that would tend towards more fairness?
Emma Brunskill: Yes, so for example imagine that we are thinking about more like a predictive task, like predicting test score performance or something. I would like to make sure about the accuracy with which I can predict for men. Let’s say I can predict them plus or minus one point. I don’t wanna get an algorithm that could do that but that then means I can only predict for women plus or minus 10 points. Or that I systematically under-predict for women. And so we could put that in and say, “You can’t over-predict for men more than say .05 compared to women.” Or we can just say, “You can’t over-predict at all.”
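A constraint of this kind can be written down as a simple check on a model's predictions. The sketch below is only an illustration under stated assumptions: the test scores and predictions are made up, the ".05" threshold is assumed to be in test score points, and the helper names (`mean_signed_error`, `satisfies_gap_constraint`) are hypothetical. It only tests whether a given model meets the constraint. The work described in the interview builds such constraints into the optimization itself, which this sketch does not attempt.

```python
def mean_signed_error(predictions, actuals):
    """Average of (prediction - actual): positive means over-predicting."""
    return sum(p - a for p, a in zip(predictions, actuals)) / len(actuals)

def satisfies_gap_constraint(errors_by_group, max_gap):
    """True if no group is over-predicted by more than max_gap relative to another."""
    values = list(errors_by_group.values())
    return max(values) - min(values) <= max_gap

# Made-up test scores for two sub-groups.
actual_men = [70, 80, 66, 90]
actual_women = [70, 78, 85, 60]

# Hypothetical model A: accurate for men, systematically low for women.
model_a = {
    "men": mean_signed_error([71, 80, 65, 90], actual_men),
    "women": mean_signed_error([68, 75, 82, 59], actual_women),
}

# Hypothetical model B: small errors that balance out in both groups.
model_b = {
    "men": mean_signed_error([71, 80, 65, 90], actual_men),
    "women": mean_signed_error([70, 79, 84, 60], actual_women),
}

MAX_GAP = 0.05  # assumed threshold, in test score points
model_a_ok = satisfies_gap_constraint(model_a, MAX_GAP)
model_b_ok = satisfies_gap_constraint(model_b, MAX_GAP)
print("model A meets the constraint:", model_a_ok)
print("model B meets the constraint:", model_b_ok)
```

Model A under-predicts women by 2.25 points on average while being unbiased for men, so it fails the check. Model B has the same average signed error in both groups and passes.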
Russ Altman: Ah, so would you perhaps accept a less precise performance for one group in order to make it more fair across all groups? Is that the kind of thing that might happen?
Emma Brunskill: I think that’s exactly one of the technical questions that we try to answer is, “Does that mean that you slightly sacrifice performance on one to make sure that it’s equally good across everybody or there are cases where it sort of, it wins for everybody you just end up with a better solution.”
Russ Altman: So that second case of course is an easy one?
Emma Brunskill: Yes.
Russ Altman: But the harder one and one which I’m sure would require discussions and some social agreement, is do we sacrifice performance for one group in order to make it more fair. And there’s, I’m sure people who would say, “Yes, absolutely!” And I’m sure there are people who would have trouble with that, and then it becomes a non-technical but very important decision that society has to make.
Emma Brunskill: Yeah, and I think ultimately it shouldn’t be computer scientists that are making these calls, it should be society. And in many cases there are regulations that specify that you can’t systematically discriminate against one group versus another. And so this is a way for the algorithms to respect those decisions that have been made by our society.
Russ Altman: Thank you for listening to The Future of Everything. I’m Russ Altman. If you missed any of this episode, listen any time on demand with the Sirius XM app.