From scheduling appointments to setting the thermostat to ordering pizza, virtual assistants are growing more commonplace by the day.
Stanford professor Monica Lam says they will only become more entrenched as their capabilities grow and their voice-recognition skills become more accurate.
Such developments are welcomed by many who rely upon Alexa, Siri and other virtual assistants. But they are also troubling to those, like Lam, who worry that privacy concerns and lack of competition put too much power in the hands of a few companies. Lam is an advocate for a more open approach.
“If there is no open competition, then you are kind of stuck with whatever these platforms provide for you,” she tells host Russ Altman in the latest episode of The Future of Everything radio show from SiriusXM.
Lam thinks a lot about the future of privacy. She says we can have both the AI and privacy at the same time, but first she’d like more options in the marketplace and for those who dominate the market to be less insular than they are today. What’s needed is an “infrastructure of privacy” that returns control of data to the rightful owners: the users who created it in the first place. The key to that, she says, is choice.
Tune in to this episode of The Future of Everything to hear more about how Lam's open-source effort to develop and share virtual assistant technology is keeping user privacy at the forefront. You can listen to The Future of Everything on iTunes, Google Podcasts, SoundCloud, Spotify, Stitcher or via Stanford Engineering Magazine.
Russ Altman: Today on The Future of Everything, the future of virtual assistants. Siri, please call my mother. Alexa, what’s the weather today? In a very rapid period of time, virtual assistants, these little disembodied voices in our devices that answer our questions and obey our commands, to some degree, have emerged as a pretty important and prevalent part of everyday life.
Our cell phones have an assistant. If it’s an Apple phone, it’s Siri that can answer questions, access the features of our phone, be generally available. Amazon’s Alexa technology can make purchases, can give information, play music and, I believe, do literally tens of thousands of other things. Microsoft has an assistant, I believe, named Cortana. Google and Android phones have an assistant called Google Assistant, I believe, and these are remarkable. They understand spoken English, and I’m sure they understand other languages as well. They seem ready to respond all the time. They give easy access to what otherwise might be complex computational instructions, and even physical capabilities, like turning appliances on and off. And they’re getting better as we give them more experience.
But there are some downsides and some worries. There are fears about privacy. They’re always listening, right? In fact, there was a story in the news recently saying that some of them really are always listening. They have access to your credit cards and bank information sometimes. They are learning your preferences for things, whether you like it or not and whether you like your preferences or not.
Are these systems open that can be inspected, or are they closed? It seems that they may be with us forever, but it is not clear that they are built or maintained in a way that is comfortable for all users, especially the users who take the time to look under the hood.
Professor Monica Lam is a professor of computer science and electrical engineering at Stanford University. Monica is an expert at many things and, most recently, virtual assistants. How they work now, and how they should work in the future.
Monica, you have suggested that, currently, it is maybe unhealthy to have just a few big companies creating these proprietary virtual assistants, and that there maybe should be a more open environment for competition and integration. Now, I interact with at least two of these regularly, and I’ve had the experience of calling Siri Alexa or Alexa Siri. Will the future ever give us the chance to have a single assistant that has access to all of our different devices and information, or will it always be fragmented, the way it is now?
Monica Lam: Actually, in the end, I would imagine that there will be, like, one or two assistants that you will interact with. I mean, you interact with one, but the world would be using just a couple of these assistants because we have seen so many platform monopolies that emerge.
Russ Altman: Yes.
Monica Lam: I mean, if you think about Facebook, it is basically the de facto standard for social interactions, and I think that there may be one or two, but what I worry about is that there will not be choice, and that means that there’s no open competition and then you are kind of stuck with whatever these platforms provide for you.
Russ Altman: As an expert at these platforms, can we, let’s step back. What are the key elements that these platforms, technologically, can do? Like, for example, one obvious one is they clearly seem to understand spoken language. How do you think about the key features of a virtual assistant that are required to be good and functional, and then can you tell me how much of that is shared across the industry, and how much is proprietary?
Monica Lam: Let’s start with the basic operations. You can turn on the light, you can, you know, open the garage door and so forth, and —
Russ Altman: Get the weather.
Monica Lam: Get the weather, get a joke.
Russ Altman: Right, a joke.
Monica Lam: And all of these primitives are stored in each company’s proprietary platform. If you look at Alexa, they have, like, 50,000 skills, last time I looked.
Russ Altman: 50,000 for Alexa?
Monica Lam: 50,000 third party skills, like you can order a pizza, you can call Uber, you know, all these various things that people wanna do.
Russ Altman: And when you say skill, that’s what they call, like, one set of commands that are related to a certain narrow idea or device, okay.
Monica Lam: Like, the thermostat is a skill and so forth, so there’s 50,000 skills. A lot of people enter this information into Alexa’s repository of skills, but that’s all proprietary, meaning that if I want to build a virtual assistant, I cannot tap into one of these skills and let my assistant order Uber. I have to redo all this work, and as a result, Google is putting in lots of skill repositories.
Russ Altman: As a result, say it again because of the mic.
Monica Lam: Sorry.
Russ Altman: As a result?
Monica Lam: Imagine that you have to build your own platform of skills. How much work does it take, alright?
Russ Altman: So right now, what I’m inferring is that, the Apple Siri can’t use things that were built for Alexa. You would have to start from scratch.
Monica Lam: You will have to start from scratch. If you look at CES, which is the Consumer Electronics Show —
Russ Altman: I believe it’s a big show in Nevada.
Monica Lam: It is the biggest show. In Las Vegas.
Russ Altman: In Las Vegas. Yes, I’ve never been.
Monica Lam: You should go!
Russ Altman: Is it fun?
Monica Lam: It is very, you have to go and experience it for yourself. It is huge, and everybody’s putting up these IOTs, Internet of Things, and so what I understood is that there are like 40% hooked up to Alexa, 40% hooked up to Google and the rest probably not hooked up to much of anything, okay? The monopoly, maybe the oligopoly, is emerging if there are multiple. There are a couple of these really big platforms, and these are closed systems. Sorry, they’re open, but they’re proprietary.
Russ Altman: Okay.
Monica Lam: Okay, you can enter your skills into those repositories, but it’s not open to everybody.
Russ Altman: Now, why as a consumer or even as a government person, should I be worried about this oligopoly? Is it fundamentally a problem?
Monica Lam: Well, let’s look at the web today. You know, what we have is a graphical web, right? You go to the browser and you can go visit any webpage that you want, because all the different companies want to reach as many people as they can. You know, they want to reach all these people. So they put up a website that you can visit with any browser.
Russ Altman: Right, Safari, Firefox, Chrome.
Monica Lam: Exactly.
Russ Altman: Mosaic.
Monica Lam: Well, that’s an old browser. But these webpages are open to all browsers.
Russ Altman: Yes.
Monica Lam: Just think about it, like, the virtual assistants are like the new browsers. This is how you use language to access all the different services and digital devices. And I call this the linguistic web. But now, the linguistic web, it’s open, you can enter information in it, but it’s proprietary. If they want to, they can even choose to say, you cannot put certain services on my virtual assistant, because I own it. We do not want a closed web. We do not want to have a proprietary linguistic web.
Russ Altman: This is The Future of Everything. I’m Russ Altman, I’m speaking with Monica Lam about virtual assistants and really, the ecosystem, and that’s a great analogy, because you’re right. I can go onto the web and browse it with a number of browsers and so you’re implicitly saying, why don’t we have virtual assistants that I can choose my assistant, and then it accesses these services on the web independent of the source. This is not the case now, and I know that you’re working on creating this open environment, so tell me the elements of openness that you think are the kind of critical pieces that need to be put in place?
Monica Lam: Our project kind of has a few parts. The first is to focus on keeping the web open. Second, once I have a choice of virtual assistants, I want the virtual assistants to be able to talk to one another, just like email does. And the third thing here is that, you know, everybody is worried about big data, because you need the big data to advance science and so forth, but right now, there seems to be, like, not much control that people have over their data.
Russ Altman: To say the least, yes, and people are very worried.
Monica Lam: Oh yeah. What we really want to do is to create a new, we call it an infrastructure, basically a system, that allows users to say, you can do this with my data, that with my data, all said in natural language, and now we can have both the A.I. and privacy at the same time.
Russ Altman: This is exciting, because now you’re saying, you had this analogy to the web browsing, but actually, as we all know, web browsing didn’t solve the problem of privacy very well, and we’re getting news articles all the time about data that we thought was either confidential or held only for certain, very narrow purposes is being used and monetized by companies in ways that we didn’t anticipate.
You’re proposing that this next generation, I think you called it the linguistic web, is gonna actually try to be built from the start with a better understanding of privacy preferences and options for the user, so that’s exciting, because it means that we’re not just gonna move from the graphical web to the linguistic web, but it’s gonna be a better, hopefully a better, more controlled experience.
Monica Lam: Yeah.
Russ Altman: Will the companies go for this?
Monica Lam: That’s a very good question. If you look at the industry, I think, the interesting news is that, if you look at the past, okay? We have the Windows monopoly and then there’s a lot of work on the open Linux operating system. And of course, because of this open source software, Apple was able to put out Mac OS, iOS, and Google is doing Android, right?
Russ Altman: And those are built on the more open?
Monica Lam: They’re built on open source software that a lot of people have contributed, so there has been success of these open source systems, and what I see here is that the virtual assistant is kind of like a gateway between consumers and businesses. All the businesses are interested in this. If, you know, the fact that there are, you know, if you just have a couple of these platform companies, then a lot of other people are gonna say, what about, you know, what can we do? I think that there’s a chance that a lot of companies would want to support this open alternative, okay, because, you know, a lot of people lose when there is this control over the access of the linguistic web.
Russ Altman: Is there a chance? Yes, I totally understand how the upstarts, the ones who don’t have a big stake in the game would say, let’s have it be open, that allows me to compete. How about the existing, reigning winners, the people who are running Alexa, the people who are running Siri? Are they gonna resist this tooth and nail, or do you think you can make arguments to them about why it would even benefit them to have this more open approach to building these virtual assistants, or is that just pie in the sky, Russ being not very realistic?
Monica Lam: Well, there are two parts to that answer, alright? If you think about, just say the Windows system and Microsoft, and you have a big giant called Apple, who says that, I can have my own operating system and I adopt the open source. It doesn’t have to be upstarts that would find this useful, because right now, as I mentioned, a lot of the IOTs are hooking up with just Alexa and Google. What about Microsoft? What about Apple, right? They would benefit, and those are huge companies that can obviously make a difference. And Apple, Microsoft and a lot of other companies are interested in privacy, so this is a way for them to say, if this is what you’re interested in, work on the open web, because that is something that they care about.
Russ Altman: This is The Future of Everything. I’m Russ Altman, I’m speaking with Monica Lam, and now we’re talking specifically about this issue of who would adopt these open frameworks, and you just made the argument that actually, even the big players might see incentives. Is there, so when you say some, is it for random reasons or are there systematic reasons?
Monica Lam: Okay, so I was talking about the fact that, if you are not the leaders. Like, you know, in the operating system, Apple wasn’t the leader and they adopted the open source, so that’s one part. The second part here is that turning everything into a linguistic web is a lot of effort. There is a matter of, you know, in the past, it was just putting information up on the website. It’s a lot of effort that is amortized across a lot of people. And if you have an open system, more people will be willing to put that information out. They don’t really want to say, you know, I am just helping a proprietary platform to succeed.
Russ Altman: Right, because when I do my website, I know that everybody will be able to read it. And I want to build a virtual assistant skill, I would probably like to build it once well and have it work everywhere.
Monica Lam: That’s a very good point. We actually did a prototype where, if you enter information into our open source repository, we can stand up a skill for you with Alexa and with Google, and with anybody else that wants to come along. I think that that’s something that a lot of companies would be interested in if they are. You know, they may just be.
Russ Altman: Super useful, I mean, for development. You just gotta get it right once, and then your team will take the job, the hard job, of kind of compiling it down into the necessary platforms for delivery.
Monica Lam: And it is open for other assistants, because they really want to reach as many people as they wish.
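The "enter it once, stand up a skill everywhere" idea from this exchange can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `SkillSpec` fields, the manifest layout and the platform names are hypothetical and do not reflect the actual formats used by Alexa, Google or Lam's repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SkillSpec:
    """Platform-neutral description of one skill (hypothetical format)."""
    name: str
    action: str
    example_phrases: List[str] = field(default_factory=list)


def to_platform_manifest(spec: SkillSpec, platform: str) -> dict:
    """Wrap the shared spec in a per-platform envelope.

    The layout is invented for this sketch; real platforms each
    define their own, different manifest formats.
    """
    return {
        "platform": platform,
        "skill": spec.name,
        "action": spec.action,
        "samples": list(spec.example_phrases),
    }


def publish_everywhere(spec: SkillSpec, platforms: List[str]) -> Dict[str, dict]:
    """Generate one manifest per target platform from a single spec."""
    return {p: to_platform_manifest(spec, p) for p in platforms}


# The skill is written once...
weather = SkillSpec(
    name="weather",
    action="get_forecast",
    example_phrases=["what's the weather today"],
)

# ...and compiled down for every assistant that wants it.
manifests = publish_everywhere(weather, ["assistant_a", "assistant_b", "assistant_c"])
```

The design point is that the skill author maintains only `weather`; supporting a new assistant means adding one more target name, not rewriting the skill.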
Russ Altman: Let’s step back, because I’m intrigued. I have no idea how these assistants are built, like these skills. Let’s say I want to just do something very simple and I wanna have a Siri or Alexa interface to, like, finding out the weather. How do I do it? How do I know what people are gonna say? How do I listen to what people are saying and map it to the right query? Is this, like, all a well-known thing or is this still active research?
Monica Lam: It is very much an active research area, because natural language is really hard. I mean, you probably have experience with existing assistants and they sometimes, a lot of times, they don’t know what you’re saying, and this is a very, very hard problem, because natural language is hard and what we are trying to say is very, very broad. As a matter of fact, I think what you’re seeing is just the tip of the iceberg of what a virtual assistant can do for you, okay? And then when I explain that you will see that it is even harder than just the existing skills.
Russ Altman: Good, so paint the future for us.
Monica Lam: Today, the assistant is just following your immediate command. Imagine that you are working with your secretary and then you have to tell them or tell your secretary exactly what to do in every single step?
Russ Altman: That would be a reason not to work with that person anymore.
Monica Lam: That’s not a good assistant.
Russ Altman: Right.
Monica Lam: Imagine I can say to my assistant, is like, every day, order me an Uber at the end of my workday, based on my calendar, okay? Now that is a, you’re now telling the system what you want to do in terms of the, you know, like, based on your habit or, you are now connecting different things together, the calendar and Uber.
Russ Altman: And it can be different each day, so you can’t just make it 6:00 every day.
Monica Lam: Yeah, and then you, so what I see this is that the virtual assistant lets you even use natural language to create programs, right? I’m actually giving you recipes, formulas about what we wanna do, and it can get arbitrarily complicated, and that’s our research topic.
Russ Altman: This is The Future of Everything. I’m Russ Altman. More with Monica Lam about the future of virtual assistants next on SiriusXM Insight 121.
Russ Altman: Welcome back to The Future of Everything. I’m Russ Altman, I’m speaking with Professor Monica Lam from computer science at Stanford, and we’ve been discussing virtual assistants. At the end of the last segment, you were describing new skills for virtual assistants, where they had to be a lot more intelligent than they are now. You said, get me an Uber at the end of my workday to take me home. And so that means I need to know what a workday is, I need to know when it ends, I need to know that it’s Uber, but you may also be fine with Lyft, you may not really mean Uber literally, so there’s a lot of interpretation there, and you and your colleagues are trying to build a system that would support that.
What do you see as the big technical challenges that you need to get on top of in order to enable, and when will we see this?
Monica Lam: The problem is natural language understanding. How do you translate natural language into the meaning? You know?
Russ Altman: Yes.
Monica Lam: And our solution is to say, this is the computer science way, is to say, look, here is the technology, this is what the virtual assistant can do. We create a neural network that translates natural language into code directly. This is not what the Alexas, the commercial assistants, are doing.
Russ Altman: Okay, so say a little bit more about that. You hear a command from a user and you write computer code?
Monica Lam: The neural network would generate computer code. This is the A.I., and it generates the code to be executed by your virtual assistant. And now, as code is very expressive, you can talk about when something, get something, do something.
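To make the "when something, get something, do something" shape concrete, here is a minimal sketch of what such generated code could look like for the Uber-after-work example. It is written by hand and does not show the neural network step. The `Rule` class, the calendar layout and the ride-ordering string are all hypothetical, and this is not the representation Lam's project actually uses.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional


@dataclass
class Rule:
    """A when/get/do program (hypothetical structure for illustration)."""
    when: Callable[[dict], bool]   # trigger: should the rule fire now?
    get: Callable[[dict], dict]    # query: fetch the data the action needs
    do: Callable[[dict], str]      # action: here it only returns a description


def end_of_workday(state: dict) -> time:
    """Latest end time among today's calendar entries."""
    return max(end for _start, end in state["calendar"])


def run_rule(rule: Rule, state: dict) -> Optional[str]:
    """Evaluate the trigger; if it holds, run the query and then the action."""
    if not rule.when(state):
        return None
    return rule.do(rule.get(state))


# "Every day, order me a ride at the end of my workday, based on my calendar."
ride_home = Rule(
    when=lambda s: s["now"] >= end_of_workday(s),
    get=lambda s: {"pickup": end_of_workday(s)},
    do=lambda d: "order ride at " + d["pickup"].strftime("%H:%M"),
)

# The workday ends at a different time each day, so a fixed 6:00 would not work.
monday = {"now": time(17, 30),
          "calendar": [(time(9, 0), time(10, 0)), (time(16, 0), time(17, 30))]}
tuesday = {"now": time(15, 0),
           "calendar": [(time(13, 0), time(18, 15))]}

print(run_rule(ride_home, monday))
print(run_rule(ride_home, tuesday))
```

On Monday the trigger holds and the action runs; on Tuesday the last meeting has not ended yet, so the rule does nothing.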
Russ Altman: It seems like a lot of this is about abstraction levels, like instead of being very concrete, as humans, we wanna say, you know what I mean. My calendar. And you’re gonna have to train the A.I. to get that.
Monica Lam: Well, you are way ahead. Okay? At the beginning, I will be very happy if I don’t have to ask them to interpret a lot, but just do exactly, Uber is Uber and stuff like that, and that’s already hard enough, okay? You have experience with existing assistants. But if you look at Alexa, they have 10,000 employees working on Alexa, because there’s a lot of need for training data, the natural language. What we try to do is to develop tools that make it very easy for people to do a reasonable assistant without collecting a lot of natural language sentences, because that’s very expensive.
Russ Altman: Right. This is where the winner take all phenomenon might be.
Monica Lam: Precisely.
Russ Altman: Because now, since Alexa is deployed in millions of houses, and as we learned in the news recently, they’re listening to us a lot of that time, they’re just getting tons of examples of people saying things both that are meant for Alexa and that are not meant for Alexa. If I’m a new start-up, I don’t have that data. Is part of your vision creating some open source data for these new efforts, or is it gonna be a different approach?
Monica Lam: First of all, we give people tools so that they can get by with a small amount of manual effort, because they really cannot afford 10,000 engineers, okay, to do something for their own domain, they want to have a linguistic interface for their own domain. It’s open source, so we want to say, take this tool, build it and we can now, altogether, build up a virtual assistant that understands what each domain is about and what the natural language needs to be, and I believe that there is a possibility that this open source system can be even better, because every expert wants to contribute to this open source system.
Russ Altman: And we’ve seen this happen in other settings, and presumably, that’s the source of your optimism. I work with drugs and I want to help patients understand their drug side effects.
Monica Lam: Okay.
Russ Altman: And so what I always think about is, how hard would it be for me to train Siri or Alexa or any virtual assistant to recognize drug names, to recognize side effects of drugs, so that they could say, what are the side effects of this drug, or, I’m having headaches, could this be from the drug I’m taking? And so you have a model where I might not need to get tens of thousands of people talking about drugs, but I may more efficiently get an initial system out there that might work.
Monica Lam: Precisely, and in your case, would you be more comfortable if you are dealing with an open source system rather than, like, putting all this information into a proprietary?
Russ Altman: Absolutely, especially as a physician and an academic, all of my work needs to be in the public domain, and so it would be very comfortable.
Russ Altman: This is The Future of Everything. I’m Russ Altman, I’m speaking with Monica Lam about the future of virtual assistants. Let’s go to a couple of things that I’m sure people are worried about. What is the right way to handle people’s concerns about privacy and about bad guys getting their data?
Monica Lam: I think that the best model of what we can do that still offers people convenience, right? I mean, convenience trumps everything in a sense, and we have to recognize that.
Russ Altman: And what you mean by that is people will do what’s convenient, and if it’s safe, but clumsy, they just won’t use it.
Monica Lam: Yes. It is really the technology’s responsibility to make something convenient to use and still offer security and privacy. And I would point to email as a perfect example of what we should have, right? Obviously, there are a lot of companies that can read a lot of people’s emails, but you have a choice, right? You can even run your email server at home. I think Hillary Clinton did.
Russ Altman: I believe there are some famous politicians who decided to do that.
Monica Lam: And now you have choice.
Russ Altman: But they did that for security. When they do that it’s because then they are controlling the servers, the rooms that the servers are in are locked and they say, okay, I have some confidence that no one’s gonna get this.
Monica Lam: Yeah. Imagine your cable company, right? They put a set top box or put the router in your house and they say, why don’t I offer you a virtual assistant box, okay? And your security camera, you know, all the videos go to the box and then your family can look at them, rather than going to some central server. Alternatively, you can also have services that say, pay me $5 a month. I promise you I would never read your email.
Russ Altman: This is the thing about the emails, so you have the public email where you’re getting it for free, but you’re giving up some data, like Google Mail, if you’re not paying for it. But then when you do pay for it, the rules change about sharing the data.
Monica Lam: Yes, that’s what we need.
Russ Altman: Do you believe that people would be willing to get a convenient but slightly more expensive solution in exchange for privacy?
Monica Lam: It is up to you. Now we have a choice. We didn’t have a choice with, for example, Facebook, right? I cannot say, don’t look at my data, I’ll pay you $5, right? And the most important thing is that there is choice, and so that’s why we are working so hard on an interoperable, open source virtual assistant, so different people can offer different virtual assistants. All the skills are publicly available. It is beneficial to everybody in that sense. And then they can compete. Competition is what changes the game. You cannot just, I mean, regulations are important and then I think people are doing that, but as technologists, we want to use technical solutions also that says open the space, lots of people competing, some are good for, support your, help preserve your privacy and some people want to offer you really good free service and they read your data. Now you have choice.
Russ Altman: Great, so now you’re really painting a pretty clear picture to me, is that I like this idea of there’ll be a set top box that’s my virtual assistant. It’s the only one I have to deal with because it interacts with my phone, it interacts with my speakers, so it’s the one that I like. By the way, have you given this a name yet? Your perfect virtual assistant, or no name yet? Are we gonna call it Monica?
Monica Lam: Our project today is called Almond.
Russ Altman: Almond, very nice.
Monica Lam: It is not a female name. My male students insisted on doing that.
Russ Altman: As you know, there are huge issues about gender with virtual assistants, and people are now getting very sensitive to, why should they always be women’s voices, and it’s a very interesting topic. But anyway, you have these boxes. If I pay, then I might be able to get services where I can be more certain of my privacy of my data, and if I don’t pay, there’s a clear understanding that, in exchange for being free, my data will be used.
Monica Lam: And then you can do half and half. You can say that, you know, I can share my health data without my personal information and I may even get a buck for it, or $10 for it.
Russ Altman: Great, so then now you can have a menu of options and say my health data is private.
Monica Lam: It’s gonna be natural language.
Russ Altman: It’ll be a natural language, like, Siri, don’t give away my health data, but go ahead and tell them what I’m watching on YouTube or whatever.
Monica Lam: Precisely.
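The menu of sharing options discussed here (keep it private, share it freely, or share it "half and half" without personal information) can be sketched as a small policy check. This is an illustration only: the category names, the `Sharing` options and the list of personal fields are assumptions, and the step that would turn a spoken instruction into such a policy is not shown.

```python
from enum import Enum
from typing import Dict, Optional


class Sharing(Enum):
    DENY = "deny"               # never leaves the user's control
    ALLOW = "allow"             # shared as-is
    ANONYMIZED = "anonymized"   # shared with personal fields removed


# Fields treated as personal information in this sketch (an assumption).
PERSONAL_FIELDS = {"name", "email"}


def release(category: str, record: dict,
            policy: Dict[str, Sharing]) -> Optional[dict]:
    """Return the part of `record` the policy allows to be shared, if any.

    Categories the user never mentioned are denied by default.
    """
    decision = policy.get(category, Sharing.DENY)
    if decision is Sharing.DENY:
        return None
    if decision is Sharing.ANONYMIZED:
        return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    return dict(record)


# "Share my health data without my personal information,
#  and go ahead and tell them what I'm watching."
policy = {
    "health": Sharing.ANONYMIZED,
    "viewing_history": Sharing.ALLOW,
}

health = {"name": "R. Example", "heart_rate": 62}
viewing = {"name": "R. Example", "last_video": "cooking show"}
banking = {"name": "R. Example", "balance": 100}

print(release("health", health, policy))
print(release("viewing_history", viewing, policy))
print(release("banking", banking, policy))
```

Denying by default means that data the user never mentioned, such as the banking record here, is not shared.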
Russ Altman: Okay, well thank you for listening to The Future of Everything. I’m Russ Altman. If you missed any of this episode, listen anytime, on demand, with the SiriusXM app.