Episode Summary: This week’s in-person interview is with Adam Coates, who spent 12 years at Stanford studying artificial intelligence before accepting his current position as Director of Baidu’s AI lab in Silicon Valley. We talk about his ideas on consumer artificial intelligence applications, what excites him in terms of global impact, and what he thinks may be more ‘hype’ than reality in the AI space. Coates hints at the applications Baidu is working on, technologies with the potential to influence billions of mobile and computer users worldwide. If you’re interested in developments in speech recognition and natural language processing, this is an episode you won’t want to miss.

Expertise: Deep learning and feature learning; reinforcement learning

Recognition in Brief: Adam Coates received his BS, MS, and PhD in Computer Science from Stanford University before he was hired as director of the Silicon Valley AI Lab (SVAIL) at Baidu, where he has now worked for two years. Adam has authored and co-authored numerous publications in speech recognition, deep learning, feature learning, and other areas, some of which have been featured in outlets like Wired and CNET. While at Stanford, Adam worked on a number of projects, including the Stanford AI Robot (STAIR) and the Stanford Autonomous Helicopter.

Current Affiliations: Baidu USA

Pulling Back the AI Curtain

When you want to discern what’s real from what’s hype, you go to the source. Here in the middle of Silicon Valley, there are plenty of sources in the world of AI research and development, though getting to them isn’t always as simple as walking through the front door. This week, I was lucky enough to walk through Baidu’s doors and sit down with Adam Coates, director of Baidu’s AI Lab in Silicon Valley. I first wanted to know what he saw as more hype than real news in AI media coverage.

“Based on a lot of the genuine progress that’s happening in AI right now, substantially because of big progress in deep learning and neural networks, many people are starting to feel that full artificial general intelligence (AGI) may be just around the corner… I think working with these technologies every day, it’s pretty clear that that’s just not where the progress is happening right now,” said Coates.

Adam illustrated his assessment with an analogy that helped separate the often fear-driven media narrative from the actual, in-the-trenches story. Look at the progress made in automobiles and airplanes up to today, said Coates, and it’s obvious that our cars are far more efficient and safer; however, no one looks at the latest Tesla and worries about it turning into a Transformer and waging a battle on the freeway during tomorrow morning’s commute.

We don’t have these thoughts because the answer is self-evident: simply by looking at the technology that’s available, we know there’s no plausible way for this scenario to happen in the near future. “I think deep learning is in this same space. It’s getting a lot better, we’re seeing things we didn’t think were possible a few years ago, but if you’re actually working with that technology, it’s very self-evident that we just don’t have the pieces to make full artificial intelligence at this point,” explained Coates.

Though we may not be able to talk philosophy or reason with our devices anytime soon, Adam is optimistic about what’s happening in the AI field today, particularly in speech recognition and natural language. While Adam doesn’t claim to know how to solve the problem of understanding consciousness, he does think there’s one area that researchers could reasonably solve over the next decade. The primary driver behind this potential solution is the huge amount of labeled data now available to us.

“The speech recognition that we’ve been building in our AI lab works incredibly well because we can give it audio along with transcriptions and the neural network can learn from all those transcriptions to recognize speech; this is a kind of machine learning called supervised learning, but it’s very clear that this is not how humans learn to recognize speech,” said Coates.

True, we don’t play 10,000 hours of transcribed audio for our children, and children often learn from their mistakes without adult supervision. That is not the case with machines trained this way; they require direct, correctly labeled feedback before they can learn.

“One of the things we know humans are somehow accomplishing is unsupervised learning; we know that we’re taking in audio and visual data and learning to make sense of it in a way that helps us very rapidly adapt to new tasks,” explained Adam.

Adam believes there’s a lot of cutting-edge research happening now in unsupervised learning, but it’s anyone’s guess when machines will really be able to take the wheel. “My sense is that we’re making real progress on the problem, but no one has really cracked it; we don’t know when the big watershed event is going to be,” Adam noted.

Speech Recognition, a Giant Leap for Machines

Looking ahead can be both a wise and a potentially dangerous strategy, and predicting outcomes is risky, akin to what Adam calls “throwing darts.” But given Baidu’s search engine capabilities and its recent leaps in speech recognition with its Deep Speech engine, I wanted to know what the average consumer’s life might look like in 5 to 10 years.

Adam said that, in general, technology users are already comfortable with texting and text queries, but there are many cases where typing doesn’t make sense, like when you’re driving or want to do a long transcription. “If we can have speech recognition that’s as good as a person, we’re going to start being able to interact with our devices in a new way,” said Coates.

This technology opens pathways to more complex and connected domains, such as speech-enabled homes and cars. Coates noted that he is most excited about this next step, when we interact and connect with our devices and appliances using natural speech. He said that in largely “mobile societies,” which at present include China, Japan, and, to a lesser extent, the U.S. (mainly the millennial generation), people who aren’t used to having a laptop or PC will have a totally different way of accessing the Internet and connecting with the world.

Another area of focus for Baidu is virtual assistants, especially for tasks that demand a lot of visual attention (like driving a car) and therefore call for speech-enabled commands. “If we can really make speech very low overhead, in the sense that you don’t even have to think about whether you would use it or not, we could take that away,” Adam explained.

“This is about really changing our relationship with the devices we use and the way we get things done, and making them much faster and taking away all the training that we have to do for ourselves.”

If we think back far enough, getting what we wanted out of early Internet search engines took practice and some trial and error with the interface. Now, most of us perform keyword searches almost automatically. Baidu’s goals include cutting away similar barriers to make speech-enabled technologies just as seamless and useful. Perhaps in the not-too-distant future, we’ll swap uploading data and manually manipulating our devices for spoken instructions, with machines listening to our desires and needs and responding intelligently.

This type of keyboard-replacement technology will come in handy for many tasks, especially in cultures that make less use of keyboards simply because certain languages, such as Mandarin, are more arduous to type.

Coates explained that we can already see a shift away from the keyboard in “mobile-first” societies like China because of interface challenges. “A lot of habits of mobile users are much more sophisticated than in the U.S., because that’s their main access to the Internet; you see things like QR codes used for the Internet more often than here. Sometimes we don’t see the need, because it’s faster to type, but web links are often in Roman characters and not in their native language, so it’s much easier for them to shoot a QR code,” said Coates.

It’s an interesting and worthwhile thought: technologies that didn’t succeed in one place may very well catch on in another country for reasons like language. Another trend that has taken greater hold in China than in the U.S. is using phones for offline services. Coates described an example he experienced personally while abroad: buying a soda from a vending machine. “In China you see quite a few (soda machines) with a QR code or payment method on the phone; I would never think to do this. When I look, I sort of pause, then see someone walk up and do it and get a soda and walk away, and you think ‘Wow, this would have been hard for me to build into my habits,’ but because they don’t have some of these habits there’s quicker adoption,” said Coates.


[This interview has been revised and updated as of February 10, 2017.]