[This story has been revised and updated.]
Machine learning has the best chance of achieving meaningful return on investment when companies model previous success.
At last week’s Applied Artificial Intelligence conference in San Francisco, Head of Uber Machine Learning Danny Lange laid out his four principles for simplifying the process of applying machine learning in business.
Lange has witnessed firsthand the evolution of machine learning technologies, and he has a pretty good idea of what works and what doesn’t when companies want to implement machine learning for the first time.
A software creator and computer scientist by early trade, Lange founded Cupertino-based Vocomo Software in 2001; the company was acquired by Voxeo in 2005. Most recently, in November 2015, Lange took on his current role as head of machine learning at Uber.
One of the beauties of more companies implementing machine learning is the growing record of mistakes they make, and the resulting lessons that can be gleaned by those who are interested in using the technology but haven’t yet made the leap.
You don’t have to be a behemoth company, Lange says, to apply machine learning. Open-source machine learning platforms are more accessible than ever, and if you have the right framework for implementing them, opportunities abound for even smaller businesses to find value. Lange suggests maximizing productivity when implementing machine learning by thinking through the following points during planning phases:
1 – ‘Low hanging fruit’ is the answer to this question: “If we only knew…”. Find a problem before you implement machine learning as a solution. Ask the question you’re dying to answer but can’t figure out with existing methods: “If we only knew BLANK,” where BLANK might be the real return on investment (ROI) of our video marketing, or how to get more people to stay on our subscription software, or the commonalities of our customers who require the least amount of weekly and monthly maintenance. If your problem/inquiry involves finding patterns in data, structured or unstructured, then there’s really no end to the questions you can ask; of course, having the right data and ensuring that it’s “clean” is also an important consideration and a task that’s keeping a growing sector of data scientists busy. If you’re still uncertain of where machine learning could benefit your company, read our article on the kinds of business problems to which machine learning has been applied.
2 – Start supervised learning with a wealth of historical data. Lange argues that most companies don’t need to collect months of data after implementing a machine learning system before they derive value. Instead, look at the historical information that you already have and feed it to a supervised machine learning system (an algorithm that takes a known set of inputs and a matching known set of outputs and trains a model to generate predictions for responses to new data). Companies often have reams of saved customer service data that can yield lots of valuable insights, like how lead sources correlate to refunds or how service packages are related to the amount of customer support a particular customer requires. The key is to choose existing data that is related to your main problem or question so that you drive ROI with purpose.
3 – Start with clean data, not big data: Don’t just find the biggest bucket of information; instead, find the information that you know is clean. Maybe you have lots of data around promotions and sales, but you tracked that data differently every month and ended up with “messy” or “dirty” data (in other words, data that is not uniform). You have to make sure you’re comparing apples to apples; this is what’s meant, in so many words, by “clean data”. Previous TechEmergence guest Slater Victoroff of indico had similar insights on quality of data over quantity. Try finding a clean subset of information from that larger messy data set; for example, maybe the way you measure and track customer churn and lead source has been the same since day one. Your resulting dataset may not be as big, but if you can look at the data evenly across the board (that is, the format has stayed the same over time), then it’s considered clean and right for the job.
4 – Use an available cloud system (Amazon, Google, Microsoft, etc.): Some of the biggest names in the industry have started to introduce cloud-based machine learning services and open-source software libraries, which are more or less “machine-learning kits” that allow companies and developers of varying skill levels to build their own systems and models. Amazon offers Amazon Machine Learning, Google has TensorFlow, Baidu offers The Stack, and there are several others out there at this point. Lange recommends doing some research, leveraging one of these pre-packaged systems, and skipping the from-scratch route. Check out the machine learning toolkits section in this recent article published in the Journal of Big Data for a sound set of evaluation criteria.
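To make points 2 and 3 a little more concrete, here is a minimal Python sketch of the idea: start from historical records you already have, keep only the rows that were tracked consistently, and let a very simple supervised model learn from them. Everything in it is an assumption made for illustration, including the sample records, the field layout, and the function names; none of it comes from Lange's talk. In practice one of the toolkits from point 4 would replace this hand-written model.

```python
from collections import defaultdict

# Hypothetical historical records: (month, lead_source, refunded).
# The rows and field layout are invented for this illustration.
HISTORY = [
    ("2015-01", "ads", True),
    ("2015-01", "referral", False),
    ("2015-02", None, True),          # messy row: lead source was not recorded
    ("2015-02", "ads", True),
    ("2015-03", "referral", False),
    ("2015-03", "ads", False),
    ("2015-04", "Referral ", False),  # messy row: tracked with a different spelling
    ("2015-04", "referral", True),
]

# The one tracking format that stayed the same over time (an assumption).
KNOWN_SOURCES = {"ads", "referral"}


def clean_subset(records):
    """Keep only rows whose lead source was recorded in the consistent format."""
    return [row for row in records if row[1] in KNOWN_SOURCES]


def train(records):
    """Learn the refund rate per lead source from known inputs and known outputs."""
    counts = defaultdict(lambda: [0, 0])  # source -> [refunds, total]
    for _month, source, refunded in records:
        counts[source][0] += int(refunded)
        counts[source][1] += 1
    return {source: refunds / total for source, (refunds, total) in counts.items()}


def predict(model, source, threshold=0.5):
    """Predict whether a new customer from `source` is likely to ask for a refund."""
    return model[source] >= threshold


clean = clean_subset(HISTORY)
model = train(clean)

for source in sorted(model):
    print(source, round(model[source], 2), predict(model, source))
```

The cleaned dataset is smaller than the original, which is the trade-off point 3 describes: fewer rows, but every row means the same thing. The model here is just a refund rate per lead source, chosen so the whole example stays readable; it stands in for whatever supervised algorithm a real toolkit would provide.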