Managing Machine Learning in the Enterprise
By Øyvind Roti, Head of Solutions Architecture APAC, Google Cloud
Given the wide range of business problems it addresses, machine learning will become a ubiquitous part of software applications throughout enterprises, and managing the machine learning lifecycle will become a critical skill for software engineers and IT practitioners of the future. This will require some new skills, but fortunately fits neatly into existing good practices like agile development. In short, the solution is to build an end-to-end platform for machine learning pipelines.
Introduction to Machine Learning
Machine learning is a subfield of artificial intelligence concerned with building models that learn from examples, without hard-coding rules or behaviors. It is usually broken into three types: supervised, unsupervised, and reinforcement learning. We will focus on supervised learning, but the platform concept extends to all three. Supervised machine learning works by training a model with labeled examples. For instance, to build a system that can recognize objects in an image, the training data could be photos with a label (e.g. a photo labeled “cat”). Over many training iterations, the model learns which pixels and patterns make up a “cat”. By repeatedly training and tweaking, the model will gradually improve until it is able to take an unlabeled example and predict the correct output.
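The train-on-labeled-examples, predict-on-unseen-input contract can be illustrated with a deliberately tiny sketch. The nearest-neighbour classifier below simply memorizes its labeled examples rather than learning weights the way an image model would, and the two-number feature vectors are stand-ins for pixels, but the fit/predict shape is the same.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbour
# classifier "trained" on labelled examples, then asked to predict
# the label of an unseen example.

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class NearestNeighbourClassifier:
    def fit(self, examples):
        # examples: list of (feature_vector, label) pairs
        self.examples = examples
        return self

    def predict(self, features):
        # Return the label of the closest training example.
        _, label = min(self.examples, key=lambda ex: euclidean(ex[0], features))
        return label

# Toy labelled training data (two numeric features per example).
training_data = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"),
    ((4.8, 5.2), "dog"),
]

model = NearestNeighbourClassifier().fit(training_data)
prediction = model.predict((1.1, 0.9))  # unseen, but near the "cat" cluster
```

A real supervised model replaces the memorized examples with learned parameters, which is what makes the "repeatedly training and tweaking" loop described above necessary.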
Some models can reach this level of performance with hundreds of training examples and a single machine, but a particularly powerful form of supervised learning, Deep Learning, is especially hungry for data and compute resources.
Building a Platform for End-to-end Machine Learning Pipelines
At a high level, the end-to-end machine learning pipeline can be broken into three stages: data transformation, model training, and model serving. A common mistake is to focus almost exclusively on the model training stage. However, much of the work in an effective machine learning pipeline goes into the cleaning and transformation of input data. The key to data transformation is automation: it embeds good practices into future data sources and reduces the chance of the errors prevalent in ad-hoc, manual processes. Automating the data pipeline requires sufficient flexibility to add new data sources, whilst blocking poor quality data points. This data engineering step is not trivial and will likely require the most effort and political capital to set up initially, as data silos are broken down and processing is standardized. This initial investment will pay off as more and more real-time and batch data sources are added.
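A minimal sketch of this automated transformation stage might look as follows. The field names and quality rules are hypothetical; the point is that every record passes through the same validation gate, so poor-quality data points are rejected explicitly rather than silently propagated into training.

```python
# Hypothetical automated data-transformation step: validate each
# incoming record against quality rules, then standardise the ones
# that pass into the pipeline's schema. Rejected records are kept
# so they can be inspected rather than silently dropped.

def validate(record):
    """Return True if the record meets the pipeline's quality rules."""
    age = record.get("age")
    if age is None or not (0 <= age <= 120):
        return False
    if not record.get("country"):
        return False
    return True

def transform(record):
    """Standardise a valid record: consistent types and formats."""
    return {
        "age": float(record["age"]),
        "country": record["country"].strip().upper(),
    }

def run_pipeline(raw_records):
    """Split records into accepted (transformed) and rejected lists."""
    accepted, rejected = [], []
    for record in raw_records:
        (accepted if validate(record) else rejected).append(record)
    return [transform(r) for r in accepted], rejected

clean, bad = run_pipeline([
    {"age": 34, "country": " nz "},
    {"age": -5, "country": "AU"},   # rejected: age out of range
    {"age": 29, "country": ""},     # rejected: missing country
])
```

Because the rules live in code rather than in an analyst's head, adding a new data source means adding it behind the same gate, which is exactly the flexibility-with-quality-control trade-off described above.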
Once the input data is available, you can start training the machine learning models. This step is usually led by experts like data scientists. However, if you abstract the underlying complexity and encapsulate best practices, people without machine learning expertise can train models as well. It is a good idea to conduct many experiments in parallel and expect some of these to work well and some not to work at all. This requires automation and scalable infrastructure resources. Note that the real-life performance of a model should be measured directly against business results, e.g. change in customer conversion rates, not just technical metrics like prediction accuracy. After all, the model could be incredibly accurate but bring no business benefit whatsoever.
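The experiment-and-select pattern can be sketched as below. Everything here is an illustrative stand-in: train_model simulates a real training job, and the conversion_uplift figure stands in for a measured business result. The shape to note is that candidate models are ranked by business impact, with technical accuracy used only as a sanity floor.

```python
# Illustrative experiment loop: train several candidate models, then
# select by a simulated business metric rather than accuracy alone.
# train_model and its metrics are hypothetical stand-ins for real jobs.

import random

def train_model(learning_rate):
    """Stand-in for a training run; returns a candidate-model record."""
    rng = random.Random(int(learning_rate * 1000))  # deterministic per config
    return {
        "learning_rate": learning_rate,
        "accuracy": 0.80 + rng.random() * 0.15,
        # Simulated business result, e.g. change in conversion rate.
        "conversion_uplift": rng.random() * 0.05,
    }

# On a real platform these experiments would run in parallel on
# scalable infrastructure; here they run sequentially.
experiments = [train_model(lr) for lr in (0.001, 0.01, 0.1)]

# Accuracy is only a minimum bar; the winner is chosen by business impact.
candidates = [m for m in experiments if m["accuracy"] >= 0.80]
best = max(candidates, key=lambda m: m["conversion_uplift"])
```

Selecting on conversion uplift rather than accuracy is what prevents the failure mode mentioned above: a highly accurate model that delivers no business benefit.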
Serving of machine learning models requires production levels of availability and scalability. Models will often be deployed to the client or bundled with mobile apps. By creating several candidate models, the platform can be built to allow A/B testing where user subsets are exposed to different model versions. Just like with Continuous Integration and Delivery, automated testing must ensure that every model that gets deployed has reached the acceptable level of generalization accuracy before being exposed to the end users. Once deployed, end-user interactions can be used to further train and improve the model and so the life cycle continues.
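Two of the serving-side mechanics above, the automated deployment gate and A/B routing of users to model versions, can be sketched briefly. The accuracy threshold, version names, and traffic split are all assumptions for illustration.

```python
# Hypothetical serving-side checks: an automated deployment gate that
# blocks models below an accuracy bar, and deterministic A/B routing
# so each user consistently sees the same model version.

import hashlib

ACCURACY_THRESHOLD = 0.90  # assumed acceptance bar for generalisation

def may_deploy(holdout_accuracy):
    """CI/CD-style gate: only sufficiently accurate models ship."""
    return holdout_accuracy >= ACCURACY_THRESHOLD

def assign_variant(user_id, versions, b_fraction=0.1):
    """Hash the user id into a bucket; a fixed fraction gets version B."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return versions["B"] if bucket < b_fraction * 100 else versions["A"]

versions = {"A": "model-v1", "B": "model-v2-candidate"}
variant = assign_variant("user-42", versions)
```

Hashing the user id (rather than choosing randomly per request) keeps each user's experience stable across sessions, which makes the A/B comparison between model versions meaningful.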
Putting it All Together
Many enterprises focus on the data science part of machine learning but neglect the criticality of building a platform for end-to-end, automated machine learning pipelines. It is important to realize that machine learning will be a standard part of every software engineer and IT professional's toolkit and needs to be integrated into existing agile practices and Continuous Integration and Delivery processes. Rather than re-inventing the wheel, aim to integrate them into your enterprise IT delivery capability and governance.