Managing Machine Learning in the Enterprise
By Øyvind Roti, Head of Solutions Architecture APAC, Google Cloud
Given the wide range of business problems it addresses, machine learning will become a ubiquitous part of software applications throughout enterprises, and managing the machine learning lifecycle will become a critical skill for software engineers and IT practitioners of the future. This will require some new skills, but fortunately fits neatly into existing good practices like agile development. In short, the solution is to build an end-to-end platform for machine learning pipelines.
Introduction to Machine Learning
Machine learning is a subfield of artificial intelligence concerned with building models that learn from examples, without hard-coding rules or behaviors. It is usually broken into three types: supervised, unsupervised, and reinforcement learning. We will focus on supervised learning, but the platform concept extends to all three. Supervised machine learning works by training a model with labeled examples. For instance, to build a system that can recognize objects in an image, the training data could be photos with a label (e.g. a photo labeled “cat”). Over many training iterations, the model learns which pixels and patterns make up a “cat”. By repeatedly training and tweaking, the model will gradually improve until it is able to take an unlabeled example and predict the correct output.
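The train-on-labeled-examples, predict-on-unseen-input contract can be illustrated with a deliberately tiny sketch. The nearest-neighbour classifier below simply memorizes its labeled examples rather than learning weights the way an image model would, and the two-number feature vectors are stand-ins for pixels, but the fit/predict shape is the same.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbour
# classifier "trained" on labelled examples, then asked to predict
# the label of an unseen example.

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class NearestNeighbourClassifier:
    def fit(self, examples):
        # examples: list of (feature_vector, label) pairs
        self.examples = examples
        return self

    def predict(self, features):
        # Return the label of the closest training example.
        _, label = min(self.examples, key=lambda ex: euclidean(ex[0], features))
        return label

# Toy labelled training data (two numeric features per example).
training_data = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"),
    ((4.8, 5.2), "dog"),
]

model = NearestNeighbourClassifier().fit(training_data)
prediction = model.predict((1.1, 0.9))  # unseen, but near the "cat" cluster
```

A real supervised model replaces the memorized examples with learned parameters, which is what makes the "repeatedly training and tweaking" loop described above necessary.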
Some models can reach this level of performance with hundreds of training examples and a single machine, but a particularly powerful form of supervised learning, Deep Learning, is especially hungry for data and compute resources.
Building a Platform for End-to-end Machine Learning Pipelines
At a high level, the end-to-end machine learning pipeline can be broken into three stages: data transformation, model training, and model serving. A common mistake is to focus almost exclusively on the model training stage. However, much of the work in an effective machine learning pipeline goes into the cleaning and transformation of input data. The key to data transformation is automation: it embeds good practices into future data sources and reduces the chance of the errors prevalent in ad-hoc, manual processes. Automating the data pipeline requires sufficient flexibility to add new data sources, whilst blocking poor quality data points. This data engineering step is not trivial and will likely require the most effort and political capital to set up initially, as data silos are broken down and processing is standardized. This initial investment will pay off as more and more real-time and batch data sources are added.
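A minimal sketch of this automated transformation stage might look as follows. The field names and quality rules are hypothetical; the point is that every record passes through the same validation gate, so poor-quality data points are rejected explicitly rather than silently propagated into training.

```python
# Hypothetical automated data-transformation step: validate each
# incoming record against quality rules, then standardise the ones
# that pass into the pipeline's schema. Rejected records are kept
# so they can be inspected rather than silently dropped.

def validate(record):
    """Return True if the record meets the pipeline's quality rules."""
    age = record.get("age")
    if age is None or not (0 <= age <= 120):
        return False
    if not record.get("country"):
        return False
    return True

def transform(record):
    """Standardise a valid record: consistent types and formats."""
    return {
        "age": float(record["age"]),
        "country": record["country"].strip().upper(),
    }

def run_pipeline(raw_records):
    """Split records into accepted (transformed) and rejected lists."""
    accepted, rejected = [], []
    for record in raw_records:
        (accepted if validate(record) else rejected).append(record)
    return [transform(r) for r in accepted], rejected

clean, bad = run_pipeline([
    {"age": 34, "country": " nz "},
    {"age": -5, "country": "AU"},   # rejected: age out of range
    {"age": 29, "country": ""},     # rejected: missing country
])
```

Because the rules live in code rather than in an analyst's head, adding a new data source means adding it behind the same gate, which is exactly the flexibility-with-quality-control trade-off described above.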
Once the input data is available, you can start training the machine learning models. This step is usually led by experts like data scientists. However, if you abstract the underlying complexity and encapsulate best practices, people without machine learning expertise can train models as well. It is a good idea to conduct many experiments in parallel and expect some of these to work well and some not to work at all. This requires automation and scalable infrastructure resources. Note that the real-life performance of a model should be measured directly against business results, e.g. change in customer conversion rates, not just technical metrics like prediction accuracy. After all, the model could be incredibly accurate but bring no business benefit whatsoever.
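The experiment-and-select pattern can be sketched as below. Everything here is an illustrative stand-in: train_model simulates a real training job, and the conversion_uplift figure stands in for a measured business result. The shape to note is that candidate models are ranked by business impact, with technical accuracy used only as a sanity floor.

```python
# Illustrative experiment loop: train several candidate models, then
# select by a simulated business metric rather than accuracy alone.
# train_model and its metrics are hypothetical stand-ins for real jobs.

import random

def train_model(learning_rate):
    """Stand-in for a training run; returns a candidate-model record."""
    rng = random.Random(int(learning_rate * 1000))  # deterministic per config
    return {
        "learning_rate": learning_rate,
        "accuracy": 0.80 + rng.random() * 0.15,
        # Simulated business result, e.g. change in conversion rate.
        "conversion_uplift": rng.random() * 0.05,
    }

# On a real platform these experiments would run in parallel on
# scalable infrastructure; here they run sequentially.
experiments = [train_model(lr) for lr in (0.001, 0.01, 0.1)]

# Accuracy is only a minimum bar; the winner is chosen by business impact.
candidates = [m for m in experiments if m["accuracy"] >= 0.80]
best = max(candidates, key=lambda m: m["conversion_uplift"])
```

Selecting on conversion uplift rather than accuracy is what prevents the failure mode mentioned above: a highly accurate model that delivers no business benefit.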
Serving of machine learning models requires production levels of availability and scalability. Models will often be deployed to the client or bundled with mobile apps. By creating several candidate models, the platform can be built to allow A/B testing where user subsets are exposed to different model versions. Just like with Continuous Integration and Delivery, automated testing must ensure that every model that gets deployed has reached the acceptable level of generalization accuracy before being exposed to the end users. Once deployed, end-user interactions can be used to further train and improve the model and so the life cycle continues.
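Two of the serving-side mechanics above, the automated deployment gate and A/B routing of users to model versions, can be sketched briefly. The accuracy threshold, version names, and traffic split are all assumptions for illustration.

```python
# Hypothetical serving-side checks: an automated deployment gate that
# blocks models below an accuracy bar, and deterministic A/B routing
# so each user consistently sees the same model version.

import hashlib

ACCURACY_THRESHOLD = 0.90  # assumed acceptance bar for generalisation

def may_deploy(holdout_accuracy):
    """CI/CD-style gate: only sufficiently accurate models ship."""
    return holdout_accuracy >= ACCURACY_THRESHOLD

def assign_variant(user_id, versions, b_fraction=0.1):
    """Hash the user id into a bucket; a fixed fraction gets version B."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return versions["B"] if bucket < b_fraction * 100 else versions["A"]

versions = {"A": "model-v1", "B": "model-v2-candidate"}
variant = assign_variant("user-42", versions)
```

Hashing the user id (rather than choosing randomly per request) keeps each user's experience stable across sessions, which makes the A/B comparison between model versions meaningful.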
Putting it All Together
Many enterprises focus on the data science part of machine learning but neglect the criticality of building a platform for end-to-end, automated machine learning pipelines. It is important to realize that machine learning will be a standard part of every software engineer and IT professional's toolkit and needs to be integrated into existing agile practices and Continuous Integration and Delivery processes. Rather than re-inventing the wheel, aim to integrate them into your enterprise IT delivery capability and governance.