
Convergence of MLOps and Data Pipelines

December 13, 2022

In this session, Abhijit Bose, Head of the Center for Machine Learning at Capital One, Alok Gupta, Head of Data at DoorDash, and Vidyaranya Devigere, Head of Algorithms at Overstock, share how they are transforming their data pipelines across collection, preparation, management, and development to better manage their data, improve ML throughput, and create meaningful AI applications for their businesses.

To Build or to Buy?

When starting a new ML project, enterprises need to decide whether to opt for an off-the-shelf ML platform or build one in-house. The panelists give their opinions on the advantages and disadvantages of each option, and also highlight some nuances and hybrid approaches.

Opting to buy an ML system is a great way to kickstart a project and build momentum: it reduces development time and initial research costs. However, an off-the-shelf product may not offer the full range of capabilities your organization requires. If you have specific requirements that no product currently on the market can meet, such as compliance with industry regulations like HIPAA, building an in-house solution may be the better way to start the project.

Many off-the-shelf products also support a good level of customization, so organizations can purchase an ML solution, test how well it meets their requirements, and work with the vendor on project-specific customizations. When purchasing an out-of-the-box ML platform, it is also worth considering the vendor's roadmap and the available deployment options. Another approach is to start by developing an ML platform in-house to meet the core requirements, and later integrate or replace parts of the platform with off-the-shelf products.

“The right solution must win, regardless of build or buy”
– Abhijit Bose, Head of the Center for Machine Learning at Capital One

Leveraging End-to-End MLOps

When the panelists were asked whether their organizations employ end-to-end MLOps platforms or support their projects with a collection of tools, all three said they have a central MLOps platform, though each has a slightly different strategy depending on its use cases. Capital One, for example, offers developers the option to extend the capabilities of the MLOps platform with modules built by other business units in the organization, creating a modular platform that supports their highly regulated industry use cases.
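To make the "modular platform" idea concrete, the sketch below shows one minimal way such extensibility could work: a central registry into which any team can plug its own pipeline step. This is entirely illustrative and not Capital One's actual architecture; the module name and check are hypothetical.

```python
# Minimal, hypothetical sketch of a modular MLOps platform: a shared registry
# that lets business units contribute their own pipeline steps.
from typing import Callable, Dict

MODULES: Dict[str, Callable] = {}

def register(name: str):
    """Decorator that adds a team-built step to the shared platform."""
    def wrap(fn: Callable) -> Callable:
        MODULES[name] = fn
        return fn
    return wrap

@register("fraud_feature_check")  # e.g. a module contributed by a risk team
def fraud_feature_check(batch):
    # Drop rows with negative amounts before they reach model training.
    return [row for row in batch if row.get("amount", 0) >= 0]

# The core platform runs whichever registered modules a project opts into.
result = MODULES["fraud_feature_check"]([{"amount": 10}, {"amount": -1}])
print(result)  # [{'amount': 10}]
```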

“MLOps is not about building a model, putting it in the world and watching the good times roll. It’s an engineering system, and just like elsewhere in engineering, you have to work hard on scalability, reliability, velocity, and it has to be treated as a first-class engineering system with handshakes, microservices, checkpoints and unit tests. That depends on education and training.”
– Alok Gupta, Head of Data at DoorDash

DoorDash opted for a centralized platform for 90% of their use cases. They view this as a strategic advantage: training models centrally lets the same models be applied in multiple places in their application. For example, a personalization forecasting model can be reused in multiple places in the DoorDash app without being rewritten. Overstock's centralized MLOps platform, in turn, lets the organization establish a consistent workflow and a baseline of best practices and standardized technologies.

ML packages can be a double-edged sword when adopted without a well-defined strategy. Alok Gupta, Head of Data at DoorDash, shared that when starting their ML journey, their strategy was to adopt as few ML packages as possible. To do that, they defined a scope and a list of metrics on which to evaluate candidate packages. They knew their requirements could be met with deep learning and tree-based methods, so they ran a bake-off between a range of packages, evaluating each on speed, level of customization, and performance. They ultimately selected two packages: one for deep learning and another for tree-based methods.
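As a rough illustration of what such a bake-off can look like, the sketch below times training and scores held-out performance for one tree-based and one neural candidate. The models, dataset, and metrics here are illustrative assumptions, not the packages DoorDash actually compared.

```python
# Hypothetical bake-off harness: train each candidate on the same split and
# compare training speed and predictive performance side by side.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# One tree-based and one deep-learning-style candidate (stand-ins for real packages).
CANDIDATES = {
    "tree_based": GradientBoostingClassifier(),
    "deep_learning": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300),
}

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in CANDIDATES.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)                     # measure training speed
    train_seconds = time.perf_counter() - start
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])  # measure quality
    print(f"{name}: train={train_seconds:.2f}s, AUC={auc:.3f}")
```

In practice the evaluation grid would also include the "level of customization" axis the panel mentions, which is harder to score automatically and usually comes from hands-on prototyping.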

Common and Project-Specific MLOps Challenges

The panelists shared that every machine learning project faces challenges, some common across industries and projects, others specific to the technology being developed. Common challenges include ensuring compliance with industry regulations and setting up adequate governance policies for the project. Another common challenge is the data workflow, such as data collection and annotation. Data quality often ends up being the sole responsibility of the engineering team, when it should be owned by the whole organization so that quality is assured across the end-to-end process.

Other challenges are project-specific. For example, DoorDash struggled to get its online and offline features to agree. This is because models are trained on cleaned and sanitized data, which is of much higher quality than the real-world, real-time data seen in production. As a result, the company had to re-code features of their prediction algorithm to produce adequate results when exposed to messy, real-world data.
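One common way to surface this kind of training/serving skew is to recompute a request's features offline and diff them against the values logged at serving time. The sketch below is a minimal version of that check; the feature names and tolerance are illustrative, not DoorDash's actual implementation.

```python
# Hypothetical online/offline feature-parity check: flag features whose
# training-time (offline) and serving-time (online) values disagree.
from typing import Mapping, Optional, Tuple, Dict

def feature_skew(offline: Mapping[str, float],
                 online: Mapping[str, float],
                 tolerance: float = 1e-3) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return features where offline and online values differ beyond tolerance."""
    mismatches = {}
    for name, offline_value in offline.items():
        online_value = online.get(name)
        if online_value is None or abs(offline_value - online_value) > tolerance:
            mismatches[name] = (offline_value, online_value)
    return mismatches

# Example: the offline pipeline recomputes features for a request and compares
# them to what the serving path actually used (values are made up).
offline_row = {"avg_prep_time_7d": 14.2, "orders_last_hour": 31.0}
online_row = {"avg_prep_time_7d": 14.2, "orders_last_hour": 27.0}
print(feature_skew(offline_row, online_row))  # {'orders_last_hour': (31.0, 27.0)}
```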

On the flip side, Alok Gupta also highlighted how predictable demand forecasting can be. In their food-delivery use case, factors such as sporting events, concerts, time of day, and location provide strong enough signals to predict demand accurately.

“The power comes from the data, and the more predictive data you can get, the better features you can achieve”
– Alok Gupta, Head of Data at DoorDash

Monitoring and Visibility for MLOps

All three panelists named monitoring and visibility as among the most important considerations for data scientists and engineers developing an ML model. Monitoring continuously evaluates the model's performance and produces reports that highlight issues or anomalies. Changes in performance should be tracked over multiple time intervals: some changes are obvious step changes on a day-to-day timeframe, while others only become visible over months.
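A simple way to cover both timeframes is to compare a short rolling window of a model metric against a long-window baseline, as in the sketch below. The window sizes, metric, and alert threshold are illustrative assumptions.

```python
# Hypothetical monitoring sketch: compare short- and long-window rolling
# averages of a daily model metric to catch both abrupt steps and slow drift.
import pandas as pd

def drift_alerts(daily_auc: pd.Series,
                 short_window: int = 7,
                 long_window: int = 90,
                 drop_threshold: float = 0.02) -> pd.Series:
    """Flag days where the 7-day average metric sags below the 90-day baseline."""
    short_avg = daily_auc.rolling(short_window).mean()
    long_avg = daily_auc.rolling(long_window).mean()
    return (long_avg - short_avg) > drop_threshold  # True where performance drops

# daily_auc would come from a scheduled job that scores recent predictions
# against observed outcomes and stores one metric value per day.
```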

The other side of monitoring is visibility, specifically visibility into the end-to-end data workflow. Understanding the data sources and the ETL pipelines makes it possible to pinpoint the point of failure in the data workflow when quality issues arise.
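One lightweight way to get that visibility is to log simple quality statistics after every pipeline stage, so a downstream issue can be traced back to the stage where the data first degraded. The sketch below assumes a pandas-based ETL job; the stage names and columns are hypothetical.

```python
# Hypothetical pipeline-visibility sketch: record row counts and null rates
# at each ETL stage so failures can be localized.
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def checkpoint(df: pd.DataFrame, stage: str) -> pd.DataFrame:
    """Log row count and overall null rate after a stage, then pass data through."""
    null_rate = float(df.isna().mean().mean())
    log.info("stage=%s rows=%d null_rate=%.3f", stage, len(df), null_rate)
    return df

raw = pd.DataFrame({"order_id": [1, 2, 3], "eta_minutes": [12.0, None, 9.5]})
cleaned = checkpoint(raw, "extract").dropna()
checkpoint(cleaned, "clean")
```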

