Role of Automation in ML DataOps

December 27, 2022

Integrating intelligent automation across machine learning data operations simplifies data preparation, optimizes data workflows, and saves significant time and resources. In this session, Danny Lange, SVP of AI and Machine Learning at Unity, reveals the role automation plays in machine learning data operations today and in the foreseeable future.

Space and Time in Machine Learning

Danny applied the concepts of space and time to machine learning. He describes space as the amount of data that has to be processed, and time as how quickly it must be processed. The trend he observed is that the amount of data keeps increasing while the time available to process it keeps decreasing. We are now dealing with billions of data points that have to be processed in near real time. This trend poses challenges such as increased processing and storage requirements. Machine learning is no longer just an analytical process that can afford to take hours or days to go through data and extract information; the processing must be completed in minutes or even seconds.

“Automation is no good if it allows the process to derail because of bad data or changes that went unnoticed.”
– Danny Lange, SVP of AI and Machine Learning at Unity

Danny also notes that in the early days of the machine learning industry, effort and research focused mainly on models and theory. Today it is less about the details of the model and more about scalability and deployment. Alongside data management and governance, these are the aspects that make a model applicable in real-world scenarios.

Automation in Machine Learning

As Unity grew over the past few years from 500-700 million users to 3-4 billion users, all of whom generate event streams, it had to define a process to capture all the play data, process it, and send it back to the gaming studios for use cases such as in-game monetization, app monetization, or changes to the games themselves.

From the other side of the fence, iMerit’s Raj Aikat highlights some of the changes he has seen in the MLOps industry. These include:

  1. Human-in-the-loop becomes expert-in-the-loop – people no longer draw polygons and prep the data; they become experts who orchestrate the labeling process, using automation to ensure a constant stream of data annotation.
  2. Integration of MLOps and DataOps – until now, there was only one touch point between these two cycles, at the ‘training’ point. Now data collection and annotation affect every point of the ML loop, such as testing, monitoring, and deployment.

“In 2-3 years we won’t have a MLOps and DataOps different term for it. It will be completely fused together.”
– Raj Aikat, Chief Product and Marketing Officer at iMerit

Human-in-the-loop and Automation

An age-old question raised every time a new wave of automation comes about is: what will happen to the humans in the process? As we have seen over the past century, automation does not mean that humans are removed, but that they move into more interesting, higher-value, and better-paid jobs. This is happening right now in the MLOps industry.

With the changes brought by automation, organizations need to upgrade their teams and help team members understand how their skills need to develop and how they can grow.

“Automation doesn’t leave team members with less to do, but with more important things to do. We give team members the chance to upgrade from where they are today.”
– Danny Lange, SVP of AI and Machine Learning at Unity

Automation also changes the questions a business should ask when assessing labeling partners. Rather than looking at KPIs such as human cost per hour, questions in the automation age should cover tooling, techniques, technologies, and expertise.

Danny observed that automation has had the highest impact in two areas of his business:

  1. Data diversity – in addition to Unity’s simple customer records such as purchases and gameplay, the gaming company also has other, highly complex services where it needs to apply machine learning, such as moderating in-game chat so that it is not abusive or unrelated to gameplay, or analyzing graphical ads for content moderation.
  2. Observability – operating in real time with 10-20ms timeframes for decision making, Unity must constantly understand the quality of its data. Data today comes from customers, partners, and vendors, and it is more complex than just customers clicking a button on a website.

Anomalies and Edge Cases

One of the most productive automation cases for iMerit came from autonomous mobility, but it has since spread to all other verticals. While it is easy to train with clean data in the lab, using a model in production exposes it to many anomalies. Using automation, organizations can identify and define edge cases, assign attributes, and identify trends and relationships.

Afterwards, permuting these attributes can produce extended or novel test scenarios from the edge cases. For example, if you have an edge case in which a trailer on a highway swerves from left to right as it loses control, extending it can simulate the same scenario in other driving conditions, such as snow, sleet, or rain. With these additional conditions, organizations can train their model to respond differently depending on the conditions.
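
As a rough illustration of extending one edge case across conditions, the sketch below combines a fixed base scenario with every combination of a few varied attributes. It is not iMerit's tooling: the attribute names, the "lighting" attribute, and the expand_edge_case helper are all assumptions made for this example, and the technique used is a plain Cartesian product of attribute values.

```python
from itertools import product

# Hypothetical attributes describing one observed edge case
# (names and values are illustrative, not from any real dataset).
base_edge_case = {
    "actor": "trailer",
    "road": "highway",
    "maneuver": "swerving left to right",
}

# Hypothetical condition attributes to vary. The weather values follow the
# article's example; "lighting" is an extra attribute assumed for illustration.
variations = {
    "weather": ["clear", "rain", "sleet", "snow"],
    "lighting": ["day", "night"],
}


def expand_edge_case(base, varied):
    """Return one scenario dict per combination of the varied attribute values."""
    keys = list(varied)
    scenarios = []
    for combo in product(*(varied[key] for key in keys)):
        scenario = dict(base)                 # copy the fixed attributes
        scenario.update(zip(keys, combo))     # overlay this combination
        scenarios.append(scenario)
    return scenarios


scenarios = expand_edge_case(base_edge_case, variations)
for scenario in scenarios:
    print(scenario)
```

With four weather values and two lighting values, this yields eight scenarios that share the same trailer maneuver. In practice the number of combinations grows quickly as attributes are added, so a team would likely filter or sample them rather than generate every one.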
