This is How You Scale Your Data Pipeline

January 13, 2022

Companies are practically drowning in their own data. As artificial intelligence and machine learning models enter production, they are continually generating more data than these companies can keep up with. How can companies hope to scale with this ever-expanding ocean of data they’re creating?

During the iMerit ML DataOps Summit, iMerit’s CRO Jeff Mills spoke with Raven Applied Technologies’ Senior Research Engineer Karthik Paga and SambaNova Systems’ VP of Software Products Prabhdeep Singh. The conversation covered how SambaNova’s vertically integrated tech stack is giving companies of all sizes the computing power they need to scale their data pipelines, and how the right combination of humans in the loop and robotic process automation (RPA) can help them do so.


How Does Robotic Process Automation Help Companies Scale?

Robotic process automation takes the pain out of building, deploying, and managing software robots that emulate human actions when interacting with digital systems and software. If a company has thousands of robots running with embedded machine learning models, and hundreds of humans are feeding new data into those robots, problems naturally occur: data needs to be correlated, the models need to be checked for proper deployment, and bias needs to be identified.

This entire process of data generation, entry, and processing is a stack. Naturally, the process has many steps, and each step increases the chance of a logistical problem occurring. Robotic process automation can identify bottlenecks across the entire stack and, through hyperautomation, resolve them to improve the capacity of the process.
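To make the idea concrete, here is a minimal sketch (not any vendor’s actual product) of what "identifying the bottleneck across the stack" can mean in practice: time each stage of a staged pipeline and flag the slowest one. The stage names and payloads are hypothetical stand-ins for the generation, entry, and processing steps described above.

```python
import time
from typing import Callable

def profile_pipeline(stages: dict[str, Callable], payload):
    """Run each stage in order, timing it, and return the slowest stage (the bottleneck)."""
    timings = {}
    for name, stage in stages.items():
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    bottleneck = max(timings, key=timings.get)
    return bottleneck, timings

# Hypothetical stages standing in for data generation, entry, and processing.
stages = {
    "ingest":   lambda xs: [x.strip() for x in xs],   # normalize raw entries
    "validate": lambda xs: [x for x in xs if x],      # drop empty records
    "process":  lambda xs: sorted(xs),                # downstream processing
}
bottleneck, timings = profile_pipeline(stages, ["  b ", "a", ""])
```

Once the slowest stage is known, an automation layer can route extra capacity to it or rework it, which is the "resolve the bottleneck" half of the process described above.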


What Are Some Use Cases for RPA?

In the world of smart agriculture, autonomy centers on understanding the environment of operation. Agriculture is a unique challenge because the environment is constantly changing. Raven tackles this challenge by understanding the unique circumstances the machine will be deployed under, and how those circumstances relate to the machine learning pipeline it is built on.

Once these circumstances are understood, Raven goes a step further by applying deep learning online. Raven’s actively deployed models are constantly learning and adapting to the environmental conditions and circumstances they operate under. 

Deep learning is what bridges the gap between the model in the field and the machine learning pipeline. Doing it in real time enables Raven to deploy machine learning models that adapt to the constantly changing conditions of agriculture. The challenge is keeping up with the overwhelming amount of data being generated.
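A minimal sketch of the online-learning idea, assuming a toy linear model rather than Raven’s actual deep learning stack: instead of retraining in batches, the model takes one small gradient step per incoming observation, so it adapts as conditions change. The sensor readings and learning rate here are illustrative placeholders.

```python
def online_update(weights, features, target, lr=0.01):
    """One SGD step for a linear model: w <- w - lr * (prediction - target) * x."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    return [w - lr * error * x for w, x in zip(weights, features)]

# Hypothetical stream of (sensor features, observed outcome) pairs.
weights = [0.0, 0.0]
stream = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0)]
for features, target in stream:
    weights = online_update(weights, features, target)
```

Each arriving example nudges the weights immediately, which is what lets a deployed model track a shifting environment instead of waiting for the next offline training run.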

“DevOps is being handled like a well-oiled machine in enterprises. But the typical process isn’t the same in machine learning.”

-Prabhdeep Singh

For deep learning to work in real time, robotic process automation is critical. As machine learning models scale, the tech stack that powers them must operate seamlessly. This requires a vertically integrated stack of fully optimized hardware, which is exactly what SambaNova is building so that innovators like Raven can build autonomous solutions at scale.


Human intelligence is an essential component of any successful tech stack. Processes involving human oversight allow the model to be refined. For example, when a model is created for field extraction, a process that automates document uploading by extracting the values of a document’s fields, humans-in-the-loop come into play by observing the model’s outputs. When the model is challenged, it kicks tasks off to the human-in-the-loop, who can set the data aside to create a new training dataset that is then used to retrain and improve the model.
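The escalation workflow above can be sketched as a simple confidence gate. This is a hedged illustration, not iMerit’s actual system: the threshold, document names, and field values are all assumed for the example.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tuned per model in practice

review_queue = []    # tasks kicked off to the human-in-the-loop
retraining_set = []  # corrected examples set aside for the next training run

def route(document, prediction, confidence):
    """Return the model's prediction if confident; otherwise escalate to a human."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return prediction
    review_queue.append(document)
    return None

def record_correction(document, corrected_fields):
    """Store the human's corrected labels for retraining."""
    retraining_set.append((document, corrected_fields))

# A low-confidence extraction is escalated rather than trusted.
result = route("invoice_001", {"total": "98.20"}, confidence=0.55)
record_correction("invoice_001", {"total": "103.20"})  # hypothetical human fix
```

The corrected examples accumulated in the retraining set are exactly the "new training dataset" described above: over time they push the model’s accuracy up where it was weakest.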

“Humans in the loop are really important. Without them, ML models wouldn’t be what we need them to be. No model is 90-95% accurate right out of the gate.”

-Prabhdeep Singh

For autonomous models deployed in rapidly changing environments, the criteria for “good performance” are always changing. For any model deployed in real time, a human-in-the-loop is critical to ensure the model keeps performing. For example, Raven once developed a system for a farmer to operate in a specific field, only for the machine to end up harvesting a unique crop it hadn’t been designed for.


For an autonomous model to actually deploy, the DevOps and vertically integrated tech stack must be present in the cab. This means the farmer becomes the human-in-the-loop, and must understand the tech stack, identify errors precisely, and conduct the DevOps needed to personalize the model’s performance to their needs. Raven’s DevOps pipeline works in conjunction with the farmer to help identify what occurred whenever the model fails to perform.

This shared autonomy is somewhat unique to smart agriculture, as the people who buy these systems and deploy them in the field have unique expectations. For the model to meet those expectations, the customers themselves must be able to interact with the DevOps and tech stack to ensure the model performs.

Vertical Stack + Humans-in-the-loop = Scaling Pipeline

The rate at which deployed AI models generate data is astronomical, and the teams managing them cannot hope to keep up. Through vertically integrated tech stacks and robotic process automation, companies can employ the level of computation needed to deploy their models, manage their DevOps, and make their models sing.

For teams struggling to keep up with their data loops, iMerit is there to not only monitor, modify, and optimize the performance of their models, but also insert experts-in-the-loop wherever necessary, helping companies take their AI projects to the next level.