Exploring the AI and ML DataOps Market

April 08, 2022

At the iMerit MLDataOps Summit 2021, iMerit CRO Jeff Mills moderated a panel featuring Gartner’s Senior Director Analyst Sumit Agarwal and Bessemer Venture Partners’ Ethan Kurzweil. The conversation focused on several topics including:

  • How data value chains are changing, and how they’ll impact the data scientists working in them
  • Why AI projects should be simple, even if it makes them boring
  • Indicators of success venture capitalists look for in AI ventures

[Figure: ML DataOps ecosystem]

Both Ethan and Sumit also shared a framework for executives and engineers to gauge the market fit of their ideas within the ML DataOps ecosystem.

Data Value Chains

“The old data stacks that companies use are unbundling, decoupling and fragmenting into best-of-breed solutions along each part of the data value chain.”

– Ethan Kurzweil, Bessemer Venture Partners

An AI project starts with data gathering and storage, then moves on to organizing the data for training, model training, error correction, model monitoring and production deployment.
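The stages listed above can be pictured as an ordered chain in which each link is a separate, replaceable component. The following is only a minimal sketch under that assumption: the stage names paraphrase the text, and `_mark`, `PIPELINE` and `run_pipeline` are hypothetical placeholders rather than any real tool's API.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Stage = Tuple[str, Callable[[State], State]]

# Stage names paraphrase the order given in the text (assumption: one
# function per stage; real pipelines may split or merge these).
STAGE_NAMES = [
    "gather_and_store",
    "organize_for_training",
    "train_model",
    "correct_errors",
    "monitor_model",
    "deploy_to_production",
]


def _mark(name: str) -> Callable[[State], State]:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: State) -> State:
        completed = list(state.get("completed", []))
        completed.append(name)
        return {**state, "completed": completed}
    return stage


# Each entry could be swapped for a different vendor or open-source tool
# without touching the other stages.
PIPELINE: List[Stage] = [(name, _mark(name)) for name in STAGE_NAMES]


def run_pipeline(pipeline: List[Stage], state: State = None) -> State:
    """Run every stage in order, passing the state from one to the next."""
    state = dict(state or {})
    for _name, stage in pipeline:
        state = stage(state)
    return state


result = run_pipeline(PIPELINE)
```

Keeping each stage behind the same small interface is one way to model the "best-of-breed" unbundling described in the quote: any single entry in `PIPELINE` can be replaced independently.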

Each of those segments is a category in which startups can address inefficient or poor model training. Startups can add value along the chain by building tools that make the data pipeline run more smoothly. One area with growth potential is the tooling ecosystem that helps data scientists simplify their jobs, accelerate their results and contribute in more impactful ways to the data value chain.

Data scientists are empowered now more than ever to bring their own software to their job function. If this is done correctly, it can remove much of the complexity from the process of data science at every step of the data value chain.

Indicators of Success

The vast DataOps market allows many new companies to enter the AI space. Bessemer Venture Partners has identified the following indicators of success for companies that do so:

  1. Ecosystem-driven: Any new data infrastructure tool should work seamlessly with other parts of the chain. This requires companies to be aware of the current vendor ecosystem and the space they are entering. Partnerships with other players can be a huge advantage for interoperability, product launches and market adoption. Because the DataOps space is populated by such diverse and niche software, interoperability is necessary to capture that part of the market.
  2. Community engagement: Since data science and most of the underpinnings of AI/ML began in academia, new data infrastructure tools should build on that sense of collaboration and gather community support. This means working closely with existing open-source communities through forums, discussion boards and mailing lists. Community engagement can also help with product outreach, as an alternative to traditional direct marketing.
  3. Remove friction from the stack: Companies are eager to outsource non-core tasks such as orchestration or compute infrastructure management. A product that removes friction and prioritizes the developer or data scientist can easily find traction in the market.
  4. Easy collaboration: As companies grow, their teams become siloed, separated by different tooling and metrics of success. Tools should bridge the gaps between business functions, so that teams can focus on their data operations instead of constantly transforming data to work with other teams.

Make AI Boring

“There’s so much innovation happening in both the vendor and the open-source space. It’s been really fun to interact with all the practitioners, startups, and industry leaders. While I like the space and find it exciting, I’d like to make it as boring as possible. Where you can do this in as many steps and as many times.”

– Sumit Agarwal, Gartner

[Figure: Hype Cycle for Artificial Intelligence, 2021]

AI has been steadily climbing the Hype Cycle, as much of the discussion around areas such as virtual reality and autonomous vehicles glamorizes the field and its developments. Great AI products, however, need to solve the data woes of enterprises reliably and consistently. For financial institutions or healthcare practitioners, the most valuable AI product is one they can depend on.

Trust comes from repeatedly performing tasks on data with consistent results. The infrastructure should, in essence, be simple, forgettable, and even a little boring, in order to make way for real innovation in areas that the end-users need. 

Rather than relying on flashy marketing speak and promises, AI projects should answer plain questions such as:

  • Is it solving a real-world business problem?
  • Is it scalable? 
  • Will the results be consistent?
  • Is there good quality data available? 
  • Is there a data architecture in place to embed all these best practices?
  • Will the AI be used ethically and responsibly?
  • Can the workings of the product be explained, or is it a complete black box?
  • Is the data protected as per the security and privacy regulations of today?

Rounding Out the Data Cycles

With DataOps reaching its peak in 2022, it’s helpful to take a step back and see which pieces, if any, are missing from the data lifecycle.

“Because there is a move to store more and more data and be able to dump it into various places, oftentimes you’re not storing the right thing. Or you are storing the right thing, but you’re storing so much other stuff that it’s very hard to parse out what’s actually useful.”

– Ethan Kurzweil, Bessemer

The AI and DataOps space can mature faster by learning and borrowing from the more mature practices of software development. For decades, software development companies and open-source contributors worked to bring scalability and repeatability to development and release pipelines. The data pipelines of today seem to mirror those concepts. With multiple types of AI/ML teams involved, the pipelines need greater standardization to be deployed quickly.

Of particular interest is the variability in models. In companies whose data rarely changes, the model doesn’t need as much attention, whereas in some use cases (e-commerce, for example) training and deploying new models is a daily activity.

The data infrastructure needs to accommodate this whole spectrum of requirements.
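One way to picture this spectrum is a retraining check that fires rarely when data is stable and often when it shifts. The sketch below is illustrative only and is not something the panel described: `drift_score`, `should_retrain` and the `threshold` default are hypothetical names and values, and the drift measure (shift in the mean, scaled by the reference spread) is a deliberately simple stand-in for whatever monitoring a real team would use.

```python
from statistics import mean, pstdev
from typing import Sequence


def drift_score(reference: Sequence[float], recent: Sequence[float]) -> float:
    """Shift in the mean of recent values, in units of the reference spread."""
    spread = pstdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if spread == 0:
        # No variation in the reference data: any shift counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def should_retrain(
    reference: Sequence[float],
    recent: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, chosen only for illustration
) -> bool:
    """Return True when recent data has drifted past the threshold."""
    return drift_score(reference, recent) > threshold


# Hypothetical feature values: a stable workload and a shifting one.
reference_window = [10, 11, 9, 10, 10]
stable_recent = [10, 10, 11, 9, 10]
shifted_recent = [14, 15, 16, 15, 14]

retrain_stable = should_retrain(reference_window, stable_recent)
retrain_shifted = should_retrain(reference_window, shifted_recent)
```

Under these assumptions the stable workload never triggers retraining, while the shifting one does on every check, which is the range of behavior the infrastructure would have to support.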

The AI Industry is Maturing

“I think we’re still in the beginning. A few years back, when building our first models, we were just experimenting, not knowing if there was any value. Fast forward to now, we’re talking to a spectrum of organizations with some on their 10th model and some thinking in terms of thousands.”

– Sumit Agarwal, Gartner

The AI/ML data tooling market has matured in step with the AI market itself. As newer data sources and their problems are discovered, the tools that best fit the job become the demand of the day.

While we have reasonably robust tools for ingesting and transforming data, the future will need more dynamic and adaptive DataOps. To realize the vision of a genuinely data-aware company, the infrastructure for making real-time data decisions needs to be built out. And since these requirements are new and in flux along with the AI landscape, startups have the chance to shine by pivoting to solve modern tooling challenges in modern ways.
