Data on the Edge with Datasaur: Key Takeaways

October 26, 2021

In the inaugural episode of Data on the Edge, Jeff Mills, Chief Revenue Officer of iMerit, sits down with Ivan Lee, CEO & Founder of Datasaur, to explore data challenges in the ML operations landscape and how Datasaur, a leading NLP labeling and annotation tool, helps to tackle edge case situations in NLP.

Datasaur has partnered with iMerit to combine the best labeling software with the best labeling workforce, to annotate NLP training data at scale for clients across multiple sectors.

Here are 5 key takeaways from the episode:

  • Natural language processing workflows have evolved from simple language parsing tasks, such as part-of-speech tagging and named entity recognition, to more advanced workflows like aspect-based sentiment analysis. Datasaur enables iMerit to link each sentiment to the attribute it describes, giving a more nuanced view of what a statement actually means.

  • Datasaur's three product pillars are a powerful yet simple labeling interface, built-in intelligence to automate simple tasks, and collaboration and workflow management through dashboards and reporting.

  • Of all the branches of AI, NLP is one of the most mature and directly applicable. For now, Datasaur primarily supports text-based NLP use cases. The company intends to expand into text-adjacent domains such as audio and optical character recognition (OCR), in which audio and images are transcribed into text.

  • Both the data-centric and model-centric approaches matter, but improving the quality and quantity of data is the fastest way to improve ML model performance.

  • Customers and users are becoming more sophisticated in how they analyze and break down their projects. Beyond tracking how efficiently a project is progressing, they increasingly want to extract insights and feedback from labeling tasks.
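
To make the first takeaway more concrete, here is a minimal sketch of what an aspect-based sentiment annotation might look like as data: each labeled attribute (aspect) is linked to the opinion phrase that describes it. The class names, labels, and example sentence are hypothetical and chosen only for illustration; this is not Datasaur's export format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled stretch of text, stored as character offsets."""
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:
    """Links an aspect span to the opinion span that describes it."""
    aspect: Span
    opinion: Span
    polarity: str  # e.g. "positive" or "negative" (illustrative values)


def find_span(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of `phrase` in `text` and label it."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def summarize(text: str, relations: list[Relation]) -> dict[str, str]:
    """Map each aspect's surface text to the polarity linked to it."""
    return {text[r.aspect.start:r.aspect.end]: r.polarity for r in relations}


# Hypothetical example sentence with two aspects and opposite sentiments.
text = "The battery life is great but the screen is dim."
relations = [
    Relation(find_span(text, "battery life", "ASPECT"),
             find_span(text, "great", "OPINION"),
             "positive"),
    Relation(find_span(text, "screen", "ASPECT"),
             find_span(text, "dim", "OPINION"),
             "negative"),
]

summary = summarize(text, relations)
print(summary)  # {'battery life': 'positive', 'screen': 'negative'}
```

A single sentence-level sentiment label would flatten this review to "mixed"; keeping the relation between each aspect and its opinion preserves which attribute the praise or complaint is about.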