Moderating user-generated content is tricky, especially on gaming sites and social platforms for gaming communities. Automated content moderation tools can detect toxicity to an extent, but they miss toxic behaviors and cannot analyze toxic intent. An effective AI-based in-game content moderation solution should consider context, not just the content or words used.
Alliance For A Safe Gaming Experience
iMerit worked with one of the biggest game publishers in the US to tackle in-game toxicity using AI and human-in-the-loop workflows. With its most popular game alone drawing 150+ million monthly players, the publisher's existing solutions were insufficient for the level of player safety it sought. By combining our domain expertise in gaming community management and custom language model development with best-in-class tooling provided by Dataloop, we helped the company curate high-quality datasets that capture in-game nuances and sensitivity.
The game publisher collected in-game voice communication from its popular first-person shooter games to evaluate and moderate toxic behavior. iMerit and Dataloop partnered to create a nuanced ground-truth training dataset for the publisher's in-house speech moderation model.
Scaling Up Success with Quality At The Centre
We transcribed and labeled ~30m single-speaker audio files consisting of unsegmented player chats for the company, following annotation best practices and the project guidelines. Our human-in-the-loop workflow leveraged automated voice activity detection, utterance segmentation, and speech-to-text models, refined by a white-glove transcription and classification process, to exceed an accuracy threshold of 95% at scale.
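To make the first two automated steps more concrete, here is a minimal sketch of voice activity detection and utterance segmentation. It is not the tooling used on the project: it swaps in a simple frame-energy threshold as a stand-in for a real voice activity detection model, and the function name `segment_utterances`, its parameters, and the default values are all hypothetical.

```python
from typing import List, Tuple


def segment_utterances(
    samples: List[float],
    frame_size: int = 4,              # hypothetical value, tiny for readability
    energy_threshold: float = 0.01,   # hypothetical value
    max_silence_frames: int = 1,      # hypothetical value
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech-like regions.

    A frame counts as voiced when its mean squared amplitude reaches
    `energy_threshold`. An utterance is closed once more than
    `max_silence_frames` consecutive silent frames follow it.
    """
    segments: List[Tuple[int, int]] = []
    start = None          # sample index where the current utterance began
    last_voiced_end = 0   # end index of the most recent voiced frame
    silence = 0           # consecutive silent frames inside the utterance

    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            if start is None:
                start = offset
            last_voiced_end = offset + len(frame)
            silence = 0
        elif start is not None:
            silence += 1
            if silence > max_silence_frames:
                segments.append((start, last_voiced_end))
                start = None
                silence = 0

    # Close an utterance that runs to the end of the recording.
    if start is not None:
        segments.append((start, last_voiced_end))
    return segments


# Toy signal: silence, speech, a long pause, a short burst of speech, silence.
audio = [0.0] * 8 + [0.5] * 8 + [0.0] * 12 + [0.5] * 4 + [0.0] * 4
utterances = segment_utterances(audio)
```

In a workflow like the one described, each resulting segment would then be passed to a speech-to-text model and on to human transcribers for refinement.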
With our in-house language experts and a two-step quality control process, we could resolve ASR (Automatic Speech Recognition) model errors caused by gaming language nuances and provide invaluable feedback to fine-tune the models. For improved data reliability, each utterance or speech segment carried a confidence score, which helped the moderation team understand where they needed to spend more time.
To achieve a high level of quality, the moderation team was hand-picked and underwent rigorous training with our Solution Architects and Domain Experts. Over the course of the project, iMerit curated a specialized lexicon and language model to identify in-domain language usage patterns that differ from everyday language and pose a particular challenge for both ASR models and moderation solutions. To ensure high-quality data, the annotation team had constant support from different stakeholders.
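One way a per-segment confidence score can direct reviewer attention is sketched below. This is an illustration only, assuming scores in the range 0 to 1 and a single cutoff; the `Segment` class, the `triage` function, and the `review_threshold` value of 0.85 are hypothetical and do not describe the project's actual routing rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    segment_id: str
    transcript: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def triage(
    segments: List[Segment],
    review_threshold: float = 0.85,  # hypothetical cutoff
) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (needs_review, auto_accepted).

    Segments below the threshold are returned lowest confidence first,
    so reviewers see the least reliable transcripts at the top of the queue.
    """
    needs_review = sorted(
        (s for s in segments if s.confidence < review_threshold),
        key=lambda s: s.confidence,
    )
    auto_accepted = [s for s in segments if s.confidence >= review_threshold]
    return needs_review, auto_accepted


# Made-up example segments for illustration.
batch = [
    Segment("a", "nice shot", 0.97),
    Segment("b", "gg ez noob", 0.42),
    Segment("c", "push mid now", 0.80),
]
review_queue, accepted = triage(batch)
```

Sorting the review queue by ascending confidence is a design choice: it spends limited reviewer time where the automated transcript is most likely to be wrong.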
Dataloop for Accelerating Data Annotation
An optimized data annotation tool is crucial to the success of a project like this one, and we worked with Dataloop to meet the complex needs of the game publisher. Dataloop enabled us to weave together human and machine intelligence, accelerating dataset curation with high accuracy and faster turnaround. With Dataloop’s audio studio, our team was able to:
- Generate a large number of annotated recordings with precise guidelines while reducing labeling time
- Deliver human-in-the-loop model validation workflows
- Automate data pre-processing and post-processing for scalability
- Achieve real-time visibility of the annotation process
Dataloop also supported QA and feedback for the team working on this annotation workflow, with real-time communication between data annotators, managers, and other team members.
To accelerate the curation of quality training datasets for gaming content moderation, the video game company partnered with iMerit. We played a crucial role in helping them deliver a safer gaming experience. The company is now confident about its content moderation model and hopes to see an increase in the number of active users as gaming communities become inclusive and welcoming spaces.